CV overview and digital image basics: RGB/HSV color space, pixel matrix and bit depth

Introduction

Computer Vision (CV) is one of the three major perception branches of artificial intelligence. Its core task is to let computers "understand" the world: to extract structured information from images and videos and turn it into cognitive-level understanding.

But before tackling the hard problems of object detection, image segmentation, and large Transformer models, you need to be clear on three things: "What exactly is an image?", "How does a computer store an image?", and "How do you describe color to a computer?" This article uses plain language plus OpenCV code to cover these three fundamentals in one go.

📂 Phase: Phase 1, Cornerstone of Image Processing (Traditional CV) 🔗 Related Chapters: OpenCV Quick Start · Image Enhancement and Filtering

1. What is computer vision?

1.1 Three processing levels of CV

By degree of abstraction and processing complexity, CV tasks are usually divided into three levels, each building on the one before:

Level	Input → Output	Core function	Representative techniques
Low-level vision	Image → Image / low-dimensional signal	Direct pixel-level operations that preserve spatial structure	Filtering, denoising, edge/corner detection
Middle-level vision	Image → structured/semi-structured information	Converts pixels into "meaningful object parts"	Feature matching, object detection, image segmentation
High-level vision	Image → semantic/cognitive results	Gives images "human-understandable meaning"	Scene understanding, action recognition, image captioning

In other words, low-level vision processes the "raw materials" of the image, middle-level vision begins to extract "parts", and high-level vision is responsible for outputting an understanding of the entire image.
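To make the "Image → Image" idea of low-level vision concrete, here is a minimal sketch using only NumPy. The tiny test image and the choice of a plain column difference as an "edge detector" are assumptions made for illustration, not a method prescribed by this article.

```python
import numpy as np

# Hypothetical 6x6 grayscale image: dark left half, bright right half.
img = np.zeros((6, 6), dtype=np.uint8)
img[:, 3:] = 200

# Horizontal gradient: difference between neighbouring columns.
# Cast to a signed type first so the subtraction cannot wrap around in uint8.
grad = np.abs(np.diff(img.astype(np.int16), axis=1))

# The output is still an image-like grid (one column narrower),
# with non-zero values only where the brightness changes.
edge_cols = np.unique(np.nonzero(grad)[1])
print(grad.shape)   # (6, 5)
print(edge_cols)    # [2]
```

The result is another pixel grid rather than a label or a description, which is what separates low-level operations from the two levels above them.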

1.2 High-frequency application scenarios (CV around you)

Field	Typical applications
Mobile Internet	Beauty filters, QR-code payment, face unlock
Autonomous driving	Lane line detection, pedestrian detection, traffic sign recognition
Industrial quality inspection	Defect detection, parts counting, dimensional measurement
Medical and health	Tumor image segmentation, fundus lesion screening

2. The essence of digital images: pixel matrix

2.1 From "Photo" to "Matrix"

Real-world photos are continuous light signals, but computers can only process discrete numbers. A digital camera or scanner turns light into a matrix in a two-step process:

Sampling: divide the photo into a grid of H (height) × W (width) cells; each cell is called a pixel.
Quantization: assign a numerical value (usually an integer within a fixed range) to the brightness or color of each pixel.

In this way, an image becomes a digital matrix in the eyes of the computer.
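The two steps can be sketched in a few lines of NumPy. The "scene" below is a made-up horizontal brightness ramp, and the grid size is arbitrary; a real sensor is far more involved, so treat this only as an illustration of sampling followed by quantization.

```python
import numpy as np

H, W = 4, 8  # sampling grid: 4 rows x 8 columns (hypothetical size)

# Sampling: evaluate a continuous brightness function at the grid points.
# The scene here is an assumed ramp from 0.0 (dark) to 1.0 (bright).
xs = np.linspace(0.0, 1.0, W)
scene = np.tile(xs, (H, 1))            # shape (H, W), float values in [0, 1]

# Quantization: map each continuous value to one of 256 integer levels.
pixels = np.round(scene * 255).astype(np.uint8)

print(pixels.shape)   # (4, 8)
print(pixels.dtype)   # uint8
print(pixels[0])      # one row of the resulting pixel matrix
```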

Grayscale images vs color images

Image type	Storage matrix dimensions	Meaning of each element	Common value range (8 bits)
Grayscale	H × W	Grayscale value of pixel	0 (black) ~ 255 (white)
Color RGB	H × W × 3	R / G / B brightness of three channels	Each channel 0 ~ 255

For ease of understanding, we use NumPy to manually create some solid color images:

import numpy as np
import cv2

# 1. Create a 100×100 pure-black grayscale image (all 0)
black_gray = np.zeros((100, 100), dtype=np.uint8)
# Create a 100×100 pure-white grayscale image (all 255)
white_gray = np.full((100, 100), 255, dtype=np.uint8)

# 2. Create a 100×100×3 pure-black RGB image (note: OpenCV defaults to BGR; RGB is used here conceptually)
black_rgb = np.zeros((100, 100, 3), dtype=np.uint8)
# Create a 100×100×3 red image (OpenCV's BGR order is covered later)
red_rgb = np.zeros((100, 100, 3), dtype=np.uint8)
red_rgb[:, :, 0] = 255  # assumes channel index 0 is R (in OpenCV, index 0 is actually B)

print(f"Grayscale image shape: {black_gray.shape}")
print(f"Color image shape: {black_rgb.shape}")

As you can see, the grayscale image is a plain two-dimensional matrix, while the color image is a three-dimensional array: height, width, and three color channels.
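Because the array is laid out as height × width × channels, pixel indexing is [row, column, channel], which is the reverse of the (x, y) order many people expect. A small sketch, with arbitrary sizes and values chosen only for illustration:

```python
import numpy as np

img = np.zeros((4, 6, 3), dtype=np.uint8)   # H=4 rows, W=6 columns, 3 channels

# Indexing is [row (y), column (x), channel], so height comes first.
img[1, 5] = (10, 20, 30)        # set all three channels of the pixel at y=1, x=5
img[2:4, 0:3, 2] = 255          # fill channel index 2 of a 2x3 block

h, w, c = img.shape
print(h, w, c)                  # 4 6 3
print(img[1, 5])                # [10 20 30]
print(int(img[:, :, 2].sum()))  # 1560 (6 pixels * 255, plus the 30 set above)
```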

2.2 Bit depth: determines the "fineness" of the image

Bit depth refers to how many bits are used to store each pixel channel. The more bits, the more gray or color levels can be represented:

8 bits: the most common. Each channel ranges over 0 ~ 255, and three channels combine into about 16.78 million RGB colors (256³ = 16,777,216).
16 bits: common in medical imaging and astronomy; each channel ranges over 0 ~ 65535.
32-bit floating point: common in deep learning and high-precision processing; brightness is usually normalized to 0.0 ~ 1.0.

Generate random images at these three bit depths and inspect their data types and ranges:
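The numbers for the integer bit depths follow directly from powers of two. A short sketch (the helper name `levels` is introduced here only for illustration):

```python
def levels(bits: int) -> int:
    """Number of distinct values one channel can take at the given integer bit depth."""
    return 2 ** bits

for bits in (1, 8, 16):
    per_channel = levels(bits)
    # bit depth, maximum channel value, number of three-channel combinations
    print(bits, per_channel - 1, per_channel ** 3)

# For 8 bits this prints: 8 255 16777216
```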

import numpy as np

# Generate random images at three bit depths
img_8bit = np.random.randint(0, 256, (50, 50), dtype=np.uint8)
img_16bit = np.random.randint(0, 65536, (50, 50), dtype=np.uint16)
img_32float = np.random.rand(50, 50).astype(np.float32)

print(f"8-bit image dtype: {img_8bit.dtype}, min: {img_8bit.min()}, max: {img_8bit.max()}")
print(f"16-bit image dtype: {img_16bit.dtype}, min: {img_16bit.min()}, max: {img_16bit.max()}")
print(f"32-bit float image dtype: {img_32float.dtype}, min: {img_32float.min():.4f}, max: {img_32float.max():.4f}")
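In practice you often need to move between these bit depths. The sketch below shows one plausible set of conversions with NumPy; dropping the low 8 bits for the 16-bit case is just one simple choice among several, and the sample values are made up. It also shows a common pitfall: `uint8` arithmetic wraps around instead of saturating.

```python
import numpy as np

img_8bit = np.array([[0, 128, 255]], dtype=np.uint8)

# 8-bit integer -> float32 in [0.0, 1.0]
img_float = img_8bit.astype(np.float32) / 255.0

# float32 in [0.0, 1.0] -> 8-bit integer (round, then clip to stay in range)
back_to_8bit = np.clip(np.round(img_float * 255.0), 0, 255).astype(np.uint8)

# 16-bit -> 8-bit by keeping only the high 8 bits (an assumed, simple strategy)
img_16bit = np.array([[0, 32768, 65535]], dtype=np.uint16)
down_to_8bit = (img_16bit >> 8).astype(np.uint8)

# Pitfall: adding to a uint8 array wraps around (255 + 10 becomes 9).
wrapped = img_8bit + np.uint8(10)
# Safer: widen the type, add, clip, then convert back.
saturated = np.clip(img_8bit.astype(np.int16) + 10, 0, 255).astype(np.uint8)

print(back_to_8bit)   # [[  0 128 255]]
print(down_to_8bit)   # [[  0 128 255]]
print(wrapped)        # [[ 10 138   9]]
print(saturated)      # [[ 10 138 255]]
```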

3. Two core color spaces

3.1 RGB: The most “comfortable” format for computers

RGB (Red, Green, Blue) is based on additive mixing of light: red, green, and blue light superimposed in different proportions produces the vast majority of colors the human eye can perceive. It is the native format of displays and of most image files.

Common RGB / BGR values (OpenCV uses BGR!)

Color	Standard RGB	OpenCV BGR
Red	(255, 0, 0)	(0, 0, 255)
Green	(0, 255, 0)	(0, 255, 0)
Blue	(0, 0, 255)	(255, 0, 0)
White	(255, 255, 255)	(255, 255, 255)
Gray (middle)	(128, 128, 128)	(128, 128, 128)

Key pitfalls of OpenCV color space conversion

For historical reasons, OpenCV stores color images in BGR channel order instead of RGB. This is an easy trap to fall into, especially when you combine OpenCV with libraries such as Matplotlib and Pillow, which expect RGB. Always convert first.

import cv2
import numpy as np

# Assume we read a real image from disk
img_bgr = cv2.imread("test.jpg")

# Core conversion functions
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)   # for display with matplotlib
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY) # convert to grayscale
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)   # convert to HSV (the focus of the next section)

# Simple check: reverse the channels manually with slicing
img_rgb_manual = img_bgr[:, :, ::-1]  # reverse the third dimension (channels)
print(np.array_equal(img_rgb, img_rgb_manual))  # should print True

Memory tip: if you pass img_bgr straight to Matplotlib, red and blue are swapped and the image looks wrong. Remember: converting BGR to RGB only requires reversing the last dimension.
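One detail worth knowing about the slicing trick: it returns a view onto the same memory rather than a new array, and that view is not contiguous. Some OpenCV functions may, depending on the version, refuse non-contiguous arrays, so copying is the safer habit. The sketch below uses only NumPy and a made-up 2×2 image:

```python
import numpy as np

img_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
img_bgr[:, :, 2] = 255                      # pure red in BGR order

rgb_view = img_bgr[:, :, ::-1]              # reversed channels, no data copied
rgb_copy = np.ascontiguousarray(rgb_view)   # independent, contiguous copy

print(rgb_view[0, 0].tolist())               # [255, 0, 0]
print(np.shares_memory(rgb_view, img_bgr))   # True
print(rgb_view.flags["C_CONTIGUOUS"])        # False
print(rgb_copy.flags["C_CONTIGUOUS"])        # True
```

`cv2.cvtColor` always returns a new array, so it does not have this issue.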

3.2 HSV: the most “intuitive” format for humans

HSV (Hue, Saturation, Value) separates color attributes and is closer to how we describe colors in everyday speech, such as "this red is brighter" or "that blue is darker". For that reason, HSV is usually the first choice for color segmentation and object tracking.

The meaning of the three components (OpenCV-specific value ranges!)

Component	Meaning	OpenCV range (8-bit)
Hue	The basic color (red → yellow → green → cyan → blue → purple → red)	0 ~ 179 (the 360° color wheel divided by 2)
Saturation	The "purity/vividness" of the color: 0 is gray, 255 is the pure color	0 ~ 255
Value	The "brightness" of the color: 0 is black, 255 is the brightest	0 ~ 255

Note: the Hue range is 0 ~ 179, not 0 ~ 360. For 8-bit images, OpenCV stores hue as degrees divided by 2, so that the 360° hue wheel fits inside an 8-bit integer (0 ~ 255).
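The mapping from "textbook" HSV to OpenCV's 8-bit encoding can be sketched with Python's standard `colorsys` module. The helper `rgb_to_opencv_hsv` is hypothetical and only approximates what `cv2.cvtColor` does; OpenCV's internal rounding may differ by one for some colors.

```python
import colorsys

def rgb_to_opencv_hsv(r: int, g: int, b: int) -> tuple:
    """Approximate OpenCV's 8-bit HSV encoding for one RGB pixel (0-255 per channel)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)  # each in [0, 1]
    h_deg = h * 360.0
    # Hue: degrees halved and wrapped into 0..179; S and V: scaled to 0..255.
    return (round(h_deg / 2) % 180, round(s * 255), round(v * 255))

print(rgb_to_opencv_hsv(255, 0, 0))      # (0, 255, 255)   red,    0 degrees
print(rgb_to_opencv_hsv(255, 255, 0))    # (30, 255, 255)  yellow, 60 degrees
print(rgb_to_opencv_hsv(0, 255, 0))      # (60, 255, 255)  green,  120 degrees
print(rgb_to_opencv_hsv(0, 0, 255))      # (120, 255, 255) blue,   240 degrees
print(rgb_to_opencv_hsv(128, 128, 128))  # (0, 0, 128)     gray has zero saturation
```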

Hands-on: using HSV for red object detection

Red sits at 0° on the color wheel and wraps around it, covering roughly 340° to 20°. Because OpenCV halves the hue, this becomes two separate intervals: 0 to 10 and 170 to 179 (an upper bound of 180 in code is harmless, since hue never exceeds 179). Detecting red therefore means thresholding the two intervals separately and then merging the masks.

Below is a complete red detection function that you can try directly on your own pictures:

import cv2
import numpy as np

def detect_red(image_path):
    # 1. Read the image and convert it to HSV
    img_bgr = cv2.imread(image_path)
    if img_bgr is None:
        print("Please provide a valid image path!")
        return None, None, None
    img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)

    # 2. Define the two HSV intervals for red
    lower1 = np.array([0, 100, 100])
    upper1 = np.array([10, 255, 255])
    lower2 = np.array([170, 100, 100])
    upper2 = np.array([180, 255, 255])

    # 3. Build the mask: red regions are white (255), everything else is black (0)
    mask1 = cv2.inRange(img_hsv, lower1, upper1)
    mask2 = cv2.inRange(img_hsv, lower2, upper2)
    mask = cv2.bitwise_or(mask1, mask2)  # merge the two intervals

    # 4. Use the mask to extract the red regions
    result = cv2.bitwise_and(img_bgr, img_bgr, mask=mask)

    return img_bgr, mask, result

# Uncomment to run locally
# original, mask, res = detect_red("red_flower.jpg")
# cv2.imshow("Original", original)
# cv2.imshow("Red mask", mask)
# cv2.imshow("Red detection result", res)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

Adjusting the saturation (S) and value (V) thresholds in lower and upper controls how tolerant the detection is of pale or dark colors; these are the main parameters to tune for your own images.

4. Practical project: quickly analyze the information of an image

Let's combine what we have covered into one practical function. Given an image path, it prints the core attributes of the image in one go:

import cv2

def analyze_image(image_path):
    img = cv2.imread(image_path)
    if img is None:
        print("❌ Failed to read the image, please check the path!")
        return

    # 1. Basic attributes
    h, w = img.shape[:2]
    channels = img.shape[2] if len(img.shape) == 3 else 1
    dtype = img.dtype

    print(f"✅ Basic image information:")
    print(f"   - Size: {w} × {h}")
    print(f"   - Channels: {channels}")
    print(f"   - Data type: {dtype}")
    print(f"   - Total pixels: {w * h:,}")  # thousands separator for large numbers

    # 2. Per-channel BGR statistics (color images only)
    if channels == 3:
        print(f"\n✅ BGR channel statistics:")
        for idx, name in enumerate(['Blue', 'Green', 'Red']):
            ch = img[:, :, idx]
            print(f"   - {name}: min={ch.min()}, max={ch.max()}, mean={ch.mean():.2f}")

# Example for running locally
# analyze_image("test.jpg")

This function can help you quickly understand the data structure of an image, and is also very suitable for debugging subsequent image processing algorithms.

5. Summary

Review of core knowledge points

Nature of an image: after sampling and quantization, a continuous light signal becomes a discrete pixel matrix (H×W or H×W×3).
Bit depth: 8 bits (0 ~ 255) is the most common. The more bits, the finer the gradations of brightness and color that can be represented.
Color space:

RGB/BGR: the computer's native format and hardware friendly, but the three channels are strongly coupled, so color and brightness are mixed together.
HSV: matches human intuition and separates hue from saturation and brightness, which makes it the first choice for color detection and segmentation.

OpenCV pitfall: the default channel order is BGR. When working with other libraries, convert first with cv2.cvtColor or by reversing the channel dimension with slicing.

💡 Final reminder: The pixel matrix is the cornerstone of computer vision! All advanced algorithms - including convolutional neural networks (CNN) - are essentially "playing with matrices": addition, subtraction, multiplication, division, convolution, pooling... It is recommended to manually generate small matrices, modify pixel values, and establish an intuitive "matrix sense" as early as possible!
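As a starting point for building that "matrix sense", here is a small NumPy sketch of two of the operations named above on a 4×4 matrix: 2×2 max pooling and a 3×3 mean filter. Both are written out by hand for clarity and are not how a deep learning framework implements them.

```python
import numpy as np

m = np.arange(16, dtype=np.float32).reshape(4, 4)

# 2x2 max pooling with stride 2: split into 2x2 blocks and take each block's maximum.
pooled = m.reshape(2, 2, 2, 2).max(axis=(1, 3))

# 3x3 mean filter over the valid region: slide the kernel and sum the products.
kernel = np.ones((3, 3), dtype=np.float32) / 9.0
filtered = np.array([[(m[i:i + 3, j:j + 3] * kernel).sum() for j in range(2)]
                     for i in range(2)])

print(pooled)                 # block maxima: 5, 7 / 13, 15
print(np.round(filtered, 2))  # window means: 5, 6 / 9, 10
```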
