CV overview and digital image basics: RGB/HSV color space, pixel matrix and bit depth

Introduction

Computer Vision (CV) is one of the three major perception-oriented branches of artificial intelligence. Its core task is to let computers "understand" the visual world: to extract structured information from images and videos and build cognitive-level understanding from it.

But before tackling the "hard bones" of object detection, image segmentation, and large Transformer models, you need a solid grasp of three basics. What exactly is an image? How does a computer store it? How do we describe color precisely to a computer? This article answers all three in plain language, with OpenCV code.

📂 Phase: Phase 1, Cornerstone of Image Processing (Traditional CV) 🔗 Related chapters: OpenCV Quick Start · Image Enhancement and Filtering


1. What is computer vision?

1.1 Three processing levels of CV

Depending on the level of abstraction and processing complexity, CV tasks are usually divided into three progressively higher levels:

| Level | Input → Output | Core function | Representative techniques |
|---|---|---|---|
| Low-level vision | Image → image / low-dimensional signal | Direct pixel-level operations that preserve spatial structure | Filtering, denoising, edge/corner detection |
| Middle-level vision | Image → structured / semi-structured information | Turn pixels into meaningful object parts | Feature matching, object detection, image segmentation |
| High-level vision | Image → semantic / cognitive results | Give images human-understandable meaning | Scene understanding, action recognition, image captioning |

In other words, low-level vision processes the "raw materials" of the image, middle-level vision begins to extract "parts", and high-level vision is responsible for outputting an understanding of the entire image.

1.2 Common application scenarios (CV around you)

| Field | Typical applications |
|---|---|
| Mobile internet | Beauty filters, QR-code payment, face unlock |
| Autonomous driving | Lane detection, pedestrian detection, traffic sign recognition |
| Industrial quality inspection | Defect detection, part counting, dimensional measurement |
| Healthcare | Tumor segmentation in medical images, fundus lesion screening |

2. The essence of digital images: pixel matrix

2.1 From "Photo" to "Matrix"

Real-world scenes are continuous light signals, but computers can only process discrete numbers. A digital camera or scanner turns light into a matrix in two steps:

  1. Sampling: Divide the image plane into an H (height) × W (width) grid of small cells; each cell is called a pixel.
  2. Quantization: Assign a numerical value (usually an integer within a fixed range) to the brightness or color of each pixel.

In this way, an image becomes a matrix of numbers in the eyes of the computer.
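The two steps can be sketched with NumPy. This is a toy model under stated assumptions, not how a real sensor works. `scene` is a hypothetical stand-in for the continuous light signal: its brightness runs from 0.0 to 1.0 and simply rises from left to right. We sample it at the centre of each grid cell and then quantize the samples to 8-bit integers.

```python
import numpy as np

def scene(x, y):
    """Hypothetical 'continuous' scene: brightness in [0.0, 1.0] rising left to right."""
    return x  # depends only on the horizontal position

def sample_and_quantize(h, w, levels=256):
    # Sampling: evaluate the scene at the centre of each of the h x w grid cells
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    xx, yy = np.meshgrid(xs, ys)   # both have shape (h, w)
    samples = scene(xx, yy)        # one real-valued brightness per pixel

    # Quantization: map [0.0, 1.0] onto the integers 0 .. levels-1
    # (levels must be <= 256 so the result fits in uint8)
    q = np.clip(np.floor(samples * levels), 0, levels - 1)
    return q.astype(np.uint8)

img = sample_and_quantize(4, 8)
print(img.shape)  # (4, 8)
print(img[0])     # one row of the pixel matrix
```

Every row comes out identical here because the made-up scene depends only on the horizontal position. Any other function of (x, y) would be sampled and quantized the same way.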

Grayscale images vs color images

| Image type | Matrix shape | Meaning of each element | Typical range (8-bit) |
|---|---|---|---|
| Grayscale | H × W | Gray level of the pixel | 0 (black) ~ 255 (white) |
| Color (RGB) | H × W × 3 | Brightness of the R / G / B channels | 0 ~ 255 per channel |

To make this concrete, we use NumPy to create a few solid-color images by hand:

import numpy as np
import cv2

# 1. Create a 100×100 pure black grayscale image (all 0)
black_gray = np.zeros((100, 100), dtype=np.uint8)
# Create a 100×100 pure white grayscale image (all 255)
white_gray = np.full((100, 100), 255, dtype=np.uint8)

# 2. Create a 100×100×3 pure black RGB image (note: OpenCV defaults to BGR; here we think in RGB terms)
black_rgb = np.zeros((100, 100, 3), dtype=np.uint8)
# Create a 100×100×3 red image (OpenCV's BGR order is covered in detail later!)
red_rgb = np.zeros((100, 100, 3), dtype=np.uint8)
red_rgb[:, :, 0] = 255  # treat channel index 0 as R (in OpenCV, index 0 is actually B)

print(f"Grayscale image shape: {black_gray.shape}")
print(f"Color image shape: {black_rgb.shape}")

As you can see, a grayscale image is a simple two-dimensional matrix. A color image is a three-dimensional array of shape (height, width, 3): rows, columns, and three color channels.
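One consequence of this layout trips up many beginners. NumPy arrays, and therefore OpenCV images, are indexed row first: the pixel at horizontal position x and vertical position y is `img[y, x]`, not `img[x, y]`. Here is a small synthetic example (the sizes and coordinates are arbitrary):

```python
import numpy as np

h, w = 3, 5
img = np.zeros((h, w, 3), dtype=np.uint8)  # shape is (rows, columns, channels)

# Set the pixel in row 1 (y = 1), column 4 (x = 4) to white
x, y = 4, 1
img[y, x] = (255, 255, 255)                # index order: [row, column]

print(img.shape)          # (3, 5, 3)
print(img[y, x])          # the three channel values of that pixel
print(img[..., 0].shape)  # a single channel is a 2-D (3, 5) matrix
```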

2.2 Bit depth: determines the "fineness" of the image

Bit depth is the number of bits used to store each channel of a pixel. The more bits, the more gray or color levels can be represented:

  • 8 bits: The most common! Each channel ranges from 0 to 255, and three channels combine into 256³ ≈ 16.78 million RGB colors
  • 16 bits: Common in medical and astronomical imaging; each channel ranges from 0 to 65535
  • 32-bit floating point: Common in deep learning and high-precision processing; values are usually normalized to 0.0 ~ 1.0
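The figures in this list follow from one formula. A channel with n bits can hold 2ⁿ distinct values, and a pixel with three such channels can hold (2ⁿ)³ colors. A quick sanity check in plain Python:

```python
def levels(bits):
    """Number of distinct values an unsigned integer channel with `bits` bits can hold."""
    return 2 ** bits

for bits in (8, 16):
    print(f"{bits}-bit channel: {levels(bits):,} levels (0 ~ {levels(bits) - 1:,})")

rgb_colors_8bit = levels(8) ** 3  # three independent 8-bit channels
print(f"8-bit RGB colors: {rgb_colors_8bit:,}")  # 16,777,216, i.e. about 16.78 million
```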

Let's generate random images at three bit depths and inspect their data types and value ranges:

import numpy as np

# Generate random images at three bit depths
img_8bit = np.random.randint(0, 256, (50, 50), dtype=np.uint8)
img_16bit = np.random.randint(0, 65536, (50, 50), dtype=np.uint16)
img_32float = np.random.rand(50, 50).astype(np.float32)

print(f"8-bit image dtype: {img_8bit.dtype}, min: {img_8bit.min()}, max: {img_8bit.max()}")
print(f"16-bit image dtype: {img_16bit.dtype}, min: {img_16bit.min()}, max: {img_16bit.max()}")
print(f"32-bit float image dtype: {img_32float.dtype}, min: {img_32float.min():.4f}, max: {img_32float.max():.4f}")
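In practice you often have to move between these representations, for example when feeding an 8-bit image to a model that expects floats in 0.0 ~ 1.0. Below is a minimal sketch with plain NumPy. Dividing a 16-bit value by 257 is one common way to rescale 0 ~ 65535 onto 0 ~ 255 (since 65535 = 255 × 257). Other conventions, such as keeping only the high byte, also exist.

```python
import numpy as np

img_8bit = np.array([[0, 128, 255]], dtype=np.uint8)

# uint8 -> float32 in [0.0, 1.0]: convert the type first, then divide
img_float = img_8bit.astype(np.float32) / 255.0

# float32 in [0.0, 1.0] -> uint8: scale, round, clip to the valid range, convert back
img_back = np.clip(np.round(img_float * 255.0), 0, 255).astype(np.uint8)

# uint16 (0 ~ 65535) -> uint8 (0 ~ 255) by dividing by 257 and rounding
img_16bit = np.array([[0, 32896, 65535]], dtype=np.uint16)
img_16_to_8 = np.round(img_16bit / 257.0).astype(np.uint8)

print(img_float)    # float32 values between 0.0 and 1.0
print(img_back)     # recovers the original 8-bit values
print(img_16_to_8)  # 16-bit values rescaled to 8 bits
```

Converting with `astype` *before* dividing matters. Integer arrays would otherwise need care to avoid truncation or overflow in intermediate steps.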

3. Two core color spaces

3.1 RGB: The most “comfortable” format for computers

RGB (Red, Green, Blue) is based on additive mixing of light. Red, green, and blue light superimposed in different proportions can reproduce the vast majority of colors the human eye perceives. It is the native format of displays and most image files.

Common RGB / BGR values (OpenCV uses BGR!)

| Color | Standard RGB | OpenCV BGR |
|---|---|---|
| Red | (255, 0, 0) | (0, 0, 255) |
| Green | (0, 255, 0) | (0, 255, 0) |
| Blue | (0, 0, 255) | (255, 0, 0) |
| White | (255, 255, 255) | (255, 255, 255) |
| Gray (mid) | (128, 128, 128) | (128, 128, 128) |
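The table can be checked without any image file. The sketch below uses only NumPy to build a tiny 1 × 3 image in OpenCV's BGR order, then reverses the channel axis to obtain RGB. Red and blue swap places, while green, being the middle channel, stays put.

```python
import numpy as np

# A 1 x 3 image stored in OpenCV's BGR order: red, green, blue pixels
img_bgr = np.array([[[0, 0, 255],    # red   in BGR
                     [0, 255, 0],    # green in BGR
                     [255, 0, 0]]],  # blue  in BGR
                   dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB
img_rgb = img_bgr[:, :, ::-1]

for name, px in zip(["red", "green", "blue"], img_rgb[0]):
    print(f"{name:>5} as RGB: {px.tolist()}")
```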

Key pitfalls of OpenCV color space conversion

For historical reasons, OpenCV stores color channels in BGR order rather than RGB. This is an easy source of bugs, especially when you combine OpenCV with libraries such as Matplotlib or Pillow, which expect RGB. Always convert first.

import cv2
import numpy as np

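# Read a real image (replace "test.jpg" with your own file)
img_bgr = cv2.imread("test.jpg")
if img_bgr is None:  # imread returns None instead of raising when the file can't be read
    raise FileNotFoundError("Could not read test.jpg, please check the path")

# Core conversion function: cv2.cvtColor
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)   # for display with Matplotlib
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY) # to grayscale
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)   # to HSV (the focus of the next section)

# Quick check: reverse the channel order by slicing
img_rgb_manual = img_bgr[:, :, ::-1]  # reverse the third dimension (channels)
print(np.array_equal(img_rgb, img_rgb_manual))  # should print True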

Memory tip: If you display img_bgr directly with Matplotlib, red and blue are swapped and the image looks wrong. Remember: converting BGR to RGB only requires reversing the order of the last dimension.

3.2 HSV: the most “intuitive” format for humans

HSV (Hue, Saturation, Value) separates a color into independent attributes. This is closer to how we describe colors in everyday life, for example "this red is brighter" or "that blue is darker". That is why HSV is usually the first choice for color segmentation and object tracking.
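One way to see this separation is Python's standard-library `colorsys` module. It uses the textbook convention (all components in 0.0 ~ 1.0), not OpenCV's ranges. In the sketch below, three shades of red differ in one or two RGB channels, but in HSV the hue stays the same and only saturation or value changes:

```python
import colorsys

# Three shades of red, as 8-bit RGB triples
shades = {
    "red": (255, 0, 0),
    "dark red": (128, 0, 0),
    "pale red": (255, 128, 128),
}

hsv = {}
for name, (r, g, b) in shades.items():
    # colorsys works on floats in 0.0 ~ 1.0 and returns (h, s, v) in 0.0 ~ 1.0
    hsv[name] = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    h, s, v = hsv[name]
    print(f"{name:>8}: RGB={(r, g, b)} -> H={h * 360:.0f}°, S={s:.2f}, V={v:.2f}")
```

Darkening the red changes only V, and lightening it changes only S. The hue stays at 0° in all three cases, so a hue range alone is enough to say "this is red" regardless of shade.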

The meaning of the three components (OpenCV-specific value ranges for 8-bit images!)

| Component | Meaning | OpenCV range (8-bit) |
|---|---|---|
| Hue | The base color (red → yellow → green → cyan → blue → purple → back to red) | 0 ~ 179 (the 360° color wheel, halved) |
| Saturation | The purity/vividness of the color: 0 is gray, 255 is fully saturated | 0 ~ 255 |
| Value (brightness) | How bright the color is: 0 is black, 255 is brightest | 0 ~ 255 |

Note: Hue ranges over 0 ~ 179, not 0 ~ 360. OpenCV halves the 360° hue angle so that it fits into an 8-bit integer (0 ~ 255).
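Since the mapping is just a division by 2, it is easy to compute by hand. `to_opencv_hue` below is a hypothetical helper for illustration, not an OpenCV function. It maps a few standard hue angles onto OpenCV's 8-bit scale:

```python
def to_opencv_hue(degrees):
    """Hypothetical helper: map a standard hue angle in degrees to OpenCV's 8-bit scale."""
    return (degrees % 360) / 2  # OpenCV stores H / 2 for 8-bit images

wheel = [("red", 0), ("yellow", 60), ("green", 120),
         ("cyan", 180), ("blue", 240), ("magenta", 300)]
for name, deg in wheel:
    print(f"{name:>7}: {deg:>3}° -> OpenCV H = {to_opencv_hue(deg):g}")
```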

Hands-on: detecting red objects with HSV

Red sits at both ends of the color wheel, roughly 340° ~ 360° and 0° ~ 20°. After halving, that becomes two OpenCV hue intervals: 0 ~ 10 and 170 ~ 180. So detecting red requires checking both intervals separately and then merging the two masks.
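The wrap-around can be checked on a handful of hue values. A single `lower <= h <= upper` test cannot express an interval that runs past 179 back to 0, hence the two tests. The sketch below uses plain NumPy comparisons in place of `cv2.inRange` and looks at the H channel only, ignoring S and V. It uses the same two intervals as the function below.

```python
import numpy as np

# Some OpenCV hue values (0 ~ 179): reds near both ends, plus a few other hues
hues = np.array([0, 5, 10, 15, 60, 120, 165, 170, 175, 179], dtype=np.uint8)

in_low = hues <= 10                       # first red interval: 0 ~ 10 (hue is never negative)
in_high = (hues >= 170) & (hues <= 180)   # second red interval: 170 ~ 180
is_red = in_low | in_high                 # union, like cv2.bitwise_or on the two masks

print(hues[is_red])  # the hues that count as red
```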

Below is a complete red detection function; you can try it directly on your own images:

import cv2
import numpy as np

def detect_red(image_path):
    # 1. Read the image and convert it to HSV
    img_bgr = cv2.imread(image_path)
    if img_bgr is None:
        print("Please provide a valid image path!")
        return None, None, None
    img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)

    # 2. Define the two HSV ranges for red
    lower1 = np.array([0, 100, 100])
    upper1 = np.array([10, 255, 255])
    lower2 = np.array([170, 100, 100])
    upper2 = np.array([180, 255, 255])

    # 3. Build the masks: red pixels become white (255), everything else black (0)
    mask1 = cv2.inRange(img_hsv, lower1, upper1)
    mask2 = cv2.inRange(img_hsv, lower2, upper2)
    mask = cv2.bitwise_or(mask1, mask2)  # merge the two ranges

    # 4. Use the mask to keep only the red regions
    result = cv2.bitwise_and(img_bgr, img_bgr, mask=mask)

    return img_bgr, mask, result

# Uncomment to run locally
# original, mask, res = detect_red("red_flower.jpg")
# cv2.imshow("Original", original)
# cv2.imshow("Red mask", mask)
# cv2.imshow("Red detection result", res)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

The saturation (S) and value (V) thresholds in the lower and upper arrays control how tolerant the detector is to pale or dark colors. They are the main parameters to tune for your own images.
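To see what the S and V thresholds do, the sketch below re-implements in NumPy the inclusive `lower <= pixel <= upper` test that `cv2.inRange` performs. The helper name `in_range` is hypothetical. It is applied to three red-hued HSV pixels with different saturation and brightness:

```python
import numpy as np

def in_range(hsv, lower, upper):
    """Hypothetical NumPy stand-in for cv2.inRange: 255 where every channel is within [lower, upper], else 0."""
    ok = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return ok.astype(np.uint8) * 255

# Three OpenCV-style HSV pixels, all with hue 0 (red)
pixels = np.array([[[0, 220, 200],    # vivid red
                    [0, 60, 230],     # pale, washed-out red (low S)
                    [0, 200, 50]]],   # very dark red (low V)
                  dtype=np.uint8)

strict = in_range(pixels, np.array([0, 100, 100]), np.array([10, 255, 255]))
loose = in_range(pixels, np.array([0, 40, 40]), np.array([10, 255, 255]))

print(strict)  # only the vivid red passes
print(loose)   # lower S and V bounds also accept the pale and dark reds
```

Lowering the S bound admits washed-out colors, and lowering the V bound admits shadowed ones. Both also make it more likely that unrelated near-gray or near-black pixels slip through, so the bounds are a trade-off rather than "lower is better".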


4. Hands-on project: quickly inspect the basic properties of an image

Let's combine what we've covered into one practical function. Give it an image path, and it prints all the core attributes of the image at once:

import cv2

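def analyze_image(image_path):
    # IMREAD_UNCHANGED keeps the file's real bit depth and channel count;
    # the default flag would always convert to 8-bit, 3-channel BGR
    img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
    if img is None:
        print("❌ Failed to read the image, please check the path!")
        return

    # 1. Basic attributes
    h, w = img.shape[:2]
    channels = img.shape[2] if img.ndim == 3 else 1
    dtype = img.dtype

    print("✅ Basic image info:")
    print(f"   - Size (W × H): {w} × {h}")
    print(f"   - Channels: {channels}")
    print(f"   - Data type: {dtype}")
    print(f"   - Total pixels: {w * h:,}")  # comma-separate large numbers

    # 2. Per-channel statistics (color images only; a 4th channel, if present, is alpha)
    if channels in (3, 4):
        print("\n✅ Channel statistics (OpenCV order):")
        for idx, name in enumerate(['Blue', 'Green', 'Red', 'Alpha'][:channels]):
            ch = img[:, :, idx]
            print(f"   - {name}: min={ch.min()}, max={ch.max()}, mean={ch.mean():.2f}")

# Example local run
# analyze_image("test.jpg")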

This function gives you a quick view of an image's data layout. It is also handy for debugging the image-processing algorithms in later chapters.


5. Summary

Review of core knowledge points

  1. Image Nature: After sampling and quantization, the continuous light signal becomes a discrete pixel matrix (H×W or H×W×3)
  2. Bit depth: 8 bits (0 ~ 255) is the most common; more bits allow finer gradations of gray or color.
  3. Color Space:
  • RGB/BGR: The computer's native, hardware-friendly format, but color and brightness information are mixed across all three channels
  • HSV: Separates hue, saturation, and brightness in line with human intuition; the first choice for color detection and segmentation
  4. OpenCV pitfall: The default channel order is BGR; when working with other libraries, always convert with cv2.cvtColor or by reversing the last axis with slicing

💡 Final reminder: The pixel matrix is the cornerstone of computer vision! All advanced algorithms, including convolutional neural networks (CNNs), essentially come down to "playing with matrices": addition, subtraction, multiplication, division, convolution, pooling... Try generating small matrices by hand and modifying their pixel values to build an intuitive "sense of matrices" as early as possible!
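As a first exercise in that spirit, here is a pitfall that follows directly from bit depth. Arithmetic on `uint8` arrays in NumPy wraps around modulo 256 instead of stopping at 255. A minimal sketch of brightening a few pixels, once naively and once with clipping:

```python
import numpy as np

pixels = np.array([100, 200, 250], dtype=np.uint8)

# Naive: uint8 + uint8 stays uint8, so 250 + 10 = 260 wraps around to 4
naive = pixels + np.uint8(10)

# Safer: widen the type, add, clip back into 0 ~ 255, then narrow again
safe = np.clip(pixels.astype(np.int16) + 10, 0, 255).astype(np.uint8)

print(naive)  # the brightest pixel has wrapped around to a very dark value
print(safe)   # the brightest pixel is capped at 255
```

OpenCV's own `cv2.add` performs this kind of saturating addition on 8-bit images. That is one reason to prefer it over the plain `+` operator on image arrays.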