CV overview and digital image basics: RGB/HSV color space, pixel matrix and bit depth
Introduction
Computer Vision (CV) is one of the three major perception branches of artificial intelligence. Its core task is to let computers "understand" the world: to extract structured information from images and videos and build a cognitive-level understanding from it.
But before tackling the hard problems of object detection, image segmentation, and large Transformer models, it is essential to answer three questions: "What exactly is an image?", "How does a computer store an image?", and "How do we describe color to a computer?" This article uses plain language and OpenCV code to explain these three fundamentals in one pass.
📂 Phase: Phase 1 - Cornerstone of Image Processing (Traditional CV) 🔗 Related Chapters: OpenCV Quick Start · Image Enhancement and Filtering
1. What is computer vision?
1.1 Three processing levels of CV
According to the degree of abstraction and processing complexity, CV tasks are usually divided into three progressively more abstract levels: low-level, middle-level, and high-level vision.
In other words, low-level vision processes the "raw materials" of the image, middle-level vision begins to extract "parts", and high-level vision is responsible for outputting an understanding of the entire image.
1.2 High-frequency application scenarios (CV around you)
2. The essence of digital images: pixel matrix
2.1 From "Photo" to "Matrix"
Real-world scenes are continuous light signals, but computers can only process discrete numbers. A digital camera or scanner turns light into a matrix in a two-step process:
- Sampling: divide the image into a grid of H (height) × W (width) cells; each cell is called a pixel
- Quantization: assign a numerical value (usually an integer within a fixed range) to the brightness or color of each pixel
In this way, an image becomes a digital matrix in the eyes of the computer.
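The two steps can be sketched with NumPy. This is only an illustration: the function scene_brightness is a hypothetical stand-in for a continuous scene, and the 4 × 6 grid size is an arbitrary choice.

```python
import numpy as np

# Hypothetical "continuous" scene: brightness in [0.0, 1.0] that rises
# smoothly from left (x = 0) to right (x = 1) and does not depend on y.
def scene_brightness(x, y):
    return x + 0.0 * y

H, W = 4, 6  # sampling resolution: rows and columns (arbitrary)

# Sampling: evaluate the scene at the centre of each grid cell.
ys = (np.arange(H) + 0.5) / H
xs = (np.arange(W) + 0.5) / W
grid_x, grid_y = np.meshgrid(xs, ys)        # both have shape (H, W)
samples = scene_brightness(grid_x, grid_y)  # float values in [0, 1]

# Quantization: map each sample to one of 256 integer levels (8-bit).
image = np.round(samples * 255).astype(np.uint8)

print(image.shape, image.dtype)
print(image)
```

Increasing H and W samples the scene more finely; changing the 255 scale factor and the dtype changes how finely each sample is quantized.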
Grayscale images vs color images
For ease of understanding, we use NumPy to manually create some solid color images:
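A minimal sketch of such images, assuming a 100 × 100 size and illustrative variable names; the color image uses OpenCV's B, G, R channel order, which section 3 covers:

```python
import numpy as np

H, W = 100, 100  # arbitrary size chosen for illustration

# Grayscale: a 2-D matrix with one brightness value per pixel.
gray_black = np.zeros((H, W), dtype=np.uint8)      # all 0   -> black
gray_mid = np.full((H, W), 128, dtype=np.uint8)    # all 128 -> mid gray
gray_white = np.full((H, W), 255, dtype=np.uint8)  # all 255 -> white

# Color: a 3-D array with three channel values per pixel (B, G, R).
bgr_red = np.zeros((H, W, 3), dtype=np.uint8)
bgr_red[:, :, 2] = 255  # channel index 2 is red in BGR order

print("gray:", gray_mid.shape, gray_mid.dtype)
print("color:", bgr_red.shape, bgr_red.dtype, "first pixel:", bgr_red[0, 0])
```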
As you can see, the grayscale image is a simple two-dimensional matrix, while the color image is a three-dimensional array: height, width, and three color channels.
2.2 Bit depth: determines the "fineness" of the image
Bit depth refers to how many bits (binary digits) are used to store each channel of each pixel. The more bits there are, the more grayscale or color levels can be represented:
- 8 bits: the most common choice. Each channel ranges from 0 ~ 255, and three channels combine into approximately 16.78 million RGB colors
- 16 bits: common in medical imaging and astronomical images, each channel ranges from 0 ~ 65535
- 32-bit floating point: commonly used in deep learning and high-precision processing, with brightness usually normalized to between 0.0 and 1.0
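The integer ranges above follow directly from the bit count: n bits give 2^n levels, numbered 0 to 2^n - 1. A quick arithmetic check in plain Python:

```python
# Number of levels and value range for an unsigned integer channel.
for bits in (8, 16):
    levels = 2 ** bits
    print(f"{bits}-bit: {levels} levels, range 0 to {levels - 1}")

# Three 8-bit channels give 256^3 distinct RGB combinations.
rgb_colors = (2 ** 8) ** 3
print(f"8-bit RGB colors: {rgb_colors:,}")
```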
Generate random images at the three bit depths and inspect their data types and value ranges:
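One way to do this, as a sketch with NumPy's random generator; the 64 × 64 size, the seed, and the variable names are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
shape = (64, 64)

img_u8 = rng.integers(0, 256, size=shape, dtype=np.uint8)      # 0 ~ 255
img_u16 = rng.integers(0, 65536, size=shape, dtype=np.uint16)  # 0 ~ 65535
img_f32 = rng.random(size=shape, dtype=np.float32)             # [0.0, 1.0)

for name, img in [("8-bit", img_u8), ("16-bit", img_u16), ("float32", img_f32)]:
    print(f"{name:8s} dtype={img.dtype}  min={img.min()}  max={img.max()}")
```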
3. Two core color spaces
3.1 RGB: The most “comfortable” format for computers
RGB (Red, Green, Blue) is based on the additive mixing of light: red, green, and blue light superimposed in different proportions produces the vast majority of colors that the human eye can perceive. It is the native format of monitors and of most image files.
Common RGB / BGR values (OpenCV uses BGR!)
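A few reference values, written as a small lookup. The RGB triples are the standard 8-bit primaries and their mixes; the BGR form is the same triple reversed. The dictionary names are illustrative only.

```python
# Standard 8-bit primaries and a few mixes, written as (R, G, B).
COMMON_RGB = {
    "black":   (0, 0, 0),
    "white":   (255, 255, 255),
    "red":     (255, 0, 0),
    "green":   (0, 255, 0),
    "blue":    (0, 0, 255),
    "yellow":  (255, 255, 0),
    "cyan":    (0, 255, 255),
    "magenta": (255, 0, 255),
}

# The BGR form used by OpenCV is the same triple in reverse order.
COMMON_BGR = {name: rgb[::-1] for name, rgb in COMMON_RGB.items()}

for name in COMMON_RGB:
    print(f"{name:8s} RGB={COMMON_RGB[name]}  BGR={COMMON_BGR[name]}")
```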
Key pitfalls of OpenCV color space conversion
For historical reasons, OpenCV stores color images in BGR channel order instead of RGB. This is an easy source of bugs, especially when you combine OpenCV with libraries such as Matplotlib and Pillow, which expect RGB. Be sure to convert first.
Memory Tip: If you pass img_bgr directly to Matplotlib, red and blue are swapped and the image looks wrong. Remember: converting BGR to RGB only requires reversing the last dimension.
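The slice-based conversion can be sketched on a tiny synthetic array, used here as a stand-in for an image loaded by OpenCV. With OpenCV installed, cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB) performs the same conversion.

```python
import numpy as np

# Stand-in for an OpenCV image: a 2x2 pure-red image in BGR order.
img_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
img_bgr[:, :, 2] = 255  # red lives at channel index 2 in BGR

# Reverse the channel axis: B, G, R -> R, G, B.
img_rgb = img_bgr[:, :, ::-1]

print("BGR pixel:", img_bgr[0, 0])  # red value is in the last slot
print("RGB pixel:", img_rgb[0, 0])  # red value is in the first slot

# The slice is a view with a negative stride; make a contiguous copy
# if a downstream library requires contiguous memory.
img_rgb = np.ascontiguousarray(img_rgb)
```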
3.2 HSV: the most “intuitive” format for humans
HSV (Hue, Saturation, Value) separates hue from saturation and brightness, which is closer to the way we describe colors every day, such as "this red is brighter" and "that blue is darker". Therefore, HSV is usually the first choice for color segmentation and object tracking.
The meaning of the three components (OpenCV exclusive value range!)
Note: For 8-bit images, the range of Hue is 0 ~ 179 instead of 0 ~ 359. A full 360° hue wheel does not fit into the 8-bit integer range (0 ~ 255), so OpenCV stores the hue as degrees divided by 2.
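To make the scaling concrete, here is a sketch that approximates the 8-bit OpenCV encoding using only Python's standard colorsys module. The helper rgb_to_opencv_hsv is hypothetical, and its rounding may differ from OpenCV's by one unit for some colors.

```python
import colorsys

def rgb_to_opencv_hsv(r, g, b):
    """Approximate OpenCV's 8-bit HSV encoding for one 8-bit RGB color."""
    # colorsys works with floats in [0, 1] for all inputs and outputs.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    h_cv = round(h * 180) % 180  # degrees / 2, wrapped into 0 ~ 179
    s_cv = round(s * 255)        # 0 ~ 255
    v_cv = round(v * 255)        # 0 ~ 255
    return h_cv, s_cv, v_cv

for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(f"{name:6s} RGB={rgb} -> HSV={rgb_to_opencv_hsv(*rgb)}")
```

Pure green sits at 120° on the hue wheel, so its stored hue is 60; pure blue at 240° is stored as 120.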
Practical combat: using HSV for red object detection
Red straddles the 0° point of the hue wheel, covering roughly 340° ~ 360° and 0° ~ 20°. On OpenCV's halved scale this becomes two intervals: 0 ~ 10 and 170 ~ 179. Therefore, detecting red requires processing these two segments separately and then merging the masks.
Below is a complete red detection function that you can try directly on your own pictures:
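A sketch of such a function, under stated assumptions: the names red_mask_hsv and detect_red are illustrative, and the saturation and brightness thresholds of 100 are starting values rather than tuned ones. The mask logic is written in plain NumPy so that it can be inspected on small arrays; in OpenCV the same two range tests are normally done with cv2.inRange and merged with cv2.bitwise_or.

```python
import numpy as np

def red_mask_hsv(hsv, s_min=100, v_min=100):
    """Return a uint8 mask (255 = red, 0 = not red) for an OpenCV-style HSV image.

    hsv: array of shape (H, W, 3), dtype uint8, hue in 0 ~ 179, S and V in 0 ~ 255.
    s_min, v_min: illustrative thresholds, not tuned values.
    """
    lower1, upper1 = np.array([0, s_min, v_min]), np.array([10, 255, 255])
    lower2, upper2 = np.array([170, s_min, v_min]), np.array([179, 255, 255])

    # A pixel is in a range when all three channels lie within [lower, upper].
    in_range1 = np.all((hsv >= lower1) & (hsv <= upper1), axis=-1)
    in_range2 = np.all((hsv >= lower2) & (hsv <= upper2), axis=-1)

    # Merge the two hue segments into a single mask.
    return ((in_range1 | in_range2) * 255).astype(np.uint8)


def detect_red(image_path):
    """Load an image with OpenCV and return (BGR image, red mask)."""
    import cv2  # imported here so red_mask_hsv stays usable without OpenCV

    img_bgr = cv2.imread(image_path)
    if img_bgr is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    return img_bgr, red_mask_hsv(hsv)
```

Calling detect_red with the path of one of your own pictures returns the loaded image together with a mask in which red pixels are 255.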
Adjusting the saturation (S) and brightness (V) thresholds in lower and upper controls how much light or dark red is tolerated. These are the main values to tune for your own images.
4. Practical project: quickly analyze the information of an image
Let's combine the points covered so far into one practical function. Given an image path, it prints all the core attributes of the image at once:
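A sketch of what such a function could look like. The names describe_image and analyze_image and the set of reported fields are assumptions made for illustration. The array-level helper is kept separate so that it also works on arrays that did not come from a file.

```python
import numpy as np

def describe_image(img):
    """Collect the core attributes of an image stored as a NumPy array."""
    channels = 1 if img.ndim == 2 else img.shape[2]
    return {
        "height": img.shape[0],
        "width": img.shape[1],
        "channels": channels,
        "dtype": str(img.dtype),
        "bits_per_channel": img.dtype.itemsize * 8,
        "min": img.min().item(),
        "max": img.max().item(),
        "mean": float(img.mean()),
        "size_bytes": img.nbytes,
    }


def analyze_image(image_path):
    """Read an image from disk with OpenCV and print its attributes."""
    import cv2  # imported here so describe_image works without OpenCV

    # IMREAD_UNCHANGED keeps the file's own bit depth and channel count.
    img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    info = describe_image(img)
    for key, value in info.items():
        print(f"{key:17s}: {value}")
    return info
```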
This function can help you quickly understand the data structure of an image, and is also very suitable for debugging subsequent image processing algorithms.
5. Summary
Review of core knowledge points
- Nature of an image: after sampling and quantization, the continuous light signal becomes a discrete pixel matrix (H×W or H×W×3)
- Bit depth: 8 bits (0 ~ 255) is the most common. More bits give finer gradations of brightness and color
- Color space:
  - RGB/BGR: the computer's native format, hardware friendly, but color and brightness are tightly coupled across the three channels
  - HSV: matches human intuition by separating hue from saturation and brightness, which makes it the first choice for color detection and segmentation
- OpenCV pitfall: the default channel order is BGR. When working with other libraries, convert with cv2.cvtColor or by reversing the channel axis with a slice
💡 Final reminder: The pixel matrix is the cornerstone of computer vision! All advanced algorithms, including convolutional neural networks (CNN), are essentially "playing with matrices": addition, subtraction, multiplication, division, convolution, pooling... It is recommended to generate small matrices by hand, modify their pixel values, and build an intuitive "matrix sense" as early as possible!
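As a starting point for that kind of practice, here is a small sketch on a 3 × 3 matrix with arbitrary pixel values. It also shows a common surprise: uint8 arithmetic wraps around instead of saturating.

```python
import numpy as np

img = np.array([[10, 20, 30],
                [40, 250, 60],
                [70, 80, 90]], dtype=np.uint8)

# Modify a single pixel (row 0, column 2).
img[0, 2] = 255

# Brighten by 10 safely: widen the type, add, clip to 0 ~ 255, narrow again.
brighter = np.clip(img.astype(np.int16) + 10, 0, 255).astype(np.uint8)

# Plain uint8 addition wraps modulo 256 instead of saturating at 255.
wrapped = img + np.uint8(10)

# One step of a 3x3 mean filter: the average of the whole neighbourhood,
# which would become the new value of the centre pixel.
center_mean = img.astype(np.float32).mean()

print("brighter:\n", brighter)
print("wrapped:\n", wrapped)
print("3x3 mean at centre:", center_mean)
```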

