OpenCV Quick Start: A Complete Guide to Image Reading, Drawing and Geometric Transformation

Introduction

OpenCV (Open Source Computer Vision Library) is the de facto standard open-source library for computer vision. It covers the whole range from low-level pixel operations to high-level vision algorithms, can be called from multiple platforms and languages (C++, Python, Java, etc.), and is fast because its core is optimized C/C++.

This article focuses on the first cornerstone of traditional CV: basic I/O, drawing, and geometric transformations in OpenCV. All code comes with comments and usage scenarios, so after reading you can start writing practical scripts right away.

📂 Phase: Phase 1, Cornerstone of Image Processing (Traditional CV) 🔗 Related Chapters: CV Overview and Digital Image Basics · Image Enhancement and Filtering


1. Environment setup: get started with OpenCV in 5 minutes

1.1 Library selection and installation

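OpenCV has three mainstream Python packages; install only the one that fits your needs:

# ✅ First choice for everyday development (main modules)
pip install opencv-python

# ✅ Advanced / research use (main modules plus the extra contrib modules)
pip install opencv-contrib-python

# ✅ For servers (no GUI dependencies, smaller footprint)
pip install opencv-python-headless

💡 Package selection tip: if you are not sure whether you will need the extra contrib modules later, you can install opencv-contrib-python directly; it is a superset of opencv-python. Install only one of these packages per environment, since they all provide the same cv2 module. Note that SIFT has been part of the main package since OpenCV 4.4, while patented algorithms such as SURF are not enabled in the prebuilt PyPI wheels.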
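Because these wheels all install the same cv2 module, having more than one of them in a single environment tends to produce confusing import errors. The following is a minimal sketch for spotting that situation using only the standard library; the helper name find_opencv_packages is hypothetical and not part of OpenCV.

```python
from importlib import metadata

# The OpenCV wheels published on PyPI; they all provide the same `cv2` module.
OPENCV_WHEELS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}

def find_opencv_packages(installed_names=None):
    """Return the sorted list of OpenCV wheels among the installed distributions."""
    if installed_names is None:
        # Ask the current environment for the names of all installed distributions.
        installed_names = [dist.metadata["Name"] for dist in metadata.distributions()]
    normalized = {str(name).lower().replace("_", "-") for name in installed_names if name}
    return sorted(normalized & OPENCV_WHEELS)

found = find_opencv_packages()
if len(found) > 1:
    print(f"Multiple OpenCV packages installed, keep only one: {found}")
else:
    print(f"OpenCV packages found: {found}")
```

If more than one name is reported, uninstalling all of them with pip and reinstalling a single package is usually the simplest fix.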

1.2 Verify the installation in one step

Copy this code and run it. If it finishes without errors, the environment is ready:

import cv2
import numpy as np
import matplotlib.pyplot as plt

def check_opencv_env():
    """Check that the core OpenCV functionality works."""
    # 1. Print the version
    print(f"✅ OpenCV version: {cv2.__version__}")

    # 2. Check NumPy (OpenCV images are NumPy arrays under the hood)
    print(f"✅ NumPy version: {np.__version__}")

    # 3. Create a blank canvas and draw on it (verifies drawing and memory allocation)
    test_canvas = np.zeros((200, 200, 3), dtype=np.uint8)
    cv2.putText(test_canvas, "OpenCV Ready!", (20, 110),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    print("✅ Canvas created")

    # 4. Configure Matplotlib fonts (needed to display Chinese text in Jupyter)
    plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'Arial Unicode MS']
    plt.rcParams['axes.unicode_minus'] = False
    print("✅ Matplotlib font configuration done")

check_opencv_env()

⚠️ FAQ: If you get ImportError: libGL.so.1: cannot open shared object file, your environment lacks the OpenGL libraries. On Debian/Ubuntu, run sudo apt install libgl1-mesa-glx in the terminal (on newer releases the package is named libgl1), or install opencv-python-headless instead, which has no GUI dependencies.


2. Image I/O: the full read → display → save workflow

💡 Core prerequisites

Before opening any image, remember three "counter-intuitive" OpenCV conventions:

  1. Color images are loaded in BGR order by default (the reverse of the RGB order used by Matplotlib and PIL)
  2. Image data is a NumPy array, typically uint8 with pixel values in the range 0-255
  3. The image shape is ordered (height, width, channels), not the more intuitive width × height

Once you understand these three points, you will avoid most pitfalls in color conversion, slicing, and drawing later on.
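The three conventions can be checked with NumPy alone, without loading a file. This is a small illustrative sketch; the image size and pixel values are arbitrary choices made for the demonstration.

```python
import numpy as np

# A 120-pixel-high, 200-pixel-wide color image: shape is (height, width, channels).
img = np.zeros((120, 200, 3), dtype=np.uint8)
height, width, channels = img.shape

# OpenCV interprets the last axis as B, G, R, so this paints the image pure red.
img[:, :] = (0, 0, 255)

# Reversing the channel axis gives the RGB order that Matplotlib and PIL expect.
rgb = img[:, :, ::-1]
print(height, width, channels)                   # 120 200 3
print(img[0, 0].tolist(), rgb[0, 0].tolist())    # [0, 0, 255] [255, 0, 0]

# uint8 arithmetic wraps around instead of clipping: 250 + 10 becomes 4.
bright = np.array([250], dtype=np.uint8)
wrapped = bright + np.uint8(10)
clipped = np.clip(bright.astype(np.int16) + 10, 0, 255).astype(np.uint8)
print(wrapped.tolist(), clipped.tolist())        # [4] [255]
```

The last lines show why brightness adjustments on uint8 images are usually done in a wider type and clipped back to 0-255 afterwards.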


2.1 Image reading: Handling loading failure scenarios

Many beginners get stuck here: the file path is wrong or the image does not exist, cv2.imread returns None without raising an error, and the failure only surfaces later when the code tries to process the result. A robust loading function is essential:

import cv2

# Three commonly used read modes
MODES = {
    "COLOR": cv2.IMREAD_COLOR,     # Color (default; drops the alpha channel)
    "GRAY": cv2.IMREAD_GRAYSCALE,  # Grayscale (single channel)
    "UNCHANGED": cv2.IMREAD_UNCHANGED # As stored (keeps the alpha channel, e.g. for PNG)
}

def load_image(image_path: str, mode: str = "COLOR"):
    """Load an image safely; raise an error that names the path if reading fails."""
    img = cv2.imread(image_path, MODES.get(mode.upper(), MODES["COLOR"]))
    if img is None:
        raise FileNotFoundError(f"❌ Image not found or format not supported: {image_path}")
    print(f"✅ Image loaded | shape: {img.shape} | dtype: {img.dtype}")
    return img

# Usage example
# img = load_image("test.jpg", "GRAY")

🔍 Debugging tip: If the image path contains Chinese or other non-ASCII characters, cv2.imread may fail to read it (typically on Windows). In that case, use np.fromfile + cv2.imdecode to bypass the path encoding problem:

import numpy as np
def imread_chinese(path):
    # Read the raw bytes with NumPy, then let OpenCV decode them in memory
    arr = np.fromfile(path, dtype=np.uint8)
    return cv2.imdecode(arr, cv2.IMREAD_COLOR)

2.2 Image display: two options for two scenarios

Scenario 1: Local scripting/debugging (using OpenCV native window)

The native window supports keyboard and mouse events, suitable for quick viewing and interaction:

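def show_opencv_window(img, window_name: str = "OpenCV Window", wait_ms: int = 0):
    """
    Show an image in a native OpenCV window.
    wait_ms: time to wait in milliseconds; 0 = wait indefinitely until a key is pressed
    """
    cv2.imshow(window_name, img)
    key = cv2.waitKey(wait_ms) & 0xFF  # keep only the low byte (255 if the wait timed out)
    cv2.destroyAllWindows()   # destroy the window on exit to release its resources
    return key  # code of the pressed key, usable for interaction

# Usage example: press ESC to quit, press S to save
# key = show_opencv_window(img)
# if key == 27:          # ESC key
#     print("Quit")
# elif key == ord('s'):  # S key
#     cv2.imwrite("saved.jpg", img)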

🖥️ Avoiding a frozen window: cv2.waitKey() must be called after imshow(); without it the window is never redrawn and appears unresponsive. When displaying frames in a loop, call cv2.destroyAllWindows() once the loop ends to clean up resources.

Scenario 2: Jupyter Notebook/Lab (using Matplotlib)

Native pop-up windows work poorly in notebooks, so Matplotlib is recommended instead. You must convert BGR to RGB first, otherwise the colors will be wrong:

import matplotlib.pyplot as plt

def show_plt(img, title: str = "Image", figsize: tuple = (8, 6), cmap: str = None):
    """
    Display an OpenCV image with Matplotlib (handles the BGR→RGB conversion).
    cmap: defaults to 'gray' for grayscale images; leave empty for color images
    """
    plt.figure(figsize=figsize)
    # Fix the channel order
    if len(img.shape) == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    else:
        cmap = 'gray' if cmap is None else cmap
    # Display
    plt.imshow(img, cmap=cmap)
    plt.title(title, fontsize=14)
    plt.axis('off')       # hide the axes
    plt.tight_layout()    # adjust margins automatically
    plt.show()

🎨 Color order mnemonic: OpenCV puts the blue channel first (BGR), while Matplotlib expects the red channel first (RGB). In short: "BGR is for OpenCV, RGB is for Matplotlib".


2.3 Image Saving: Control Compression Quality

The function below picks the encoder parameters from the file extension, so you can balance image quality against file size:

def save_image(img, output_path: str, jpg_quality: int = 95, png_compression: int = 3):
    """
    Save an image, choosing encoder parameters from the file extension.
    jpg_quality: JPEG quality, 0-100; higher means better quality and a larger file
    png_compression: PNG compression level, 0-9; higher means a smaller file but slower saving
    """
    ext = output_path.split('.')[-1].lower()
    if ext in ['jpg', 'jpeg']:
        params = [cv2.IMWRITE_JPEG_QUALITY, jpg_quality]
    elif ext == 'png':
        params = [cv2.IMWRITE_PNG_COMPRESSION, png_compression]
    else:
        params = []

    success = cv2.imwrite(output_path, img, params)
    if success:
        print(f"✅ Image saved: {output_path}")
    else:
        raise IOError(f"❌ Save failed; check the path, permissions, or format: {output_path}")
    return success

💡 Compression parameter recommendations: a JPEG quality of 85-95 gives a good balance between visual quality and file size; a PNG compression level of 3-6 is usually a good choice (PNG is lossless, so the level only affects file size and saving time).


3. Basic drawing: annotate images and draw ROI (region of interest)

OpenCV's drawing functions share a consistent parameter layout, so one template covers most cases:

cv2.draw_function(img, position_args, color (BGR), thickness, lineType, shift)

🔍 Key details:

  • thickness = -1 fills the interior of the shape
  • The color order is (blue, green, red); for example, pure red is (0, 0, 255)
  • All coordinates are (x, y), that is, (horizontal, vertical), which is the opposite of NumPy's [row, column] indexing

3.1 Drawing common shapes

The code below draws common shapes on a white canvas, with detailed comments:

import cv2
import numpy as np

# Create a white canvas (height 400, width 600, 3 channels)
canvas = 255 * np.ones((400, 600, 3), dtype=np.uint8)

# ---------------------- 1. Lines ----------------------
cv2.line(canvas, (50, 50), (550, 50), (0, 200, 0), 2)            # solid green line
cv2.line(canvas, (50, 100), (550, 300), (200, 0, 0), 3, cv2.LINE_AA) # thick anti-aliased blue line

# ---------------------- 2. Rectangles ----------------------
cv2.rectangle(canvas, (80, 120), (220, 220), (0, 0, 200), 2)    # hollow red box, often used to mark a target
cv2.rectangle(canvas, (280, 120), (420, 220), (0, 200, 200), -1) # filled yellow rectangle (BGR: green + red)

# ---------------------- 3. Circles ----------------------
cv2.circle(canvas, (500, 170), 50, (200, 0, 200), 3)            # hollow purple circle
cv2.circle(canvas, (150, 320), 30, (200, 200, 0), -1)           # filled cyan circle (BGR: blue + green)

# ---------------------- 4. Ellipses ----------------------
# Arguments: center, (half major axis, half minor axis), rotation angle, start angle, end angle, ...
cv2.ellipse(canvas, (350, 320), (60, 30), 30, 0, 360, (100, 100, 100), 2)   # tilted hollow gray ellipse
cv2.ellipse(canvas, (350, 320), (60, 30), 30, 0, 180, (100, 0, 100), -1)    # tilted filled half-ellipse

# ---------------------- 5. Polygons ----------------------
# Note: polygon vertices must be an int32 array, conventionally reshaped to (n, 1, 2)
pts = np.array([[450, 260], [500, 230], [550, 260], [530, 310], [470, 310]], np.int32)
pts = pts.reshape((-1, 1, 2))
cv2.polylines(canvas, [pts], True, (255, 100, 0), 2)  # closed pentagon; True means closed
cv2.fillPoly(canvas, [pts], (255, 100, 0))             # opaque fill; drawing functions do not blend, so translucency needs a separate blending step

# ---------------------- 6. Text ----------------------
cv2.putText(canvas, "OpenCV Drawing Demo", (50, 370),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2, cv2.LINE_AA)

# View the result with the Matplotlib helper defined earlier
show_plt(canvas, "Drawing demo")

Practical tips:

  • When labeling a target, draw the rectangle first and then add a text label above it. Use cv2.getTextSize() to measure the text so the label position can be computed dynamically and kept inside the image.
  • The drawing functions do not blend colors, so a translucent fill needs an extra step: draw the filled shape on a copy of the image, then blend the copy with the original (for example with cv2.addWeighted). The fillPoly call above only demonstrates the fill itself.

4. Geometric transformation: image scaling, rotation, translation, affine

The core idea of geometric transformation is to map image coordinates through a transformation matrix. OpenCV provides two main functions:

  • cv2.warpAffine(src, M, dsize): applies a 2×3 affine transformation matrix (scaling, rotation, translation, shearing)
  • cv2.warpPerspective(src, M, dsize): applies a 3×3 perspective transformation matrix (left to a later chapter)

This section focuses on the three most commonly used operations: scaling, rotation, and affine transformation.


4.1 Image scaling: practical functions that maintain aspect ratio

Calling cv2.resize directly is simple, but it easily distorts the image when the target size has a different aspect ratio. The wrapper below preserves the aspect ratio and lets you choose the interpolation method (INTER_AREA by default):

def resize(img, target_w=None, target_h=None, interp=cv2.INTER_AREA):
    """
    Aspect-ratio-preserving resize: fit a target width, a target height, or a bounding box.
    Suggested interpolation methods (interp):
        - Shrinking: cv2.INTER_AREA (area-based resampling, avoids aliasing)
        - Enlarging: cv2.INTER_CUBIC (bicubic, sharper result)
        - Speed first: cv2.INTER_LINEAR (bilinear)
    """
    h, w = img.shape[:2]

    # Compute the output size
    if target_w and target_h:
        # Fit inside the given maximum width and height, keeping the ratio
        scale = min(target_w / w, target_h / h)
        new_w, new_h = int(w * scale), int(h * scale)
    elif target_w:
        new_w = target_w
        new_h = int(h * (target_w / w))
    elif target_h:
        new_h = target_h
        new_w = int(w * (target_h / h))
    else:
        raise ValueError("❌ Specify target_w, target_h, or both")
    
    return cv2.resize(img, (new_w, new_h), interpolation=interp)

📐 Interpolation cheat sheet:

Scenario          | Recommended interpolation | Description
Shrinking         | INTER_AREA                | Resamples using the pixel area relation; effectively avoids moiré patterns
Enlarging         | INTER_CUBIC               | Bicubic interpolation; smoother edges
Live video stream | INTER_LINEAR              | Bilinear interpolation; fast

4.2 Image rotation: advanced version without cropping edges

If you apply the matrix from cv2.getRotationMatrix2D directly, the canvas keeps its original size and the four corners of the rotated image are cropped. The version below computes the new canvas size automatically so that every pixel is kept after rotation:

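def rotate(img, angle, center=None, scale=1.0):
    """
    Rotate an image by any angle without cropping its edges.
    angle: in degrees; positive = counterclockwise, negative = clockwise
    """
    h, w = img.shape[:2]

    # Rotate around the image center by default
    if center is None:
        center = (w // 2, h // 2)

    # Build the rotation matrix
    M = cv2.getRotationMatrix2D(center, angle, scale)

    # ----- Compute the new canvas size (the key to not cropping the edges) -----
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    new_w = int((h * sin) + (w * cos))
    new_h = int((h * cos) + (w * sin))

    # Adjust the translation part of the matrix to move the image to the center of the new canvas
    M[0, 2] += (new_w // 2) - center[0]
    M[1, 2] += (new_h // 2) - center[1]
    # ------------------------------------------------

    # Apply the transform with a background fill color (white here)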
    return cv2.warpAffine(img, M, (new_w, new_h), borderValue=(255, 255, 255))

🔄 Rotation angle: OpenCV treats counterclockwise as the positive direction. To rotate 30 degrees clockwise, pass angle=-30.
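The canvas-size formula used in rotate (new_w = h·|sin| + w·|cos|, new_h = h·|cos| + w·|sin|) can be checked independently by rotating the four image corners and measuring how far they spread. The sketch below uses NumPy only; both function names are hypothetical.

```python
import numpy as np

def rotated_canvas_size(w, h, angle_deg):
    """Canvas size from the formula: new_w = h*|sin| + w*|cos|, new_h = h*|cos| + w*|sin|."""
    theta = np.deg2rad(angle_deg)
    cos, sin = abs(np.cos(theta)), abs(np.sin(theta))
    return h * sin + w * cos, h * cos + w * sin

def rotated_corner_extent(w, h, angle_deg):
    """Same quantity obtained directly: rotate the four corners and measure their spread."""
    theta = np.deg2rad(angle_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
    rotated = corners @ rotation.T
    return tuple(rotated.max(axis=0) - rotated.min(axis=0))

for angle in (0, 30, 90):
    print(angle, rotated_canvas_size(400, 200, angle), rotated_corner_extent(400, 200, angle))
```

For a 400×200 image rotated by 90 degrees, both approaches give a 200×400 canvas, which matches the intuition that width and height swap.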


4.3 Affine transformation: three point pairs determine the mapping

Affine transformations maintain the parallelism of straight lines (for example, a square becomes a parallelogram). We only need to specify 3 pairs of corresponding points on the original image and the target image, and OpenCV will automatically calculate the transformation matrix:

def affine_transform(img, src_pts, dst_pts, borderValue=(255, 255, 255)):
    """
    Apply an affine transformation.
    src_pts / dst_pts: NumPy arrays of shape (3, 2), points given as (x, y)
    """
    h, w = img.shape[:2]
    # Compute the 2×3 affine matrix from the 3 point pairs
    M = cv2.getAffineTransform(src_pts.astype(np.float32), dst_pts.astype(np.float32))
    # Apply the transform
    return cv2.warpAffine(img, M, (w, h), borderValue=borderValue)

# Usage example: map three corners (top-left, top-right, bottom-left) to new positions
# h, w = img.shape[:2]
# src_pts = np.array([[0, 0], [w-1, 0], [0, h-1]], np.float32)
# dst_pts = np.array([[50, 50], [w-100, 0], [0, h-50]], np.float32)
# transformed = affine_transform(img, src_pts, dst_pts)
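Why exactly three point pairs? A 2×3 matrix has six unknowns, and each pair contributes two equations, so three non-collinear pairs determine it uniquely. The sketch below solves that linear system with NumPy to illustrate the math; it is not a claim about how OpenCV computes the matrix internally, and the image size is a made-up example.

```python
import numpy as np

def solve_affine(src_pts, dst_pts):
    """Solve for the 2x3 matrix M with M @ [x, y, 1] = [x', y'] for three point pairs."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])   # one row [x, y, 1] per source point
    # A @ M.T = dst is a 3x3 linear system; it has a unique solution
    # as long as the three source points are not collinear.
    return np.linalg.solve(A, dst).T

w, h = 200, 100   # hypothetical image size
src_pts = [[0, 0], [w - 1, 0], [0, h - 1]]
dst_pts = [[50, 50], [w - 100, 0], [0, h - 50]]
M = solve_affine(src_pts, dst_pts)

# Check: applying M to each source point reproduces the destination point.
mapped = (M @ np.hstack([np.array(src_pts, dtype=float), np.ones((3, 1))]).T).T
print(np.round(M, 3))
print(np.allclose(mapped, dst_pts))   # True
```

The last column of M is the image of the origin, which is why it equals the first destination point (50, 50) in this example.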

🎯 Practical scenarios:

  • Image correction: for example, a document photographed at an angle can be straightened by specifying four corner points of the document (a perspective transformation) or three corner points (an affine transformation).
  • Data augmentation: when training a deep learning model, randomly generated point triples drive affine transformations that increase sample diversity.

5. Practical tool: an image batch resizer in about 10 lines of code

Combining the load_image, resize, and save_image functions defined above gives a lightweight batch-processing script:

import os
from tqdm import tqdm  # progress bar library; install with: pip install tqdm

def batch_resize(input_dir: str, output_dir: str, target_w=None, target_h=None, interp=cv2.INTER_AREA):
    """Resize every image in a folder, preserving the aspect ratio."""
    # Create the output folder
    os.makedirs(output_dir, exist_ok=True)

    # Collect images in common formats
    img_exts = ['.jpg', '.jpeg', '.png', '.bmp', '.tiff']
    img_files = [f for f in os.listdir(input_dir) if os.path.splitext(f)[-1].lower() in img_exts]

    # Process the images one by one
    for img_file in tqdm(img_files, desc="Resizing"):
        try:
            img = load_image(os.path.join(input_dir, img_file))
            resized = resize(img, target_w=target_w, target_h=target_h, interp=interp)
            save_image(resized, os.path.join(output_dir, img_file))
        except Exception as e:
            print(f"⚠️ Failed to process {img_file}: {e}")

# Resize every image in the input folder to a width of 800 pixels and save it to the output_800w folder
# batch_resize("input", "output_800w", target_w=800)

🧩 Extension ideas: this framework extends easily to batch watermarking, batch filtering, or batch format conversion; just replace the image-processing step in the middle.


Summary and next steps

This article took you from scratch through the most basic and commonly used OpenCV operations: environment setup, image reading and writing, drawing annotations, and geometric transformations. These APIs are the foundation of all later vision tasks (filtering, feature extraction, object detection, etc.). Be sure to deepen your understanding by practicing with real code.

As a next step, continue with Image Enhancement and Filtering to learn how to denoise, sharpen, and perform more complex pixel-level processing on images.

💬 Having a problem? Check the official OpenCV documentation, or run help(cv2.function_name) in a Python session to see a function's parameters.