Semantic Segmentation: Pixel-Level Image Understanding and a Deep Dive into U-Net

Introduction

Semantic segmentation is a fundamental task in computer vision that assigns a semantic label to every pixel in an image. Unlike object detection, which localizes objects with bounding boxes, semantic segmentation delineates the exact spatial extent and contours of each object. This article covers the core concepts of semantic segmentation, its classic architectures, and practical applications.

📂 Course stage: Stage 2 — Deep Learning Vision Fundamentals (CNNs)
🔗 Related chapters: The YOLO Family in Practice · Keypoint Detection (Keypoints)


1. Semantic Segmentation Fundamentals

1.1 Task Definition

Semantic segmentation classifies every pixel of an image into one of a set of predefined categories.

"""
语义分割任务定义:

输入:图像 I ∈ R^(H×W×C)
输出:分割图 S ∈ R^(H×W×N),其中N为类别数
目标:为每个像素 (i,j) 分配类别标签 c ∈ {1,2,...,N}

与相关任务的区别:
- 图像分类:整个图像一个标签
- 目标检测:对象级别定位
- 实例分割:区分同一类别的不同实例
- 语义分割:像素级别分类(同类不区分实例)
"""

def semantic_segmentation_tasks():
    """
    Comparison of segmentation-related vision tasks
    """
    tasks = {
        "Image Classification": "image → class label",
        "Object Detection": "image → [class, bounding box]",
        "Semantic Segmentation": "image → [pixel × class]",
        "Instance Segmentation": "image → [pixel × (class, instance ID)]",
        "Panoptic Segmentation": "image → [pixel × (class, instance ID)] (distinguishes thing/stuff)"
    }
    
    print("Comparison of computer vision segmentation tasks:")
    for task, desc in tasks.items():
        print(f"• {task}: {desc}")

semantic_segmentation_tasks()
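The score-map-to-label-map step from the task definition can be illustrated with a tiny NumPy example (the shapes and values are illustrative only):

```python
import numpy as np

# Toy score map: H=2, W=2, N=3 classes (values stand in for network logits)
scores = np.array([
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
])  # shape (H, W, N)

# Per-pixel argmax over the class axis turns scores into a label map
label_map = scores.argmax(axis=-1)
print(label_map)  # shape (H, W); each entry is a class index
```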

1.2 Application Areas

def segmentation_applications():
    """
    Application domains of semantic segmentation
    """
    applications = {
        "Medical Imaging": "organ segmentation, tumor detection, pathology analysis",
        "Autonomous Driving": "road segmentation, lane detection, obstacle recognition",
        "Remote Sensing": "land-use classification, urban planning, environmental monitoring",
        "Agriculture": "crop monitoring, pest and disease detection, yield prediction",
        "Robotics": "scene understanding, navigation, grasp localization",
        "Fashion": "garment segmentation, virtual try-on, material recognition",
        "Video Analysis": "video understanding, action segmentation, scene analysis"
    }
    
    print("Main application domains of semantic segmentation:")
    for domain, usage in applications.items():
        print(f"• {domain}: {usage}")

segmentation_applications()

2. Classic Segmentation Architectures

2.1 FCN (Fully Convolutional Networks)

FCN was the seminal work in semantic segmentation: the first architecture to produce end-to-end, pixel-level predictions.

"""
FCN (Fully Convolutional Networks) - 2015

贡献:
1. 全卷积网络:移除全连接层
2. 反卷积层:实现上采样
3. 跳跃连接:融合多尺度特征

架构特点:
- Encoder: VGG特征提取
- Decoder: 反卷积上采样
- Skip Connections: 融合低层细节信息
"""

import torch
import torch.nn as nn
import torch.nn.functional as F

class FCN(nn.Module):
    """
    FCN - fully convolutional network (FCN-8s variant)
    """
    def __init__(self, num_classes=21, backbone='vgg16'):
        super(FCN, self).__init__()
        
        # VGG16 backbone
        if backbone == 'vgg16':
            vgg16 = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)
            
            # Encoder - feature extraction
            self.features = vgg16.features
            
            # Score head on top of pool5 (padding=3 preserves the spatial size)
            self.fcn32s = nn.Sequential(
                nn.Conv2d(512, 4096, kernel_size=7, padding=3),
                nn.ReLU(inplace=True),
                nn.Dropout2d(),
                nn.Conv2d(4096, 4096, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Dropout2d(),
                nn.Conv2d(4096, num_classes, kernel_size=1)
            )
            
            # FCN-16s: fuse pool4 features (padding=1 makes the 2x upsampling exact)
            self.score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)
            self.upscore2 = nn.ConvTranspose2d(
                num_classes, num_classes, kernel_size=4, stride=2, padding=1, bias=False
            )
            
            # FCN-8s: fuse pool3 features
            self.score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)
            self.upscore_pool4 = nn.ConvTranspose2d(
                num_classes, num_classes, kernel_size=4, stride=2, padding=1, bias=False
            )
    
    def forward(self, x):
        # Extract intermediate features (VGG16's pooling layers sit at indices 16, 23, 30)
        pool3 = self.features[:17](x)        # pool3 output, 1/8 resolution
        pool4 = self.features[17:24](pool3)  # pool4 output, 1/16 resolution
        pool5 = self.features[24:](pool4)    # pool5 output, 1/32 resolution
        
        # Score map at 1/32 resolution
        score = self.fcn32s(pool5)
        
        # FCN-16s: upsample 2x and fuse with pool4
        upscore2 = self.upscore2(score)
        fuse_pool4 = upscore2 + self.score_pool4(pool4)
        
        # FCN-8s: upsample 2x again and fuse with pool3
        upscore_pool4 = self.upscore_pool4(fuse_pool4)
        fuse_pool3 = upscore_pool4 + self.score_pool3(pool3)
        
        # Final 8x upsampling back to the input resolution
        return F.interpolate(fuse_pool3, size=x.size()[2:], mode='bilinear', align_corners=False)

def fcn_architecture_explanation():
    """
    FCN architecture breakdown
    """
    print("FCN architecture breakdown:")
    print("1. Fully convolutional design: FC layers replaced by convolutions, so any input size works")
    print("2. Transposed-convolution upsampling: spatial resolution is restored step by step")
    print("3. Skip connections: low-level detail is fused with high-level semantics")
    print("4. Multi-scale variants: FCN-32s, FCN-16s, and FCN-8s")

fcn_architecture_explanation()

2.2 The U-Net Architecture in Detail

U-Net is the classic architecture for medical image segmentation and remains a strong, widely used baseline.

class UNet(nn.Module):
    """
    U-Net - originally designed for biomedical image segmentation
    """
    def __init__(self, in_channels=3, out_channels=1, init_features=32):
        super(UNet, self).__init__()
        
        features = init_features
        # Encoder (contracting path)
        self.encoder1 = UNet._block(in_channels, features, name="enc1")
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.encoder2 = UNet._block(features, features * 2, name="enc2")
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.encoder3 = UNet._block(features * 2, features * 4, name="enc3")
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.encoder4 = UNet._block(features * 4, features * 8, name="enc4")
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Bottleneck
        self.bottleneck = UNet._block(features * 8, features * 16, name="bottleneck")
        
        # Decoder (expanding path)
        self.upconv4 = nn.ConvTranspose2d(
            features * 16, features * 8, kernel_size=2, stride=2
        )
        self.decoder4 = UNet._block((features * 8) * 2, features * 8, name="dec4")
        self.upconv3 = nn.ConvTranspose2d(
            features * 8, features * 4, kernel_size=2, stride=2
        )
        self.decoder3 = UNet._block((features * 4) * 2, features * 4, name="dec3")
        self.upconv2 = nn.ConvTranspose2d(
            features * 4, features * 2, kernel_size=2, stride=2
        )
        self.decoder2 = UNet._block((features * 2) * 2, features * 2, name="dec2")
        self.upconv1 = nn.ConvTranspose2d(
            features * 2, features, kernel_size=2, stride=2
        )
        self.decoder1 = UNet._block(features * 2, features, name="dec1")
        
        # Output layer: 1x1 conv maps features to per-pixel class scores
        self.outconv = nn.Conv2d(in_channels=features, out_channels=out_channels, kernel_size=1)
    
    def forward(self, x):
        # Encoder path
        enc1 = self.encoder1(x)
        enc2 = self.encoder2(self.pool1(enc1))
        enc3 = self.encoder3(self.pool2(enc2))
        enc4 = self.encoder4(self.pool3(enc3))
        
        # Bottleneck
        bottleneck = self.bottleneck(self.pool4(enc4))
        
        # Decoder path: upsample, concatenate the skip features, convolve
        dec4 = self.upconv4(bottleneck)
        dec4 = torch.cat((dec4, enc4), dim=1)
        dec4 = self.decoder4(dec4)
        
        dec3 = self.upconv3(dec4)
        dec3 = torch.cat((dec3, enc3), dim=1)
        dec3 = self.decoder3(dec3)
        
        dec2 = self.upconv2(dec3)
        dec2 = torch.cat((dec2, enc2), dim=1)
        dec2 = self.decoder2(dec2)
        
        dec1 = self.upconv1(dec2)
        dec1 = torch.cat((dec1, enc1), dim=1)
        dec1 = self.decoder1(dec1)
        
        return self.outconv(dec1)
    
    @staticmethod
    def _block(in_channels, features, name):
        """
        U-Net basic block: two (3x3 conv + BatchNorm + ReLU) units
        """
        return nn.Sequential(
            nn.Conv2d(
                in_channels=in_channels,
                out_channels=features,
                kernel_size=3,
                padding=1,
                bias=False,
            ),
            nn.BatchNorm2d(num_features=features),
            nn.ReLU(inplace=True),
            nn.Conv2d(
                in_channels=features,
                out_channels=features,
                kernel_size=3,
                padding=1,
                bias=False,
            ),
            nn.BatchNorm2d(num_features=features),
            nn.ReLU(inplace=True),
        )
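As a sanity check on the class above, the channel and resolution bookkeeping can be traced with plain arithmetic (a sketch; `unet_shape_trace` is a hypothetical helper, assuming `init_features=32` and a 256×256 input):

```python
def unet_shape_trace(h, w, in_ch=3, features=32, depth=4):
    """Trace (name, channels, height, width) through a U-Net of the given depth."""
    trace = [("input", in_ch, h, w)]
    ch = features
    # Encoder: each stage doubles the channels, each pool halves the resolution
    for i in range(depth):
        trace.append((f"enc{i+1}", ch, h, w))
        h, w, ch = h // 2, w // 2, ch * 2
    trace.append(("bottleneck", ch, h, w))
    # Decoder: each up-conv halves the channels and doubles the resolution
    for i in range(depth, 0, -1):
        h, w, ch = h * 2, w * 2, ch // 2
        trace.append((f"dec{i}", ch, h, w))
    return trace

for name, c, h, w in unet_shape_trace(256, 256):
    print(f"{name:>10}: {c:>4} x {h:>3} x {w:>3}")
```

Note that each decoder stage's output shape matches its skip-connected encoder stage, which is exactly what makes the `torch.cat` calls in `forward` line up.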

def unet_architecture_analysis():
    """
    U-Net architecture analysis
    """
    print("Key properties of U-Net:")
    print("1. U-shaped structure: symmetric encoder-decoder design")
    print("2. Skip connections: link matching encoder and decoder stages")
    print("3. Feature fusion: concatenation preserves fine detail")
    print("4. Stepwise upsampling: resolution is restored gradually")
    print("5. Batch normalization: stabilizes training")

unet_architecture_analysis()

2.3 The DeepLab Family

The DeepLab family tackles the multi-scale problem in segmentation with atrous (dilated) convolutions.

class ASPP(nn.Module):
    """
    ASPP (Atrous Spatial Pyramid Pooling) - the core DeepLab component
    """
    def __init__(self, in_channels, out_channels, atrous_rates=[6, 12, 18]):
        super(ASPP, self).__init__()
        
        modules = []
        modules.append(nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        ))
        
        # 3x3 convolutions with different dilation rates
        for rate in atrous_rates:
            modules.append(
                nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, 3, padding=rate, 
                             dilation=rate, bias=False),
                    nn.BatchNorm2d(out_channels),
                    nn.ReLU()
                )
            )
        
        # Image-level branch: global average pooling
        modules.append(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(in_channels, out_channels, 1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU()
            )
        )
        
        self.convs = nn.ModuleList(modules)
        
        # Final projection back to out_channels
        self.project = nn.Sequential(
            nn.Conv2d(len(self.convs) * out_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Dropout(0.5)
        )
    
    def forward(self, x):
        res = []
        for conv in self.convs:
            res.append(conv(x))
        
        # Upsample the 1x1 global-pooling branch back to the feature-map size
        # (the other branches preserve the spatial size of x)
        size = x.shape[2:]
        res[-1] = F.interpolate(res[-1], size=size, mode='bilinear', align_corners=False)
        
        # Concatenate all branches along the channel axis
        res = torch.cat(res, dim=1)
        return self.project(res)
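The point of the dilation rates above is that a dilated 3x3 kernel covers a span of `dilation*(k-1)+1` pixels while still using only 9 weights. A minimal sketch tabulating the spans for the ASPP rates used here:

```python
def dilated_kernel_span(kernel_size, dilation):
    """Spatial extent (in pixels) covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1

for rate in [1, 6, 12, 18]:
    span = dilated_kernel_span(3, rate)
    print(f"3x3 conv, dilation={rate:>2}: covers a {span}x{span} window")
```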

class DeepLabV3(nn.Module):
    """
    DeepLabV3 with a decoder that fuses low-level features
    (strictly speaking, this is the DeepLabV3+ layout)
    """
    def __init__(self, num_classes=21, backbone='resnet50'):
        super(DeepLabV3, self).__init__()
        
        # Pretrained ResNet backbone
        # (for simplicity, the default strides are kept rather than converting
        #  layer3/layer4 to dilated convolutions as in the original paper)
        if backbone == 'resnet50':
            resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
            
            # Stem: initial conv + BN + ReLU + max pool
            self.stem = nn.Sequential(
                resnet.conv1,
                resnet.bn1,
                resnet.relu,
                resnet.maxpool
            )
            self.layer1 = resnet.layer1
            self.layer2 = resnet.layer2
            self.layer3 = resnet.layer3
            self.layer4 = resnet.layer4
        
        # ASPP module on top of the deepest features
        self.aspp = ASPP(2048, 256)
        
        # Projection for the low-level features
        self.low_level_conv = nn.Sequential(
            nn.Conv2d(256, 48, 1, bias=False),
            nn.BatchNorm2d(48),
            nn.ReLU()
        )
        
        # Final classifier
        self.classifier = nn.Sequential(
            nn.Conv2d(304, 256, 3, padding=1, bias=False),  # 256 + 48 = 304
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, num_classes, 1)
        )
    
    def forward(self, x):
        input_size = x.shape[2:]
        
        # Feature extraction
        x = self.stem(x)
        low_level_feat = self.layer1(x)   # low-level features (256 channels)
        x = self.layer2(low_level_feat)
        x = self.layer3(x)
        x = self.layer4(x)                # high-level features (2048 channels)
        
        # ASPP
        x = self.aspp(x)
        
        # Upsample to the low-level feature resolution
        x = F.interpolate(x, size=low_level_feat.shape[2:], mode='bilinear', align_corners=False)
        
        # Fuse with the projected low-level features
        low_level_feat = self.low_level_conv(low_level_feat)
        x = torch.cat([x, low_level_feat], dim=1)
        
        # Final prediction, upsampled to the input resolution
        x = self.classifier(x)
        x = F.interpolate(x, size=input_size, mode='bilinear', align_corners=False)
        
        return x

def deeplab_features():
    """
    Key features of the DeepLab family
    """
    features = {
        "ASPP": "multi-scale feature extraction across several receptive fields",
        "Atrous Convolution": "dilated convolution: enlarges the receptive field without reducing resolution",
        "Multi-grid": "multi-grid strategy for the dilation rates within a block",
        "CRF": "conditional random field post-processing to refine boundaries (DeepLabV1/V2)"
    }
    
    print("Core features of the DeepLab family:")
    for feature, desc in features.items():
        print(f"• {feature}: {desc}")

deeplab_features()

3. Loss Functions for Semantic Segmentation

3.1 Pixel-Level Loss Functions

def pixel_level_losses():
    """
    Loss functions commonly used in semantic segmentation
    """
    losses = {
        "Cross Entropy Loss": "standard classification loss, suitable for balanced datasets",
        "Dice Loss": "based on the Dice coefficient, suitable for sparse foregrounds",
        "Focal Loss": "handles class imbalance by focusing on hard examples",
        "Lovász Loss": "a smooth surrogate that directly optimizes IoU",
        "Boundary Loss": "emphasizes boundary regions to improve edge accuracy"
    }
    
    print("Semantic segmentation loss functions:")
    for loss, desc in losses.items():
        print(f"• {loss}: {desc}")

pixel_level_losses()

class DiceLoss(nn.Module):
    """
    Dice loss (binary variant) - useful under class imbalance
    """
    def __init__(self, smooth=1e-6):
        super(DiceLoss, self).__init__()
        self.smooth = smooth
    
    def forward(self, inputs, targets):
        # Flatten the predictions (after sigmoid) and the targets
        inputs = torch.sigmoid(inputs).view(-1)
        targets = targets.view(-1)
        
        intersection = (inputs * targets).sum()
        dice = (2.*intersection + self.smooth) / (inputs.sum() + targets.sum() + self.smooth)
        
        return 1 - dice
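A quick numeric check of the Dice formula on a tiny binary mask (pure Python, no framework needed) shows why Dice is more informative than accuracy when the foreground is sparse:

```python
def dice_coefficient(pred, target, smooth=1e-6):
    """Dice = 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

# 16-pixel mask with a sparse foreground: 2 true pixels, 1 predicted correctly
target = [0] * 14 + [1, 1]
pred   = [0] * 15 + [1]

# Plain pixel accuracy looks great: 15 of 16 pixels are "correct"
accuracy = sum(p == t for p, t in zip(pred, target)) / len(target)
# Dice, driven by foreground overlap only, is much lower (about 0.67)
dice = dice_coefficient(pred, target)
print(f"accuracy = {accuracy:.3f}, dice = {dice:.3f}")
```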

class FocalLoss(nn.Module):
    """
    Focal loss - down-weights easy examples to address class imbalance
    """
    def __init__(self, alpha=1, gamma=2):
    def __init__(self, alpha=1, gamma=2):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
    
    def forward(self, inputs, targets):
        ce_loss = F.cross_entropy(inputs, targets, reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1-pt)**self.gamma * ce_loss
        return focal_loss.mean()
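The modulating factor `(1 - pt)**gamma` in the class above can be tabulated directly; with `gamma=2`, pixels the model is already confident about contribute almost nothing to the loss:

```python
gamma = 2
for pt in [0.1, 0.5, 0.9, 0.99]:
    weight = (1 - pt) ** gamma
    print(f"pt = {pt:.2f} -> focal weight = {weight:.4f}")
```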

def combined_loss_function():
    """
    Example of a combined loss function
    """
    print("Combined loss example:")
    print("""
class CombinedLoss(nn.Module):
    def __init__(self, weight_ce=1.0, weight_dice=1.0):
        super(CombinedLoss, self).__init__()
        self.ce_loss = nn.CrossEntropyLoss()
        self.dice_loss = DiceLoss()
        self.weight_ce = weight_ce
        self.weight_dice = weight_dice
    
    def forward(self, inputs, targets):
        ce = self.ce_loss(inputs, targets)
        dice = self.dice_loss(inputs, targets)
        return self.weight_ce * ce + self.weight_dice * dice
""")

4. Data Preprocessing and Augmentation

4.1 Segmentation-Specific Augmentation

def segmentation_data_augmentation():
    """
    Data augmentation tailored to semantic segmentation
    """
    import albumentations as A
    from albumentations.pytorch import ToTensorV2
    
    # Segmentation-specific pipeline (albumentations applies the geometric
    # transforms to the image and the mask together)
    transform = A.Compose([
        A.Resize(512, 512),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.1),
        A.RandomRotate90(p=0.5),
        A.Transpose(p=0.5),
        A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.2, rotate_limit=45, p=0.5),
        A.OneOf([
            A.OpticalDistortion(p=0.3),
            A.GridDistortion(p=0.1),
            A.PiecewiseAffine(p=0.3),
        ], p=0.3),
        A.OneOf([
            A.CLAHE(clip_limit=2),
            A.Sharpen(),
            A.Emboss(),
            A.RandomBrightnessContrast(),            
        ], p=0.3),
        A.HueSaturationValue(p=0.3),
        A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ToTensorV2(),
    ])
    
    print("语义分割数据增强策略:")
    print("• 几何变换: 旋转、翻转、缩放(需同步变换标签)")
    print("• 颜色变换: 亮度、对比度、饱和度调整")
    print("• 空间变换: 扭曲、仿射变换")
    print("• 特殊变换: 弹性变形、网格变形")

segmentation_data_augmentation()
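The requirement that geometric transforms stay synchronized between image and mask can be demonstrated without any augmentation library (a minimal NumPy sketch):

```python
import numpy as np

# Toy 2x3 "image" and its segmentation mask
image = np.array([[1, 2, 3],
                  [4, 5, 6]])
mask = np.array([[0, 0, 1],
                 [0, 1, 1]])

# A geometric transform (here a horizontal flip) must hit both arrays...
flipped_image = np.fliplr(image)
flipped_mask = np.fliplr(mask)

# ...otherwise pixel (i, j) of the image would no longer line up with
# pixel (i, j) of the mask, silently corrupting the training labels.
print(flipped_image.tolist())  # [[3, 2, 1], [6, 5, 4]]
print(flipped_mask.tolist())   # [[1, 0, 0], [1, 1, 0]]
```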

4.2 A Custom Segmentation Dataset

import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class SegmentationDataset(Dataset):
    """
    Dataset class for semantic segmentation
    (assumes images and masks share the same file names)
    """
    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.transform = transform
        self.images = sorted(os.listdir(image_dir))
    
    def __len__(self):
        return len(self.images)
    
    def __getitem__(self, index):
        img_path = os.path.join(self.image_dir, self.images[index])
        mask_path = os.path.join(self.mask_dir, self.images[index])
        
        image = np.array(Image.open(img_path).convert("RGB"))
        mask = np.array(Image.open(mask_path))
        
        if self.transform is not None:
            augmentations = self.transform(image=image, mask=mask)
            image = augmentations["image"]
            mask = augmentations["mask"]
        
        return image, mask

def dataset_best_practices():
    """
    Dataset best practices
    """
    practices = [
        "Apply augmentation transforms to images and masks in sync",
        "Use class-balanced sampling strategies",
        "Train at multiple scales",
        "Augment data on the fly (online augmentation)",
        "Keep the validation set strictly separate"
    ]
    
    print("Best practices for semantic segmentation datasets:")
    for i, practice in enumerate(practices, 1):
        print(f"{i}. {practice}")

dataset_best_practices()

5. Model Training and Evaluation

5.1 Training Pipeline

def segmentation_training_pipeline():
    """
    Semantic segmentation training pipeline
    """
    print("Semantic segmentation training pipeline:")
    print("""
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm

def train_model(model, train_loader, val_loader, num_epochs=100):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=5)
    
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0.0
        for images, masks in tqdm(train_loader):
            images, masks = images.to(device), masks.to(device)
            
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, masks)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
        
        # Validation phase
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, masks in val_loader:
                images, masks = images.to(device), masks.to(device)
                outputs = model(images)
                loss = criterion(outputs, masks)
                val_loss += loss.item()
        
        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)
        
        scheduler.step(avg_val_loss)
        
        print(f'Epoch [{epoch+1}/{num_epochs}], '
              f'Train Loss: {avg_train_loss:.4f}, '
              f'Val Loss: {avg_val_loss:.4f}')
    
    return model
""")

segmentation_training_pipeline()

5.2 Evaluation Metrics

def segmentation_evaluation_metrics():
    """
    Evaluation metrics for semantic segmentation
    """
    metrics = {
        "Pixel Accuracy": "correctly classified pixels / total pixels",
        "Mean Accuracy": "per-class accuracy, averaged over classes",
        "Mean IoU": "per-class IoU, averaged over classes",
        "Frequency Weighted IoU": "IoU weighted by class pixel frequency",
        "Dice Coefficient": "2 * |intersection| / (|prediction| + |ground truth|)",
        "Precision/Recall/F1": "per-class detection-style metrics"
    }
    
    print("Semantic segmentation evaluation metrics:")
    for metric, formula in metrics.items():
        print(f"• {metric}: {formula}")

segmentation_evaluation_metrics()

import numpy as np

def compute_miou(predicted, target, num_classes):
    """
    Compute mean IoU over all classes (expects tensors of class indices)
    """
    ious = []
    for cls in range(num_classes):
        # IoU for this class
        pred_inds = predicted == cls
        target_inds = target == cls
        
        intersection = (pred_inds[target_inds]).long().sum().item()
        union = (pred_inds | target_inds).long().sum().item()
        
        if union == 0:
            iou = float('nan')  # class absent from both: skip it in the mean
        else:
            iou = float(intersection) / float(union)
        ious.append(iou)
    
    return np.nanmean(ious)  # mean IoU, ignoring absent classes
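The same computation can be checked by hand on a tiny label map (pure NumPy; the per-class numbers are worked out in the comments):

```python
import numpy as np

pred   = np.array([[0, 0],
                   [1, 1]])
target = np.array([[0, 1],
                   [1, 1]])

ious = []
for cls in range(2):
    p, t = pred == cls, target == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    ious.append(inter / union)

# class 0: intersection 1, union 2 -> IoU 0.5
# class 1: intersection 2, union 3 -> IoU 2/3
miou = float(np.mean(ious))
print(f"mIoU = {miou:.4f}")  # (0.5 + 0.6667) / 2 ≈ 0.5833
```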

6. Modern Segmentation Architectures

6.1 Transformers for Segmentation

def vision_transformer_segmentation():
    """
    Vision Transformers applied to segmentation
    """
    print("Transformer-based segmentation architectures:")
    print("• SegFormer: hierarchical Transformer encoder with a lightweight MLP decoder")
    print("• SETR: segmentation with a pure Transformer encoder")
    print("• Swin-Unet: a U-Net built from Swin Transformer blocks")
    print("• TransUNet: Transformer encoder combined with a CNN decoder")
    
    # SegFormer-style decode head (illustrative)
    print("""
class SegFormerHead(nn.Module):
    def __init__(self, in_channels, num_classes, embed_dim=256):
        super().__init__()
        self.linear_fuse = nn.Conv2d(sum(in_channels), embed_dim, 1)
        self.dropout = nn.Dropout2d(0.1)
        self.linear_pred = nn.Conv2d(embed_dim, num_classes, 1)
    
    def forward(self, inputs):
        # Fuse multi-scale features by upsampling and concatenation
        x = torch.cat([F.interpolate(input, size=inputs[0].size()[2:], 
                                   mode='bilinear', align_corners=False) 
                      for input in inputs], dim=1)
        x = self.linear_fuse(x)
        x = self.dropout(x)
        x = self.linear_pred(x)
        return x
""")

vision_transformer_segmentation()

6.2 Real-Time Segmentation Architectures

def real_time_segmentation_architectures():
    """
    Real-time semantic segmentation architectures
    """
    architectures = {
        "BiSeNet": "bilateral segmentation network balancing speed and accuracy",
        "DFANet": "deep feature aggregation network, efficient and lightweight",
        "Fast-SCNN": "fast segmentation CNN, mobile-friendly",
        "ESPNet": "efficient spatial pyramid network",
        "LiteSeg": "lightweight segmentation network"
    }
    
    print("Real-time semantic segmentation architectures:")
    for arch, desc in architectures.items():
        print(f"• {arch}: {desc}")

real_time_segmentation_architectures()

Related Tutorials

Semantic segmentation is an advanced deep learning task; a solid grasp of CNNs and image classification is recommended first. U-Net is a good starting point, and the DeepLab family introduces more advanced ideas. For practice, start with a small, simple dataset.

7. Summary

Semantic segmentation is a major branch of computer vision:

Classic architectures:

  1. FCN: pioneered fully convolutional networks
  2. U-Net: skip-connection encoder-decoder design
  3. DeepLab: atrous convolutions and ASPP

Key techniques:

  • Skip connections: fuse multi-scale features
  • Atrous convolution: enlarges the receptive field
  • Attention mechanisms: focus on important regions

💡 Key takeaway: semantic segmentation is central to medical imaging, autonomous driving, and many other fields. Mastering classic architectures such as U-Net and DeepLab is the key to entering this area.

🔗 Further Reading