Edge Computing: A Detailed Guide to Raspberry Pi, Mobile, and Edge AI Deployment


Introduction

Edge computing is changing the way we use AI. It moves computing from distant cloud data centers to the devices closest to us - such as mobile phones, cameras, and Raspberry Pis. In this way, AI no longer relies on the network, but can make quick decisions locally.

This shift matters especially for deep learning. Real-time tasks (such as face recognition and anomaly detection) often cannot tolerate hundreds of milliseconds of cloud round-trip latency, and uploading every frame of high-definition video to the cloud is impractical. More importantly, many scenarios involve private data (faces, medical images) that regulations or policy may forbid from leaving the device. Edge AI addresses exactly these problems: low latency, privacy protection, bandwidth savings, and offline operation.

This article walks through the core steps of edge AI deployment: hardware preparation, framework selection, model optimization, and deployment architecture, so that you can quickly start deploying AI to edge devices.


1. What are the advantages of edge AI?

1.1 Cloud vs. edge at a glance

| Dimension | Cloud AI | Edge AI |
| --- | --- | --- |
| Latency | Often hundreds of milliseconds; strongly affected by network fluctuations | Usually within 100 ms; inference runs locally |
| Privacy | Data must be uploaded to the cloud | Sensitive data is processed locally and never leaves the device |
| Bandwidth | Raw images or audio must be transmitted continuously | Only results, or a small amount of compressed anomaly data, are transmitted |
| Reliability | Stops working when the network is down | Runs independently, even fully offline |
| Cost | Cloud GPU, storage, and bandwidth costs accumulate over time | Edge hardware is cheap and low-power, so marginal cost is low |
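
To make the bandwidth row concrete, here is a rough back-of-the-envelope calculation. Every number in it (a 1080p RGB camera at 30 fps, a 4 Mbit/s compressed stream, a 200-byte result message per frame) is an illustrative assumption, not a measurement from this article.

```python
# Rough bandwidth comparison: streaming video to the cloud vs. sending results only.
# All figures below are illustrative assumptions, not measurements.

WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3   # assumed 1080p RGB camera, 1 byte per channel
FPS = 30                                   # assumed frame rate
COMPRESSED_STREAM_MBPS = 4.0               # assumed bitrate of a compressed 1080p stream
RESULT_BYTES = 200                         # assumed size of one result message per frame


def mbit_per_s(bytes_per_second: float) -> float:
    """Convert bytes/s to megabits/s (1 Mbit = 1,000,000 bits)."""
    return bytes_per_second * 8 / 1_000_000


raw_mbps = mbit_per_s(WIDTH * HEIGHT * CHANNELS * FPS)
result_mbps = mbit_per_s(RESULT_BYTES * FPS)

print(f"Raw frames:        {raw_mbps:.1f} Mbit/s")
print(f"Compressed stream: {COMPRESSED_STREAM_MBPS:.1f} Mbit/s")
print(f"Results only:      {result_mbps:.3f} Mbit/s")
```

Under these assumptions, sending results only needs several orders of magnitude less bandwidth than uploading raw frames, and still far less than a compressed stream.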

1.2 Edge AI is not a “replacement of the cloud”, but cloud-edge collaboration

A typical edge AI architecture can be divided into four layers, each with its own role:

  1. Cloud training layer: Processes massive data, handles model pre-training and fine-tuning, and updates the global knowledge base.
  2. Edge gateway layer (optional): Aggregates data from multiple terminals and handles local caching and coordination.
  3. Edge device layer: The focus of this article. Raspberry Pi boards, mobile phones, smart cameras, and similar devices run real-time inference.
  4. Data collection layer: Microphones, cameras, and sensors, which only collect raw data.

Below we will focus on the actual deployment of the edge device layer.


2. Raspberry Pi deployment practice: the most user-friendly edge platform

The Raspberry Pi is affordable, has a mature ecosystem, and runs a full Python + PyTorch stack, making it a good first platform for edge AI.

2.1 Set up the environment

It is recommended to use Raspberry Pi OS 64-bit (Bookworm version) for the best compatibility. Open a terminal and follow these steps:

# 1. Update the system
sudo apt update && sudo apt upgrade -y

# 2. Install the tools needed for Python virtual environments
sudo apt install python3-pip python3-dev python3-venv -y

# 3. Create and activate a dedicated virtual environment
python3 -m venv edge_ai_env
source edge_ai_env/bin/activate

# 4. Install the inference toolchain (the Raspberry Pi has no discrete GPU, so use the CPU builds)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install opencv-python-headless pillow numpy
pip install psutil   # used for resource monitoring

💡 opencv-python-headless omits the GUI-related dependencies, which makes it a lighter install on a Raspberry Pi running without a screen.
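
After installation, a quick sanity check can confirm that the packages are visible inside the virtual environment. The script below is a minimal sketch using only the standard library; the function name `check_environment` and the module list are illustrative choices, and the list uses import names (opencv-python-headless imports as `cv2`, pillow as `PIL`).

```python
# Check which of the packages installed above can be found, without importing them.
import importlib.util
import platform

# Import names corresponding to the pip packages installed above.
REQUIRED_MODULES = ["torch", "torchvision", "cv2", "PIL", "numpy", "psutil"]


def check_environment(modules=REQUIRED_MODULES) -> dict:
    """Return {module_name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {platform.python_version()} on {platform.machine()}")
    for name, found in check_environment().items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

On 64-bit Raspberry Pi OS the machine field should read aarch64; any MISSING entry usually means the virtual environment was not activated before running pip.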

2.2 Run inference with a lightweight model

MobileNetV2 is a vision model designed specifically for mobile and edge devices. We use it here to demonstrate a complete inference pipeline:

import time

import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image


class RaspberryPiMobileNet:
    def __init__(self, labels_path: str = "imagenet_labels.txt"):
        # Load pretrained MobileNetV2 on the CPU and switch to evaluation mode.
        # `weights=` replaces the deprecated `pretrained=True` (torchvision >= 0.13).
        self.device = torch.device("cpu")
        weights = models.MobileNet_V2_Weights.IMAGENET1K_V1
        self.model = models.mobilenet_v2(weights=weights).to(self.device).eval()

        # Image preprocessing: keep it consistent with ImageNet training
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])

        # Read the ImageNet class labels (one label per line)
        with open(labels_path, "r") as f:
            self.labels = [line.strip() for line in f]

    def predict(self, image_path: str) -> dict:
        # Load and preprocess the image
        img = Image.open(image_path).convert("RGB")
        input_tensor = self.transform(img).unsqueeze(0).to(self.device)

        # Time the forward pass (no gradients, which saves memory)
        start = time.perf_counter()
        with torch.no_grad():
            outputs = self.model(input_tensor)
        inference_ms = (time.perf_counter() - start) * 1000

        # Take the 3 classes with the highest probability
        probs = torch.nn.functional.softmax(outputs[0], dim=0)
        top3_probs, top3_indices = torch.topk(probs, 3)

        return {
            "top3": [(self.labels[i], round(p, 3))
                     for i, p in zip(top3_indices.tolist(), top3_probs.tolist())],
            "inference_ms": round(inference_ms, 2)
        }


if __name__ == "__main__":
    detector = RaspberryPiMobileNet()
    result = detector.predict("test_dog.jpg")
    print(f"Top-3 predictions: {result['top3']}")
    print(f"Inference time: {result['inference_ms']}ms")

⚠️ Remember to download the imagenet_labels.txt file (containing the English names of the 1000 categories) before running the script; otherwise it will fail with an error.

On a Raspberry Pi 4B, this code typically completes one inference in roughly 50 to 100 milliseconds, which is sufficient for many real-time scenarios. The exact figure depends on the PyTorch build, the thread count, and the thermal state of the board.
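
Because the first few calls are usually slower than steady state, a single timing can mislead. The helper below is a minimal sketch of a benchmark with warm-up runs; `benchmark` and its parameters are hypothetical names introduced for illustration, and the workload in the demo is a stand-in for a real `predict` call.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 5, runs: int = 30) -> dict:
    """Time fn() after a few warm-up calls and summarize latency in milliseconds."""
    for _ in range(warmup):          # warm-up: caches, lazy initialization, CPU clocks
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "runs": runs,
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with e.g. lambda: detector.predict("test_dog.jpg")
    stats = benchmark(lambda: sum(i * i for i in range(10_000)))
    print({k: round(v, 3) for k, v in stats.items()})
```

Reporting the median alongside the maximum makes occasional slow runs (for example, during thermal throttling) visible instead of averaging them away.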


3. TensorFlow Lite: standard for inference on mobile phones and embedded devices

PyTorch is flexible and powerful, but if you want to run the model on an Android phone, an embedded board, or even a microcontroller, TensorFlow Lite is a more mature choice. It natively supports INT8/FP16 quantization and can hand work to hardware accelerators such as GPUs and NPUs through delegates, which can speed up inference considerably.

3.1 Conversion process from PyTorch to TFLite

A PyTorch model is typically exported to ONNX first and then converted to TFLite. Here is a function that automates the conversion:

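import onnx
import tensorflow as tf
import torch
import torchvision.models as models
from onnx_tf.backend import prepare  # ONNX -> TensorFlow bridge (pip install onnx-tf)


def pytorch_to_tflite(pytorch_model_path: str, output_tflite_path: str):
    """
    Convert a PyTorch MobileNetV2 checkpoint to TensorFlow Lite format
    (with dynamic-range quantization).
    """
    # 1. Load the weights and export to ONNX.
    #    Assumption: pytorch_model_path holds a MobileNetV2 state_dict.
    pytorch_model = models.mobilenet_v2()
    pytorch_model.load_state_dict(torch.load(pytorch_model_path, map_location="cpu"))
    pytorch_model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(
        pytorch_model, dummy_input, "temp.onnx",
        export_params=True, opset_version=12,
        do_constant_folding=True
    )

    # 2. Validate the ONNX model
    onnx_model = onnx.load("temp.onnx")
    onnx.checker.check_model(onnx_model)

    # 3. Convert ONNX to a TensorFlow SavedModel.
    #    TFLiteConverter reads TensorFlow formats, so this sketch bridges via onnx-tf
    #    (one option among several; assumes the package is installed).
    prepare(onnx_model).export_graph("temp_saved_model")

    # 4. Convert the SavedModel to TFLite with quantization enabled
    converter = tf.lite.TFLiteConverter.from_saved_model("temp_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
    tflite_model = converter.convert()

    # 5. Save
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)

    print(f"Conversion finished, TFLite model saved to: {output_tflite_path}")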

The converted .tflite file is typically about 1/4 of the original size, because weights are stored as INT8 instead of FP32. Inference is often 2 to 4 times faster, which makes the format well suited to mobile phones.


4. Three key points for edge AI performance optimization

Edge devices are constrained in compute, memory, and power, so models must be actively optimized to perform well.

4.1 Optimization at the model level (the most obvious effect)

| Method | Effect | Recommended tools |
| --- | --- | --- |
| Choose a lightweight architecture | Directly reduces parameter count and computation (e.g. MobileNet, EfficientNet-Lite) | PyTorch Hub / TensorFlow Hub |
| Quantization | Converts FP32 to INT8 or FP16; INT8 cuts model size by about 75% (FP16 by about 50%) and often speeds up inference 2 to 4 times | TFLite Converter / PyTorch quantization tools |
| Pruning | Removes unimportant weight connections; parameter count can drop by more than 50% | PyTorch pruning utilities / TensorFlow Model Optimization Toolkit |
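
To show where the 75% figure comes from and what quantization does to the values, here is a small pure-Python sketch of affine INT8 quantization. It is an illustration of the idea, not the exact scheme used by TFLite or PyTorch; the function names are hypothetical, and the parameter count is an approximate figure for MobileNetV2.

```python
def quantize_int8(values):
    """Affine (asymmetric) quantization of floats to int8 with one scale and zero point."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # make sure 0.0 is exactly representable
    scale = (hi - lo) / 255 or 1.0             # 256 int8 levels span [lo, hi]; avoid scale 0
    zero_point = round(-128 - lo / scale)      # int8 value that represents 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.4, 0.0, 0.25, 2.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.5f}", f"max_error={max_error:.5f}")

# Storage arithmetic: 4 bytes per FP32 weight vs. 1 byte per INT8 weight.
PARAMS = 3_500_000                 # approximate MobileNetV2 parameter count (assumption)
fp32_mb = PARAMS * 4 / 1e6
int8_mb = PARAMS * 1 / 1e6
print(f"FP32: {fp32_mb:.1f} MB, INT8: {int8_mb:.1f} MB")
```

Each restored value differs from the original by at most about half a quantization step (scale / 2), which is the accuracy cost paid for the fourfold reduction in storage.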

4.2 Hardware and runtime tips

  • Raspberry Pi only:
      • Enable the NEON instruction set when compiling OpenCV / PyTorch to exploit the SIMD parallelism of the ARM CPU.
      • Set the CPU governor to performance mode to avoid dynamic underclocking.
  • Mobile only:
      • Use TFLite's NNAPI delegate on Android and the Core ML delegate on iOS so that the NPU can take part in acceleration.
      • For some models, inference on an NPU is reported to be more than 5 times faster.
  • General tips:
      • Be sure to turn off gradient calculation during inference (with torch.no_grad()).
      • Load large model files with memory mapping to reduce startup memory usage.
      • Split image preprocessing and model inference into separate threads to form a pipeline.
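
The threaded pipeline from the last tip can be sketched with the standard library alone. This is a minimal two-stage example under stated assumptions: `run_pipeline`, `preprocess_worker`, and `inference_worker` are hypothetical names, and `preprocess` and `infer` are placeholder callables you would replace with real image preprocessing and a model call.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def preprocess_worker(frames, out_q, preprocess):
    """Stage 1: preprocess each frame and hand it to the inference stage."""
    for frame in frames:
        out_q.put(preprocess(frame))   # blocks when the queue is full (back-pressure)
    out_q.put(_SENTINEL)


def inference_worker(in_q, results, infer):
    """Stage 2: run inference on preprocessed items until the sentinel arrives."""
    while True:
        item = in_q.get()
        if item is _SENTINEL:
            break
        results.append(infer(item))


def run_pipeline(frames, preprocess, infer, max_queue=4):
    """Run both stages concurrently and return results in input order."""
    q = queue.Queue(maxsize=max_queue)  # small bound keeps memory use predictable
    results = []
    producer = threading.Thread(target=preprocess_worker, args=(frames, q, preprocess))
    consumer = threading.Thread(target=inference_worker, args=(q, results, infer))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    # Placeholder stages: doubling stands in for preprocessing, +1 for inference.
    print(run_pipeline(range(5), lambda x: x * 2, lambda x: x + 1))
```

Threads help here despite the Python GIL because libraries such as OpenCV, NumPy, and PyTorch release it while running native code, so preprocessing of the next frame can overlap with inference on the current one. The bounded queue prevents the camera side from running far ahead of the model on a memory-constrained device.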


5. What else should be considered during actual deployment?

5.1 How to choose the deployment architecture?

Depending on the scenario, you can choose the following three typical methods:

  • Pure Edge Deployment: All inference is done on the device, suitable for privacy-sensitive scenarios such as home security cameras.
  • Edge-Cloud Collaboration: The edge performs preliminary screening and sends suspicious results to the cloud for detailed analysis. In industrial quality inspection, for example, suspected defects are flagged locally and the final judgment is made in the cloud.
  • Edge Caching: If the repetition rate of recognition tasks is high (such as smart shelves in shopping malls), the recognition results of popular products can be cached to greatly reduce the amount of calculation.
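
The collaboration and caching patterns above can be combined in a few lines. The class below is a hypothetical sketch, not a production design: `EdgeRouter`, the 0.8 confidence threshold, and the two inference callables are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class EdgeRouter:
    """Hypothetical helper: cache repeated inputs and escalate low-confidence results."""

    def __init__(self, local_infer, cloud_infer, threshold=0.8, cache_size=128):
        self.local_infer = local_infer    # callable: bytes -> (label, confidence)
        self.cloud_infer = cloud_infer    # callable: bytes -> (label, confidence)
        self.threshold = threshold        # assumed cut-off for trusting the edge result
        self.cache_size = cache_size
        self._cache = OrderedDict()       # key -> result, ordered from oldest to newest use

    def classify(self, image_bytes: bytes):
        """Return ((label, confidence), source) where source is cache, edge, or cloud."""
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._cache:
            self._cache.move_to_end(key)           # mark as recently used
            return self._cache[key], "cache"
        result = self.local_infer(image_bytes)
        source = "edge"
        if result[1] < self.threshold:             # not confident enough locally
            result = self.cloud_infer(image_bytes)
            source = "cloud"
        self._cache[key] = result
        if len(self._cache) > self.cache_size:
            self._cache.popitem(last=False)        # evict the least recently used entry
        return result, source
```

Note that hashing raw bytes only produces a cache hit for byte-identical inputs. Live camera frames rarely repeat exactly, so a real system would more likely key the cache on something stable, such as a barcode, a product ID, or a perceptual hash.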

5.2 What indicators should be monitored after going online?

Deployment is just the beginning. After going live, keep watching the following:

  • Inference latency: The time taken by a single request, usually measured at P50 and P99
  • Throughput: How many frames or requests can be processed per second
  • Resource usage: CPU, memory, and storage usage
  • Device temperature: Compact, passively cooled devices such as a cased Raspberry Pi are prone to heat buildup. Overheating triggers thermal throttling, and inference latency can spike suddenly.
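
These indicators can be collected with very little code. The sketch below uses only the standard library; `LatencyMonitor` and `read_cpu_temp_c` are hypothetical names, and the thermal file path is the usual Linux sysfs location, which is assumed to exist on the target device.

```python
import statistics
import time
from collections import deque
from pathlib import Path

# Usual Linux sysfs location; the value is reported in millidegrees Celsius.
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")


def read_cpu_temp_c(path: Path = THERMAL_FILE):
    """Return the CPU temperature in degrees Celsius, or None if it cannot be read."""
    try:
        return int(path.read_text().strip()) / 1000.0
    except (OSError, ValueError):
        return None


class LatencyMonitor:
    """Keep a sliding window of per-request latencies and summarize them."""

    def __init__(self, window: int = 1000):
        self.samples_ms = deque(maxlen=window)   # old samples drop out automatically
        self.started = time.perf_counter()
        self.count = 0

    def record(self, latency_ms: float) -> None:
        self.samples_ms.append(latency_ms)
        self.count += 1

    def summary(self) -> dict:
        """Summarize the window; call only after at least two samples are recorded."""
        # quantiles(n=100) returns 99 cut points: index 49 is P50, index 98 is P99
        cuts = statistics.quantiles(self.samples_ms, n=100, method="inclusive")
        elapsed = time.perf_counter() - self.started
        return {
            "p50_ms": cuts[49],
            "p99_ms": cuts[98],
            "throughput_per_s": self.count / elapsed if elapsed > 0 else 0.0,
            "cpu_temp_c": read_cpu_temp_c(),
        }
```

Call `record()` with each measured inference time and log `summary()` periodically. CPU and memory usage can be added from psutil (for example `psutil.cpu_percent()` and `psutil.virtual_memory()`), which was installed during environment setup.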

Edge computing is the last mile of AI implementation. It is recommended to start with a small visual recognition project using **Raspberry Pi + PyTorch + TFLite** (such as identifying your own pets), familiarize yourself with the entire process of packaging, deployment, and monitoring, and then delve into the design of hardware acceleration and cloud-edge collaboration.

Summary

Edge AI is not a “shrunk version” of cloud AI, but a practical way to integrate AI into the real world. It extends intelligence from data centers to small devices around us, making AI truly real-time, private and available offline.

For AI engineers, three skills (lightweight model design, edge deployment, and performance tuning) form the core of competitiveness in fast-growing fields such as the Internet of Things, mobile AI, and autonomous driving. Now, let's start by powering up a Raspberry Pi!