Process vs. Thread

If you use a computer every day, you have probably been in this situation: a game update is downloading in the background, three programming documents and an online music page are open in the browser, and you are reading messages on WeChat at the same time. This is typical multitasking. In this article we break down the three core roles in multitasking: processes, threads, and coroutines, the last of which are increasingly popular in high-concurrency scenarios. We compare their characteristics, advantages, disadvantages, and suitable scenarios, and finish with a few minimal Python examples so that the differences are visible at a glance.


1. Common architecture for multitasking: Master‑Worker pattern

Whether the underlying implementation uses multiple processes, multiple threads, or coroutines, most systems implement multitasking with a Master‑Worker division of labor:

  • Master (Supervisor): Responsible for task reception, splitting, distribution, and sometimes overall coordination.
  • Worker: Executes each task as soon as it receives it.
  • Typical configuration: one Master manages multiple Workers.

Specific to different implementations:

  • Multi-process: The main process acts as the Master, and the child processes act as Workers.
  • Multi-threading: The main thread acts as the Master, and the child threads act as Workers.
  • Coroutine: The event loop in the main thread (of a single process) acts as the Master, and the coroutines act as Workers.
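
The division of labor above can be sketched with threads and a queue. This is only a minimal illustration, not a production design: the function name run_master_worker, the use of None as a stop signal, and the "square the number" work are all assumptions made for the example.

```python
import queue
import threading


def run_master_worker(tasks, num_workers=3):
    """Master puts tasks on a queue; workers pull tasks and process them."""
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()
            if item is None:              # stop signal: no more work
                break
            with results_lock:            # results is shared by all workers
                results[item] = item * item   # placeholder for real work

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Master: distribute the tasks, then send one stop signal per worker.
    for task in tasks:
        task_queue.put(task)
    for _ in workers:
        task_queue.put(None)

    # Master: wait until every worker has finished.
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_master_worker([1, 2, 3, 4, 5]))
```

The same shape applies to the other two implementations: replace the threads with child processes or coroutines, and the queue with an IPC channel or an event loop.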

2. Traditional duo: multi-process vs multi-thread

Multi-process mode

We can think of a process as an independent "workshop": each workshop has its own warehouse (memory space) and tool library (file handles, network connections, etc.). Workshops are isolated from each other. Unless a "conveyor belt" (inter-process communication, IPC) is built on purpose, one workshop cannot simply take things from another.

Advantages

  1. Very high stability: If a child workshop (child process) blows up (crashes), only its own resources are cleaned up, and the main workshop and the other child workshops are normally unaffected. This is one reason why Apache's early architecture and the Chrome browser adopted a multi-process design.
  2. Full use of multi-core CPUs: The operating system can schedule each workshop on a different CPU core, so several processes can run truly in parallel (on both Windows and Linux).
  3. Memory isolation: By default, other workshops cannot "steal" or "contaminate" your data.
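
To make the stability point concrete, here is a minimal sketch in which a child process dies abruptly while the parent keeps running. The helper name run_child is hypothetical, and the "crash" is only simulated with os._exit(3), which ends the child immediately without cleanup.

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run a snippet of Python in a separate process and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


if __name__ == "__main__":
    # Simulated crash: the child terminates at once with exit status 3.
    crashed = run_child("import os; os._exit(3)")
    # A well-behaved child started afterwards is not affected.
    healthy = run_child("print('child finished normally')")
    print(f"crashed child exit code: {crashed}")
    print(f"healthy child exit code: {healthy}")
    print("parent is still running")
```

The parent only observes the exit code; nothing in its own memory is touched by the failing child.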

Disadvantages

  1. High creation cost: Setting up an independent warehouse and tool library takes time and system resources. Process creation is noticeably more expensive on Windows than on Linux, by some estimates several times to dozens of times.
  2. High switching cost: When the workshop director (a CPU core) moves from workshop A to workshop B, he must first put away A's tools and progress record, then take out and lay out B's. This is called context switching.
  3. IPC is cumbersome: Moving files and data between workshops requires a dedicated conveyor belt (pipe, message queue, shared memory, etc.). If shared memory is used, a "safety door" (lock) must be added to prevent conflicting access, and the code becomes significantly more complex.
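
As an illustration of the "conveyor belt", the sketch below sends data to a child process through a pipe and reads the result back through another pipe. It assumes JSON as the message format; the names CHILD_CODE and sum_in_child are made up for this example.

```python
import json
import subprocess
import sys

# Code executed by the child: read a JSON list from stdin,
# write a JSON object with the sum to stdout.
CHILD_CODE = (
    "import json, sys\n"
    "numbers = json.load(sys.stdin)\n"
    "json.dump({'total': sum(numbers)}, sys.stdout)\n"
)


def sum_in_child(numbers):
    """Send a list to a child process through a pipe and read back the sum."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(numbers),   # parent -> child pipe (stdin)
        capture_output=True,         # child -> parent pipe (stdout)
        text=True,
        check=True,                  # raise if the child exits with an error
    )
    return json.loads(completed.stdout)["total"]


if __name__ == "__main__":
    print(sum_in_child([1, 2, 3, 4]))  # prints 10
```

Even this tiny exchange needs serialization on both sides, which is the extra effort the list above refers to.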

Multi-threaded mode

Threads are "worker groups" inside a workshop: all groups share the same warehouse and tool library, but each group has its own workbench and progress record (registers and stack). To communicate, groups simply pass notes (shared memory); no extra conveyor belt is needed.

Advantages

  1. Fast to set up: No independent warehouse is needed; a workbench is simply marked out inside the existing workshop, so creating and switching threads costs much less than creating and switching processes.
  2. Easy communication between groups: Passing notes (shared global variables and heap memory) is very efficient.
  3. Well suited to I/O waiting: If one group is waiting for a delivery (a network response or a disk read/write), the workshop director can immediately switch to another group that is ready to work, so the CPU does not sit idle.

Disadvantages

  1. Lower stability: If one group accidentally trips the workshop's main power switch (for example, by triggering an illegal memory access), the whole workshop (the entire process) goes dark. This is one reason why many early web servers avoided a purely multi-threaded design.
  2. Contention for shared resources: When several groups use the same screwdriver (modify the same global variable) at the same time, the data gets corrupted. This is called a race condition. The fix is a "tool sign-out sheet" (lock), but too many locks reduce efficiency and can even cause deadlock (two groups each hold the screwdriver the other one needs).
  3. A Python-specific limitation: CPython (the most commonly used Python interpreter) has a GIL (Global Interpreter Lock): only one group (thread) at a time can work in the workshop's "main operating area" (execute Python bytecode). As a result, multi-threaded code written in pure Python cannot use multiple CPU cores to run computations in parallel.
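
The "tool sign-out sheet" from point 2 can be sketched as follows. Several threads increment one shared counter, and a lock ensures that each read-modify-write step completes before the next one starts. The function name count_with_lock and the default numbers are illustrative assumptions.

```python
import threading


def count_with_lock(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def add() -> None:
        nonlocal counter
        for _ in range(increments):
            # Without the lock, two threads could read the same old value
            # and one of the updates would be lost (a race condition).
            with lock:
                counter += 1

    threads = [threading.Thread(target=add) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())  # prints 40000
```

The price is visible in the code: every increment now acquires and releases the lock, which is exactly the efficiency cost mentioned above.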

3. Modern Server Compromises: Hybrid Mode

Pure multi-process and pure multi-threaded designs each have shortcomings, so mainstream server software has long used a hybrid model that combines the stability of processes with the efficiency of threads:

  • Apache: supports three multi-processing modules (MPMs)
    • prefork: Pure multi-process, suited to scenarios with very high stability requirements.
    • worker: Multiple processes, each running multiple threads, balancing stability and resource utilization.
    • event: Builds on worker and optimizes I/O waiting (such as idle keep-alive connections) for higher performance.
  • IIS: Multi-threaded by default, and also supports a process isolation mode.
  • Nginx: Goes a step further with an asynchronous, event-driven model: a single Master process plus multiple Worker processes. In many scenarios, especially with large numbers of concurrent connections, it outperforms the traditional Apache and IIS models.

4. Don’t let “switching workers” drag down efficiency: the pitfalls of task switching

Whether the system is switching between processes, threads, or the coroutines discussed later, context switching itself has a cost:

  1. Save the progress bar and tool status of the current task.
  2. Prepare the progress bar and tool status of the next task.
  3. Put the next task on the CPU core and start execution.

If too many tasks are running, the system can end up spending most of its time (say, 80%) on switching and only a small part (the remaining 20%) on real work. CPU utilization looks very high, yet actual business processing is very slow, and the system may even appear frozen (the mouse and keyboard stop responding while background processes are still running).


5. Choose the right person to do the right job: match the task type and plan

Before choosing a multitasking approach, first work out whether your tasks are "CPU busy" or "idle and waiting".

Computationally intensive tasks (CPU busy)

  • Features: Most of the time is spent computing on the CPU, with almost no idle time. Typical examples: scientific computing, video encoding and decoding, cryptographic hashing and encryption, machine learning training and inference.
  • Suggestions:
    • Number of tasks ≈ number of CPU cores (one or two more is fine, to absorb scheduling jitter).
    • Prefer compiled languages such as C, C++, Rust, or Go, which have no global interpreter lock and run very efficiently.
    • If you must use Python, choose multiple processes (which sidestep the GIL) or C extensions (for example, the cores of NumPy and PyTorch are written in C/C++ and can release the GIL during heavy computation).

I/O intensive tasks (many idle waits)

  • Features: Most of the time is spent waiting: for network responses, database reads and writes, or user input. CPU utilization is low (perhaps only 10% to 30%). Typical examples: web services, crawlers, database middleware, chat backends.
  • Suggestions:
    • The number of tasks can be increased considerably (for example, 5 to 10 times the number of CPU cores, or even more).
    • Interpreted scripting languages such as Python and JavaScript are entirely adequate here; development efficiency matters much more than execution speed.
    • The modern trend is asynchronous I/O plus coroutines (more on this later), which is more efficient and cheaper than multi-threading.
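
The two rules of thumb for the number of tasks can be written down as a small helper. This is a sketch under stated assumptions: the function name suggested_workers and the default multiplier of 5 are hypothetical, and the numbers are starting points that should be tuned by measurement.

```python
import os


def suggested_workers(kind: str, io_multiplier: int = 5) -> int:
    """Return a starting point for the number of concurrent tasks.

    kind is "cpu" for computationally intensive work, "io" for I/O-bound work.
    """
    cores = os.cpu_count() or 1      # cpu_count() may return None
    if kind == "cpu":
        return cores + 1             # roughly one task per core, plus one spare
    if kind == "io":
        return cores * io_multiplier # tasks mostly wait, so oversubscribe
    raise ValueError(f"unknown task kind: {kind!r}")


if __name__ == "__main__":
    print("CPU-bound workers:", suggested_workers("cpu"))
    print("I/O-bound workers:", suggested_workers("io"))
```

The result can be passed, for example, as max_workers to a ThreadPoolExecutor for I/O-bound work.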

6. The new favorite in high-concurrency scenarios: asynchronous I/O + coroutines

Asynchronous I/O

Modern operating systems provide efficient event notification and asynchronous I/O mechanisms (epoll on Linux, IOCP on Windows, kqueue on macOS): instead of starting many threads or processes just to wait for I/O, the program tells the operating system "call me when this network request comes back", and the main thread can do other things in the meantime. Typical representatives: Nginx, Node.js, and Redis (commands are executed on a single main thread, yet performance is very high).
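
The "call me when it comes back" idea can be tried out with the standard selectors module, which picks an efficient mechanism available on the current platform (such as epoll or kqueue). The sketch below is only an illustration: it uses a local socket pair instead of a real network connection, and the function name wait_for_message is made up.

```python
import selectors
import socket


def wait_for_message(message: bytes) -> bytes:
    """Register a socket with the OS and read from it only once it is ready."""
    sel = selectors.DefaultSelector()
    reader, writer = socket.socketpair()   # stand-in for a network connection
    reader.setblocking(False)
    # Ask the operating system to notify us when reader has data.
    sel.register(reader, selectors.EVENT_READ)
    try:
        writer.sendall(message)            # the "response" arrives
        received = b""
        # select() returns the sockets that are ready; no thread sat
        # blocked inside recv() while waiting.
        for key, _events in sel.select(timeout=1.0):
            received += key.fileobj.recv(1024)
        return received
    finally:
        sel.unregister(reader)
        sel.close()
        reader.close()
        writer.close()


if __name__ == "__main__":
    print(wait_for_message(b"hello"))  # prints b'hello'
```

A real server registers thousands of sockets with one selector and handles whichever ones are ready, all from a single thread.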

Coroutine

Coroutines can be understood as extremely lightweight "worker groups": all coroutines run in the same thread, and switching is controlled by a scheduler (event loop) implemented in the program itself, without the operating system's involvement. How cheap is a coroutine switch? By a rough estimate, perhaps one thousandth or even one ten-thousandth of the cost of a thread switch, because only a small amount of state (a few pointers) is saved and restored.

In Python, the coroutine syntax is async / await, the supporting standard library is asyncio, and there is also the third-party library gevent (which is friendlier to existing synchronous code).
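
To show what "the program's own scheduler controls the switching" means, here is a toy round-robin scheduler built on plain generators. It is a teaching sketch, not how asyncio is implemented; the names worker and run_round_robin are assumptions for this example.

```python
from collections import deque


def worker(name, steps, log):
    """A toy coroutine: does one step of work, then yields control."""
    for i in range(1, steps + 1):
        log.append(f"{name} step {i}")
        yield                      # hand control back to the scheduler


def run_round_robin(coroutines):
    """Toy scheduler: resume each coroutine in turn until all are finished."""
    ready = deque(coroutines)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the coroutine until its next yield
        except StopIteration:
            continue               # this coroutine is done, drop it
        ready.append(current)      # otherwise put it back in the line


if __name__ == "__main__":
    log = []
    run_round_robin([worker("A", 2, log), worker("B", 3, log)])
    print(log)
    # prints ['A step 1', 'B step 1', 'A step 2', 'B step 2', 'B step 3']
```

Each switch here is just a function call that resumes a generator; no operating system thread is created or rescheduled.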

Advantages

  1. High concurrency: Tens of thousands or even hundreds of thousands of coroutines can easily be started in a single thread (with threads, a few thousand can already exhaust memory, because each thread reserves its own stack).
  2. Almost no switching costs.
  3. No need to deal with complex thread lock issues (because only one thread is executing the coroutine, and only one coroutine is running the code of the "main operation area" at the same time).
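
The first advantage is easy to try out. The sketch below starts 10,000 coroutines in one thread, each simulating a short wait with asyncio.sleep. The names tiny_task and run_many and the 0.1-second delay are illustrative assumptions; actual timing depends on the machine.

```python
import asyncio
import time


async def tiny_task(i: int) -> int:
    """Simulate a short I/O wait, then return the task's index."""
    await asyncio.sleep(0.1)
    return i


async def run_many(n: int) -> list:
    """Start n coroutines and wait for all of them in a single thread."""
    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*(tiny_task(i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_many(10_000))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} coroutines finished in {elapsed:.2f}s")
```

Because all the waits overlap, the total time stays far below the 1,000 seconds that running the same waits one after another would take.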

7. One table to guide the choice

| Scenario | Recommended solution | Core reasons |
| --- | --- | --- |
| Scientific computing, video transcoding | Multi-process / C extension | Bypasses the GIL, fully utilizes multiple cores |
| Simple background batch processing | Thread pool | Simple implementation, sufficient resource utilization |
| Web services, high-concurrency crawlers | Coroutines / asynchronous frameworks (FastAPI, aiohttp, Scrapy with asyncio) | High concurrency, low overhead, high development efficiency |
| Multi-tenant, high stability requirements | Multi-process (even multi-process + containers) | Complete isolation; the crash of one tenant does not affect the others |

8. Python implementation example (minimalist version)

Multiple processes

from multiprocessing import Process
import time

def worker(task_name: str, delay: int) -> None:
    print(f"[child process] starting {task_name}")
    time.sleep(delay)   # simulate computation or short I/O
    print(f"[child process] finished {task_name}")

if __name__ == '__main__':
    start_time = time.time()
    tasks = [("math task 1", 2), ("math task 2", 3), ("math task 3", 1)]
    processes = []

    # Create and start the child processes
    for name, d in tasks:
        p = Process(target=worker, args=(name, d))
        processes.append(p)
        p.start()

    # Wait for all child processes to finish
    for p in processes:
        p.join()

    print(f"[main process] all tasks finished, total time {time.time() - start_time:.2f}s")

The total running time is about 3 seconds: the three child processes run in parallel, so the total is set by the longest task.

Multithreading

from threading import Thread
from concurrent.futures import ThreadPoolExecutor
import time

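def worker(task_name: str, delay: int) -> None:
    print(f"[worker thread] starting {task_name}")
    time.sleep(delay)   # simulate I/O (time.sleep releases the GIL)
    print(f"[worker thread] finished {task_name}")

if __name__ == '__main__':
    start_time = time.time()
    tasks = [("fetch page 1", 2), ("fetch page 2", 3), ("fetch page 3", 1)]

    # Option 1: create threads directly (fine when there are only a few tasks)
    # threads = []
    # for name, d in tasks:
    #     t = Thread(target=worker, args=(name, d))
    #     threads.append(t)
    #     t.start()
    # for t in threads:
    #     t.join()

    # Option 2: use a thread pool (recommended: it manages thread lifetimes
    # and avoids creating and destroying threads repeatedly)
    with ThreadPoolExecutor(max_workers=4) as executor:
        # list() consumes the results so that any exception raised
        # inside a worker is re-raised here instead of being lost
        list(executor.map(lambda x: worker(*x), tasks))

    print(f"[main thread] all tasks finished, total time {time.time() - start_time:.2f}s")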

Because the tasks are I/O-bound, the total time is also about 3 s (time.sleep releases the GIL while it waits).

Coroutine

import asyncio
import time

async def worker(task_name: str, delay: int) -> None:
    print(f"[coroutine] starting {task_name}")
    # Use asyncio.sleep here, not time.sleep (which would block the whole event loop)
    await asyncio.sleep(delay)
    print(f"[coroutine] finished {task_name}")

async def main() -> None:
    start_time = time.time()
    tasks = [("API request 1", 2), ("API request 2", 3), ("API request 3", 1)]

    # asyncio.gather runs all the coroutines concurrently
    await asyncio.gather(*(worker(name, d) for name, d in tasks))

    print(f"[main coroutine] all tasks finished, total time {time.time() - start_time:.2f}s")

if __name__ == '__main__':
    asyncio.run(main())

The total time is again about 3 s, but all three tasks run in a single thread, so the overhead is lower than with multi-threading.


9. Final summary

  1. Classify the task first: CPU-bound tasks call for multiple processes or C extensions; I/O-bound tasks call for coroutines or an asynchronous framework.
  2. Don’t over-design: If it is just a simple scheduled background task, a thread pool is enough.
  3. Python special case: Never forget the GIL.
  4. Focus on modern trends: Asynchronous I/O and coroutines have become mainstream solutions for high-concurrency web services.

Choose the approach that best suits your needs, not the one that looks "the most powerful". That is what good architectural design means.