Process vs. Thread
When using a computer day to day, you have almost certainly run into this scenario: a game update is downloading in the background, three programming docs and an online music page are open in the browser, and at the same time you are reading WeChat messages. This is typical multitasking. Today we will take apart the three core players in multitasking: processes, threads, and coroutines, the last of which are increasingly popular in high-concurrency scenarios. We will compare their characteristics, strengths and weaknesses, and suitable use cases, and finally write a few minimal Python examples so you can see the differences at a glance.
1. Common architecture for multitasking: Master‑Worker pattern
Whether the underlying implementation uses multiple processes, multiple threads, or coroutines, most systems organize multitasking around a Master‑Worker division of labor:
- Master (supervisor): receives tasks, splits them up, distributes them, and sometimes coordinates the overall work.
- Worker: executes each task as soon as it receives it.
- Typical configuration: one Master, multiple Workers.
Mapped onto the different implementations:
- Multi-process: the main process is the Master, and the child processes are the Workers.
- Multi-thread: the main thread is the Master, and the child threads are the Workers.
- Coroutines: the event loop in the main thread (or a single process) is the Master, and the coroutines are the Workers.
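The division of labor above can be sketched in a few lines of Python. This is a minimal, illustrative version built on threads and a `queue.Queue`; the names `master` and `worker` and the squaring "work" are hypothetical choices for the example, and the same shape carries over to processes or coroutines.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Worker: pull tasks off the shared queue and execute them until told to stop."""
    while True:
        item = tasks.get()
        if item is None:              # sentinel from the Master: no more work
            break
        value = item * item           # hypothetical "work": square the number
        with lock:                    # results is shared by all Workers
            results.append(value)


def master(job: list, n_workers: int = 3) -> list:
    """Master: receive a job, split it into tasks, hand them out, collect the results."""
    tasks: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()
    workers = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in job:                  # split: one task per element of the job
        tasks.put(item)
    for _ in workers:                 # one stop sentinel per Worker
        tasks.put(None)
    for w in workers:
        w.join()
    return sorted(results)            # completion order varies, so sort for a stable result


if __name__ == "__main__":
    print(master([1, 2, 3, 4, 5]))    # [1, 4, 9, 16, 25]
```

Sending one `None` sentinel per Worker is a common way for the Master to signal shutdown: because the queue is FIFO, every real task is handed out before any Worker sees a sentinel.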
2. Traditional duo: multi-process vs multi-thread
Multi-process mode
Think of a process as an independent "workshop": each workshop has its own warehouse (memory space) and tool library (file handles, network connections, etc.). Workshops are fully isolated from one another: unless a dedicated "conveyor belt" (inter-process communication, IPC) is built, one workshop cannot simply reach in and take another's things.
Advantages
- High stability: if one child workshop (child process) blows up (crashes), only that workshop is lost; the main workshop and the other child workshops keep running. This is why Apache's early prefork model and the Chrome browser's multi-process architecture are built on separate processes.
- Full use of multi-core CPUs: the OS can run different processes on different cores at the same time, so they execute truly in parallel, and a process can even be pinned to a specific core (CPU affinity, supported on Windows and Linux).
- Strong memory isolation: there is no need to worry about other workshops "stealing" or "contaminating" your data.
Disadvantages
- Expensive to build a workshop: setting up a separate warehouse and tool library takes time and system resources. Process creation on Windows is especially costly and can be several to dozens of times slower than on Linux.
- Expensive to switch: when the workshop director (a CPU core) moves from workshop A to workshop B, he must first put away all of A's tools and progress, then take out and lay out B's, including switching to B's memory map. This process is called a context switch.
- Cumbersome IPC: moving files and data between workshops requires a dedicated conveyor belt (pipes, message queues, shared memory, etc.). With shared memory you must also add a "safety door" (a lock) so that processes do not overwrite each other's data, which noticeably increases code complexity.
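To make isolation and IPC concrete, here is a minimal sketch using the standard `multiprocessing` module: the child changes a module-level global, which the parent never sees, and reports back through a `multiprocessing.Queue` acting as the "conveyor belt". The names `counter`, `child`, and `run_demo` are hypothetical.

```python
import multiprocessing as mp

counter = 0  # module-level global: every process has its own private copy


def child(q) -> None:
    """Runs in the child process: change the global, then report over IPC."""
    global counter
    counter += 100                       # only the child's copy changes
    q.put(f"child sees counter={counter}")


def run_demo() -> tuple:
    q = mp.Queue()                       # the "conveyor belt" between processes
    p = mp.Process(target=child, args=(q,))
    p.start()
    message = q.get()                    # blocks until the child sends its message
    p.join()
    return message, counter              # the parent's copy is still 0


if __name__ == "__main__":
    print(run_demo())                    # ('child sees counter=100', 0)
```

Reading from the queue before `join()` avoids a known pitfall: a child that still has buffered queue data may not exit until that data is consumed.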
Multi-threaded mode
Threads are the "work groups" inside a workshop: all groups share the same warehouse and tool library, but each group has its own workbench and progress bar (registers and stack). Groups communicate simply by passing notes (shared memory); no extra conveyor belt is needed.
Advantages
- Quick to set up: no separate warehouse is needed; an area of the existing workshop is simply designated as a new workbench, so both creation and switching cost much less than for processes.
- Easy communication between groups: passing notes (shared global variables, heap memory) is very efficient.
- Well suited to I/O waits: if one group is waiting for a delivery (a network response, a disk read or write), the workshop director can immediately switch to another group that is ready to work, so the CPU does not sit idle.
Disadvantages
- Poor stability: if one group accidentally trips the workshop's main power (for example, through an illegal memory access), the entire workshop (the whole process) goes dark. This is why many early web servers avoided a pure multi-threaded design.
- The "tool grabbing" problem: if several groups use the same screwdriver (modify the same global variable) at the same time, the data gets corrupted. This is called a race condition. The fix is a "tool sign-out sheet" (a lock), but too many locks hurt performance and can even cause deadlock (two groups each hold the screwdriver the other needs and wait for each other forever).
- A Python-specific limitation: CPython (the most commonly used Python interpreter) has a GIL (Global Interpreter Lock), so only one group (thread) at a time can work in the workshop's "main operating area" (executing Python bytecode). As a result, multithreaded pure-Python code cannot use multiple CPU cores to run computations in parallel. (CPython 3.13 added an optional, experimental free-threaded build without the GIL, but the standard build still has it.)
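A minimal sketch of the race condition and its lock-based fix, using the standard `threading` module. The function names and counts are illustrative. Note that the lock does not make the threads run Python code in parallel (in the standard CPython build the GIL already prevents that); it only makes each read-modify-write of `counter` indivisible.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n: int, use_lock: bool) -> None:
    """Increment the shared global counter n times."""
    global counter
    for _ in range(n):
        if use_lock:
            with lock:            # only one thread may read-modify-write at a time
                counter += 1
        else:
            counter += 1          # read, add, write: not atomic, so updates can be lost


def run(n_threads: int = 4, n: int = 100_000, use_lock: bool = True) -> int:
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n, use_lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print("with lock:   ", run(use_lock=True))    # always 400000
    print("without lock:", run(use_lock=False))   # may be less than 400000
```

Whether the unlocked run actually loses updates depends on the interpreter version and timing, which is exactly what makes race conditions hard to catch in testing.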
3. The modern server compromise: hybrid mode
Pure multi-process and pure multi-threaded designs each have their weaknesses, so mainstream server software has long used a hybrid model: multiple processes for stability plus multiple threads for efficiency:
- Apache: offers three Multi-Processing Modules (MPMs):
  - prefork: pure multi-process, for scenarios with extremely high stability requirements.
  - worker: multiple processes, each running multiple threads, balancing stability and resource usage.
  - event: builds on worker but hands idle keep-alive connections to a dedicated listener thread, so worker threads are not tied up waiting on I/O, which yields higher performance.
- IIS: Multi-threaded by default, but also supports process isolation mode.
- Nginx: goes further, using a single master process plus multiple worker processes, each running an event-driven asynchronous model. In many scenarios, especially with large numbers of concurrent connections, it clearly outperforms the traditional Apache and IIS models.
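As a rough Python analogue of the hybrid idea (closest to Apache's worker MPM), the sketch below runs a small pool of processes, each serving its share of requests with its own thread pool. It is a toy built on `concurrent.futures`, not how Apache or Nginx are implemented; `handle_request`, `process_worker`, and `serve` are hypothetical names, and a "request" here is just an integer.

```python
import os
import threading
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def handle_request(req_id: int) -> tuple:
    """Hypothetical request handler; records which process and thread served it."""
    return req_id, os.getpid(), threading.get_ident()


def process_worker(batch: list, threads_per_process: int = 4) -> list:
    """One worker process: serves its batch of requests with its own thread pool."""
    with ThreadPoolExecutor(max_workers=threads_per_process) as pool:
        return list(pool.map(handle_request, batch))


def serve(requests: list, n_processes: int = 2) -> list:
    """Master: split requests across processes round-robin and collect all results."""
    batches = [requests[i::n_processes] for i in range(n_processes)]
    results = []
    with ProcessPoolExecutor(max_workers=n_processes) as procs:
        for part in procs.map(process_worker, batches):
            results.extend(part)
    return sorted(results)  # sort by request id


if __name__ == "__main__":
    for req_id, pid, tid in serve(list(range(8))):
        print(f"request {req_id}: process {pid}, thread {tid}")
```

A crash inside one worker process would take down only that process's threads, while the processes themselves can use separate cores; that is the trade-off the hybrid model is after.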
4. Don’t let “switching workers” drag down efficiency: the pitfalls of task switching
Whether you are switching between processes, threads, or the coroutines discussed later, the context switch itself has a cost:
- Save the progress bar and tool status of the current task.
- Prepare the progress bar and tool status of the next task.
- Put the next task on the CPU core and start execution.
If too many tasks are started, the system can end up spending most of its time switching (in extreme cases, say 80%) and only a small fraction (say 20%) doing real work. CPU utilization looks very high, yet actual business throughput is very slow, and the system may even appear frozen (the mouse and keyboard stop responding while background processes are still running).
5. The right tool for the job: match the approach to the task type
Before choosing a multitasking approach, first figure out whether your tasks are "CPU busy" or "idle and waiting".
Compute-intensive tasks (CPU busy)
- Characteristics: most of the time is spent computing on the CPU, with almost no idle time. Typical examples: scientific computing, video encoding and decoding, cryptographic hashing and encryption, machine learning training and inference.
- Recommendations:
- Number of tasks ≈ number of CPU cores (one or two extra can absorb scheduling hiccups).
- Prefer compiled languages without a global interpreter lock, such as C, C++, Rust, or Go.
- If you must use Python, use multiple processes (which sidestep the GIL) or C extensions (for example, NumPy and PyTorch run their heavy computation in compiled C/C++ code that can release the GIL).
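A minimal sketch of the CPU-bound recommendation in Python: a `ProcessPoolExecutor` with one worker process per core (from `os.cpu_count()`), each running a deliberately slow, hypothetical prime-counting function. Because every worker process has its own interpreter and its own GIL, the jobs can run on separate cores at the same time.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Deliberately CPU-heavy: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_cpu_jobs(limits: list) -> list:
    # One worker process per core, following the "tasks ≈ CPU cores" rule of thumb.
    # os.cpu_count() can return None, hence the fallback to 1.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each process has its own interpreter and GIL, so the jobs run in parallel.
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(run_cpu_jobs([50_000] * 4))
```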
I/O-intensive tasks (lots of idle waiting)
- Characteristics: most of the time is spent idle, waiting for network responses, database reads and writes, or user input. CPU utilization is low (perhaps only 10% to 30%). Typical examples: web services, crawlers, database middleware, chat application backends.
- Recommendations:
- The number of tasks can be raised considerably (for example, 5 to 10 times the number of CPU cores, or even more).
- Interpreted scripting languages such as Python or JavaScript are entirely adequate; here development speed matters far more than execution speed.
- The modern trend is asynchronous I/O plus coroutines (more on this later), which is more efficient and cheaper than multithreading.
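For comparison, a minimal sketch of the I/O-bound recommendation: a `ThreadPoolExecutor` with five workers per core, where `time.sleep` stands in for a network call (`fake_request` and `run_io_jobs` are hypothetical names). Since the workers mostly wait, the total time stays close to a single delay as long as there are at least as many workers as jobs.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i: int, delay: float) -> int:
    """Stand-in for a network call: the thread just waits, using almost no CPU."""
    time.sleep(delay)          # sleeping (like blocking I/O) releases the GIL
    return i


def run_io_jobs(n_jobs: int, delay: float = 0.5) -> tuple:
    # Many more workers than cores: each one spends nearly all its time waiting.
    workers = (os.cpu_count() or 1) * 5
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_request, range(n_jobs), [delay] * n_jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_io_jobs(20)
    print(f"{len(results)} requests in {elapsed:.2f}s")
```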
6. The new favorite in high-concurrency scenarios: asynchronous I/O + coroutines
Asynchronous I/O
Modern operating systems provide efficient asynchronous I/O mechanisms (epoll on Linux, IOCP on Windows, kqueue on macOS): instead of dedicating a thread or process to each wait, the program simply tells the operating system "let me know when this network request comes back", and the main thread (or single process) moves on to other work in the meantime. Typical examples: Nginx, Node.js, and Redis (whose commands are processed on a single main thread, yet it performs extremely well).
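Python exposes this "notify me when ready" model through the standard `selectors` module. The sketch below is a minimal illustration, assuming a local `socket.socketpair()` as a stand-in for a real network connection: it registers interest in a socket and lets `select()` report when data has arrived. `selectors.DefaultSelector` uses epoll on Linux and kqueue on macOS; on Windows it falls back to `select()`, and it is asyncio's proactor event loop that uses IOCP there.

```python
import selectors
import socket


def wait_for_reply() -> bytes:
    # DefaultSelector picks the best mechanism available on this platform.
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()  # connected pair standing in for a network link
    client.setblocking(False)

    # "Let me know when this socket has data": register interest instead of blocking on recv().
    sel.register(client, selectors.EVENT_READ, data="reply-handler")

    server.sendall(b"response ready")      # the "remote side" answers

    received = b""
    # A single thread could do other work here; select() returns only ready sockets.
    for key, events in sel.select(timeout=1.0):
        if events & selectors.EVENT_READ:
            received = key.fileobj.recv(1024)

    sel.unregister(client)
    client.close()
    server.close()
    sel.close()
    return received


if __name__ == "__main__":
    print(wait_for_reply())  # b'response ready'
```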
Coroutine
Coroutines can be thought of as extremely lightweight work groups: all coroutines live in the same thread, and switching between them is controlled by a scheduler in the program itself (the event loop), without involving the operating system. How cheap is a coroutine switch? It is usually much cheaper than a thread switch, because it only saves and restores a small amount of state in user space, with no trip into the kernel.
Python's coroutine syntax is async / await, backed by the standard library asyncio; there is also the third-party library gevent, which is friendlier to existing synchronous code.
Advantages
- Massive concurrency: a single thread can easily host tens or even hundreds of thousands of coroutines, whereas a few thousand OS threads may already run into memory or system limits, since each thread reserves its own stack.
- Very low switching cost.
- Fewer locking headaches: only one thread runs the coroutines, and only one coroutine executes at any moment, switching only at await points, so code between two awaits cannot be interrupted by another coroutine.
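A minimal asyncio sketch of the first and last points: it starts 10,000 coroutines in one thread and has them all increment a shared global with no lock. The names `worker` and `main` are illustrative; the code relies only on `asyncio.run`, `asyncio.gather`, and `asyncio.sleep` from the standard library.

```python
import asyncio

counter = 0  # shared state, touched by every coroutine


async def worker() -> None:
    """Simulate an I/O wait, then update the shared counter."""
    global counter
    await asyncio.sleep(0.1)   # suspension point: the event loop switches to other coroutines here
    # No await between reading and writing counter, so no other coroutine can
    # interleave with this update: no lock is needed.
    counter += 1


async def main(n: int = 10_000) -> int:
    global counter
    counter = 0
    # n coroutines, all scheduled on this one thread's event loop.
    await asyncio.gather(*(worker() for _ in range(n)))
    return counter


if __name__ == "__main__":
    print(asyncio.run(main()))  # 10000
```

If an `await` were placed between reading and writing `counter`, other coroutines could run in that gap, and the lost-update problem would come back even in a single thread.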
7. Choosing at a glance: a comparison table
8. Python implementation examples (minimal versions)
Multiprocessing
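A minimal multiprocessing sketch with a hypothetical setup: three tasks, each simulated with a 3-second `time.sleep` standing in for real work. The names `task` and `run` are illustrative; the main process plays the Master and each child process a Worker.

```python
import os
import time
from multiprocessing import Process


def task(name: str, seconds: float) -> None:
    """Simulated job; time.sleep stands in for real work."""
    print(f"{name} started in process {os.getpid()}")
    time.sleep(seconds)
    print(f"{name} finished")


def run(n_tasks: int = 3, seconds: float = 3.0) -> float:
    start = time.perf_counter()
    # Master: the main process creates and starts one child process (Worker) per task.
    procs = [Process(target=task, args=(f"task-{i}", seconds)) for i in range(n_tasks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()          # wait for every child to finish
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"total: {run():.2f}s")
```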
The whole run takes about 3 seconds (the three child processes execute in parallel).
Multithreading
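The threading version, again a minimal sketch with a hypothetical setup: three tasks, each simulated with a 3-second `time.sleep` standing in for a blocking I/O call (`task` and `run` are illustrative names). The main thread is the Master and each worker thread a Worker.

```python
import threading
import time


def task(name: str, seconds: float) -> None:
    """Simulated I/O wait: time.sleep releases the GIL, so other threads can run."""
    print(f"{name} started in thread {threading.current_thread().name}")
    time.sleep(seconds)
    print(f"{name} finished")


def run(n_tasks: int = 3, seconds: float = 3.0) -> float:
    start = time.perf_counter()
    # Master: the main thread creates one worker thread per task.
    threads = [threading.Thread(target=task, args=(f"task-{i}", seconds), name=f"worker-{i}")
               for i in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # wait for every worker thread to finish
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"total: {run():.2f}s")
```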
Because the tasks are I/O-bound, the total is also about 3 seconds (time.sleep releases the GIL while it waits).
Coroutine
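The coroutine version with asyncio, a minimal sketch in which three tasks each await a 3-second `asyncio.sleep` standing in for non-blocking I/O (`task` and `main` are illustrative names). Calling `time.sleep` inside a coroutine would block the whole event loop, so the non-blocking `asyncio.sleep` is essential here.

```python
import asyncio
import time


async def task(name: str, seconds: float) -> None:
    """Simulated I/O wait: await hands control back to the event loop."""
    print(f"{name} started")
    await asyncio.sleep(seconds)   # non-blocking, unlike time.sleep
    print(f"{name} finished")


async def main(n_tasks: int = 3, seconds: float = 3.0) -> float:
    start = time.perf_counter()
    # Master: the event loop; Workers: the coroutines, all in this one thread.
    await asyncio.gather(*(task(f"task-{i}", seconds) for i in range(n_tasks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"total: {asyncio.run(main()):.2f}s")
```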
The total is again about 3 seconds, but resource usage is much lower than with multithreading: everything runs in a single thread, with no extra thread stacks and no OS-level context switches.
9. Final summary
- Classify your tasks first: for CPU-intensive work, use multiple processes or C extensions; for I/O-intensive work, use coroutines or an asynchronous framework.
- Don’t over-engineer: for a simple scheduled background task, a thread pool is enough.
- Python special case: never forget the GIL.
- Watch the modern trend: asynchronous I/O and coroutines have become the mainstream solution for high-concurrency web services.
Good architectural design means choosing the option that best fits your needs, not the one that looks "the most powerful".

