Python multi-process programming guide
1. Process basics
1.1 Why use multiple processes?
Have you ever encountered this situation: running a Python script to process a large amount of data, computer CPU only uses less than 10%? Or is the single thread stuck in the I/O gap and slow? At this time, you can try multi-process.
Process is the basic unit of resource allocation and scheduling in the operating system. Each process has its own independent memory space, file descriptor, etc., and does not interfere with each other - it is not as easy to get into trouble as multi-threaded shared global variables. Multiprocessing can truly take advantage of the parallel capabilities of multi-core CPUs.
In order to circumvent the limitations of the interpreter's GIL (global interpreter lock, which will be explained later) on CPU-intensive tasks, Python specifically provides complete multi-process support. Common implementation methods include:
os.fork()—— Unix/Linux/macOS only, low-level but not cross-platformmultiprocessingModule - Python's official cross-platform solution with fine controlconcurrent.futures.ProcessPoolExecutor——Python 3.2+ advanced encapsulation, more concise codesubprocessModule - mainly used to call external command processes
2. Unix-like low-level creation only:os.fork()
Unix/Linux/macOS providesfork()System call, it will make an almost complete copy of the current parent process (child process), the only difference is the return value:
- Returns the PID (process ID) of the child process in the parent process
- Returns 0 in child process
A minimalist example:
⚠️ Note a few hard restrictions:
- Not supported at all by Windows
fork(), an error will be reported when running - Copying memory space is very expensive (although there is a copy-on-write mechanism, it is not as good as
spawnsafe and general) - The child process will inherit all resource status of the parent process, but subsequent modifications** will not be synchronized back to the parent process**
3. Official cross-platform first choice:multiprocessingmodule
In order to unify multi-process implementation on different platforms, Python providesmultiprocessingmodule, fully compatible with all mainstream systems, and the API style is close to multi-threading.
3.1 Single sub-process control:Processkind
like usingthreading.ThreadTo create a thread, useProcessThe class specifies the function to be run by the child process,args/kwargsPassing on parameters,start()start up,join()The wait is over.
3.2 Batch management:Poolprocess pool
If you need to run dozens or hundreds of small tasks at the same time, don't create Process one by one - the overhead of process creation and destruction is not small. Process pools can be used to reuse created processes, which greatly improves efficiency.
It is recommended to use Context Managerwith Pool(...)(Python 3.3+), automatically release resources:
📌 Two core methods of Pool:
apply_async(func, args): Non-blocking asynchronous submission, suitable for batch tasksapply(func, args): Blocking synchronous submission, wait until the current task is completed before submitting the next one, almost useless
4. External command process call:subprocessmodule
If the task is not to write Python functions, but to call shell scripts, system commands, and other language programs, usesubprocessmodule. It completely replaces the oldos.system、os.popen, safer and more controllable.
4.1 Simple call (get return value and output)
It is recommended to use Python 3.5+subprocess.run(), supports capturing output, setting timeouts, etc.:
4.2 Interacting with external processes
If you need to send input to an external process and read real-time output, you can usePopenClass (with context manager):
5. Inter-process communication (IPC): Queue and Pipe
Because the process has an independent memory space and cannot directly share global variables like multi-threads, a special IPC mechanism must be used to transfer data.multiprocessingTwo commonly used methods are built-in:
5.1 Multi-producer-multi-consumer:Queue
Queue is the most commonly used IPC method, thread safe, process safe, allowing multiple processes to write and multiple processes to read.
5.2 One-to-one communication:Pipe
Pipe is a bidirectional (default) or unidirectional pipe, especially suitable for quickly transferring data between two processes.
6. Modern Python high-level packaging:concurrent.futures.ProcessPoolExecutor
Introduced in Python 3.2concurrent.futuresmodule, unifies the multi-process and multi-thread API, the code is more concise, and it also supportsmapAdvanced functions such as batch submission, exception capture, and timeout control.
It is also recommended to use the context manager to automatically release resources:
7. Best Practices and Pitfall Guidelines
7.1 Avoid pitfalls 1: Must addif __name__ == '__main__'
Windows usespawnWhen starting a subprocess in this way, the current module will be re-imported as the entry point of the subprocess. If this line is not added, the child process will execute all top-level code again, resulting in unlimited process creation and error reporting. It is safest to add this sentence to all multi-process code.
7.2 Avoid Pitfall 2: Impact of GIL
The GIL (Global Interpreter Lock) of the Python interpreter ensures that only one thread executes Python bytecode at the same time. Therefore, multi-threading cannot use multi-core CPUs to do true parallel computing, but it can be competent for I/O-intensive tasks (because the GIL will be released during I/O operations).
- Suitable for multi-process: CPU-intensive tasks (data compression, image processing, mathematical calculations, etc.)
- Suitable for multi-threaded/asynchronous I/O: I/O intensive tasks (network requests, file reading and writing, etc.)
7.3 Avoid Pitfall 3: Resource Sharing
Try not to share state directly, and use Queue/Pipe to deliver messages to avoid deadlocks and data competition. If small amounts of data must be shared, usemultiprocessing.ValueorArray, and must be locked and protected:
7.4 Avoid pitfall 4: Child process exceptions will not be propagated automatically
If an exception is thrown inside the child process, the parent process will not see it by default and will onlyjoin()After seeing non-zeroexitcode。
useProcessPoolExecutoroffuture.result()orPoolofget()Method, you can catch the exception of the child process and expose the error explicitly.
8. Quick summary
Recommended choice for modern Python (≥3.5):
- Batch Task →
ProcessPoolExecutor - Fine control of individual child processes →
Process - Call external commands →
subprocess.run()/Popen
Proper use of multiple processes can fully unleash the potential of multi-core CPUs and make your programs fly! 🚀

