Quick introduction and practical combat of Python distributed process (with complete code)
When dealing with CPU-intensive cross-machine tasks such as batch data processing and model training slicing, Python's GIL (global interpreter lock) will completely "flatten" multi-threads, and standard local multi-processes cannot utilize the computing power of multiple machines - at this time, the standard librarymultiprocessing.managersModules can quickly help us build a lightweight distributed system.
This article will start from process/thread selection logic, step by step implement distributed computing of Master‑Worker architecture, and give production-usable improvement suggestions.
1. First clarify: when to use distributed processes?
Many students are new to concurrency/parallelism and are confused about which one to choose between process-thread-coroutine and distributed? First, let’s clarify the core limitations in the Python scenario:
To put it simply: if your calculations cannot be handled by one machine, or you need to use the idle cores of different machines, distributed multi-processing is the first choice.
2. Lightweight distributed core architecture
multiprocessing.managersThe design is very simple, using the classic Master‑Worker mode:
Disassembly of core components
- Master Node
- Create two core queues:
Task Queue(put pending tasks),Result Queue(Receive completed results) - pass
BaseManagerExpose the queue to LAN/public network** - Responsible for task distribution and results summary
- Worker node
- pass
BaseManagerNetwork queue connected to Master - Loop from
Task QueueGet the task, process it and stuff it backResult Queue - You can start/stop any number of Workers at any time without modifying the Master code
⚡ The advantage of this design is: Workers can be added as they want and stopped as they want, and the Master does not need to be changed at all, which is very suitable for dynamic expansion and contraction scenarios.
3. Complete code actual combat
We use "calculate the square of a bunch of random numbers" as a simulation task (each sleep is calculated for 1 second, the simulation takes real time) to demonstrate the entire deployment process.
1. Master node code (task_master.py)
2. Worker node code (task_worker.py)
4. Rapid deployment and testing
Stand-alone simulation (recommended to verify on this machine first)
-
Open terminal 1 and start Master:
-
Open terminal 2, 3... (simulate multiple Workers), run the Worker (default connection
127.0.0.1): -
Observe the terminal output: the task will be "robbed" for processing by two Workers, and the results will be returned to the Master in the order of processing.
Cross-machine deployment
- put
task_worker.pyCopy to other machines. - When running the Worker on another machine, pass in the Master’s intranet IP:
- ⚠️ Remember to turn off the firewall on the Master machine or enable
56789port.
5. Suggestions for improving the production environment
The above code is only the minimum usable demo. It is not enough to be used directly in production. At least the following optimizations are required:
1. Security
- Don't hardcode keys: use environment variables or config file reading instead
authkey - TLS encrypted communication:
multiprocessing.managersNative support for TLS, need to generate SSL certificate - IP whitelist: Use iptables or restrict in code that only specified IPs can connect to the Master
2. Fault tolerance mechanism
- Task Retry: The Master maintains the task status, and tasks that have not received results will be automatically retransmitted after timeout.
- Heartbeat Detection: The Worker sends heartbeats to the Master regularly. If no heartbeat is received within the timeout, the Worker will be marked offline and tasks will be reassigned.
- Dead letter queue: Tasks that have failed to process more than 3 times will be placed in a special queue for manual inspection.
3. Performance optimization
- Batch tasks: put/take a batch of tasks at one time to reduce network round-trip overhead
- Efficient serialization: pickle serialization is used by default, with average performance, and can be replaced by MessagePack / Protocol Buffers
- Task sharding: If the task is too heavy, split it into small tasks on the Master side in advance and then distribute them
4. Modern alternatives
If you don’t want to reinvent the wheel, mature tools are recommended for production environments:
- Celery + Redis/RabbitMQ: The most commonly used distributed task queue, supporting task scheduling, retry, and priority
- Dask: Specialized in processing large-scale data calculations, the interface is similar to Pandas/NumPy, and the learning cost is low
- Ray: A high-performance distributed computing framework that supports complex scenarios such as machine learning and reinforcement learning.
Summarize
multiprocessing.managersIt is a lightweight distributed solution built into Python, suitable for quickly verifying cross-machine computing requirements, and can be run without installing any third-party libraries. However, if it is a large-scale production environment, it is still recommended to use mature tools such as Celery and Dask.
I hope this article can help you get started with Python distributed processes quickly! If you have any questions, please feel free to discuss them in the comment area.

