Apache Kafka practical tutorial
What is Apache Kafka?
In scenarios with millions of events per second, such as e-commerce sales on Double 11 (Singles' Day) or live-stream bullet comments, Apache Kafka often serves as the traffic buffer and event pipeline. It is an open-source distributed event streaming platform. Its core advantages are high throughput (millions of messages per second), low latency, persistent storage, horizontal scalability, and client support for multiple languages.
Common application scenarios include: log aggregation, event sourcing, stream processing, data synchronization between systems, and asynchronous message queues.
1. Kafka quick installation (Docker Compose single node)
Docker Compose is the easiest way for beginners to start ZooKeeper (the metadata store that older Kafka versions rely on), a single-node Kafka broker, and the Kafka Manager monitoring UI with one command:
Start the service:
Verify:
- Open http://localhost:9000 in a browser to enter Kafka Manager and add a cluster (fill in the ZK address `zookeeper:2181`).
- Subsequent command-line operations can be run either by installing the Kafka tools on the host machine or inside the container (`docker exec -it kafka /bin/bash`).
2. Core concepts (1 minute to understand)
3. Topic management and command line testing
3.1 Topic management (common commands)
3.2 Command line production and consumption test
4. Python and Kafka integration (using the most common client, kafka-python)
4.1 Install dependencies
4.2 Encapsulated producers and consumers (out of the box)
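A minimal sketch of what such out-of-the-box wrappers might look like with kafka-python. The class names `JsonProducer` and `JsonConsumer`, the JSON encoding, and the broker address `localhost:9092` are assumptions for illustration, not part of the library or of the setup above. `kafka` is imported lazily so the serializer helpers can be used without the package or a running broker.

```python
import json
from typing import Any, Callable, Optional


def encode_json(value: Any) -> bytes:
    """Serialize a Python object to UTF-8 JSON bytes."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def decode_json(raw: Optional[bytes]) -> Any:
    """Deserialize UTF-8 JSON bytes; a missing value (None) stays None."""
    return None if raw is None else json.loads(raw.decode("utf-8"))


def encode_key(key: Optional[str]) -> Optional[bytes]:
    """Encode a string key; None stays None so the partitioner decides."""
    return None if key is None else key.encode("utf-8")


class JsonProducer:
    """Hypothetical thin wrapper around kafka-python's KafkaProducer."""

    def __init__(self, bootstrap_servers: str = "localhost:9092") -> None:
        from kafka import KafkaProducer  # lazy import, needs kafka-python

        self._producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,  # assumed broker address
            key_serializer=encode_key,
            value_serializer=encode_json,
        )

    def send(self, topic: str, value: Any, key: Optional[str] = None,
             timeout: float = 10.0):
        """Send one message and block until the send is acknowledged."""
        future = self._producer.send(topic, key=key, value=value)
        return future.get(timeout=timeout)  # RecordMetadata on success

    def close(self) -> None:
        """Flush buffered messages, then release network resources."""
        self._producer.flush()
        self._producer.close()


class JsonConsumer:
    """Hypothetical thin wrapper around kafka-python's KafkaConsumer."""

    def __init__(self, topic: str, group_id: str,
                 bootstrap_servers: str = "localhost:9092") -> None:
        from kafka import KafkaConsumer  # lazy import, needs kafka-python

        self._consumer = KafkaConsumer(
            topic,
            bootstrap_servers=bootstrap_servers,  # assumed broker address
            group_id=group_id,
            auto_offset_reset="earliest",   # start from the oldest message
            enable_auto_commit=False,       # commit manually after handling
            value_deserializer=decode_json,
        )

    def run(self, handler: Callable[[Any], None]) -> None:
        """Call handler for each message, committing after it returns."""
        for message in self._consumer:
            handler(message.value)
            self._consumer.commit()

    def close(self) -> None:
        self._consumer.close()
```

With a broker running, `JsonProducer().send("orders", {"order_id": 1}, key="user-1")` would publish one message, and `JsonConsumer("orders", "demo-group").run(print)` would print each decoded value; the topic and group names here are placeholders.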
5. Practical best practices (condensed to the essentials)
5.1 Partition strategy
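- **How to determine the number of partitions?** General advice: `partitions ≈ target throughput / per-partition throughput` (a single partition can usually reach 100,000 to 1 million messages per second, depending on message size and hardware).
- **How to control message partitioning?** Specify a `key` (messages with the same key go to the same partition, which preserves their order), or leave it unset (messages are then spread across partitions, for example round-robin, depending on the client's partitioner).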
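The two rules above can be sketched in a few lines of Python. The function names are hypothetical, and CRC32 is used only as a stand-in hash to show the "same key, same partition" property; Kafka's default partitioner hashes the key bytes with murmur2, so real partition numbers will differ.

```python
import math
import zlib


def estimate_partitions(target_per_sec: float, per_partition_per_sec: float) -> int:
    """Round target throughput / per-partition throughput up to a whole count."""
    if target_per_sec <= 0 or per_partition_per_sec <= 0:
        raise ValueError("throughput values must be positive")
    return max(1, math.ceil(target_per_sec / per_partition_per_sec))


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod N.

    CRC32 is an illustrative stand-in, not Kafka's actual murmur2 hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# 1,000,000 msg/s target at an assumed 100,000 msg/s per partition
print(estimate_partitions(1_000_000, 100_000))  # → 10

# The same key always lands on the same partition
print(partition_for_key("user-42", 6) == partition_for_key("user-42", 6))  # → True
```

Because the mapping depends on the partition count, adding partitions later changes where existing keys land, which is one reason to size the topic with some headroom up front.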
5.2 Reliability Tuning
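One common way to approach reliability with kafka-python, sketched under assumptions: the helper names, the retry count, and the broker address `localhost:9092` are illustrative choices, while the keyword names are existing `KafkaProducer` and `KafkaConsumer` parameters. Note that `acks="all"` only protects against broker failure when the topic has more than one replica.

```python
from typing import Any, Dict


def reliable_producer_config(bootstrap_servers: str = "localhost:9092") -> Dict[str, Any]:
    """Producer settings that favor durability over latency (illustrative values)."""
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed broker address
        "acks": "all",   # wait for all in-sync replicas to acknowledge
        "retries": 5,    # retry transient send failures (assumed count)
        # One in-flight request keeps ordering intact when a retry happens.
        "max_in_flight_requests_per_connection": 1,
    }


def reliable_consumer_config(group_id: str,
                             bootstrap_servers: str = "localhost:9092") -> Dict[str, Any]:
    """Consumer settings for at-least-once processing (illustrative values)."""
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed broker address
        "group_id": group_id,
        "enable_auto_commit": False,      # commit only after processing succeeds
        "auto_offset_reset": "earliest",  # with no committed offset, start from the oldest
    }
```

These dictionaries are meant to be unpacked into the client constructors, for example `KafkaProducer(**reliable_producer_config())`. With manual commits a message may be processed more than once after a crash, so handlers should be idempotent.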
5.3 Performance Tuning
- Producer: set `linger_ms` (wait briefly to collect batches), `batch_size` (batch size, default 16 KB), and `compression_type` (compress with snappy/lz4 to reduce network transfer).
- Consumer: set `max_poll_records` (records per batch pull, default 500) and `fetch_min_bytes` (minimum number of bytes to wait for before returning, default 1 byte).
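As a rough illustration, the parameters above could be collected into keyword arguments for kafka-python clients. The helper names, the specific values, and the broker address `localhost:9092` are assumptions to be tuned against real measurements, not recommendations from the library.

```python
from typing import Any, Dict


def tuned_producer_config(bootstrap_servers: str = "localhost:9092") -> Dict[str, Any]:
    """Producer settings that trade a little latency for larger batches."""
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed broker address
        "linger_ms": 20,            # wait up to 20 ms to fill a batch (default 0)
        "batch_size": 64 * 1024,    # bytes per batch (default 16384, i.e. 16 KB)
        "compression_type": "lz4",  # requires the lz4 package; "snappy" needs python-snappy
    }


def tuned_consumer_config(group_id: str,
                          bootstrap_servers: str = "localhost:9092") -> Dict[str, Any]:
    """Consumer settings that pull fewer, larger batches."""
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed broker address
        "group_id": group_id,
        "max_poll_records": 1000,        # records returned per poll (default 500)
        "fetch_min_bytes": 64 * 1024,    # broker waits for this many bytes (default 1)
    }
```

They can be unpacked into the constructors, for example `KafkaProducer(**tuned_producer_config())`. Larger `linger_ms` and `fetch_min_bytes` values raise throughput at the cost of added latency per message.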
Summary
Apache Kafka is a preferred tool for building real-time event systems. Get started quickly with Docker Compose, then master the core concepts, Python integration, and best practices, and you can handle most business scenarios. For a production cluster, expand to at least 3 nodes (for replica fault tolerance) and configure monitoring and alerting.

