Modern Asynchronous IO Programming Guide

Overview of synchronous IO and asynchronous IO

Computer systems have a core performance gap: a CPU works on a nanosecond scale, while IO operations such as disk reads, disk writes, and network requests take milliseconds or even seconds. That is a difference of several orders of magnitude! Under the traditional synchronous IO model, the calling thread behaves like a customer who orders a coffee and then stands at the counter waiting for their number to be called, not even daring to look at their phone in the meantime. While the thread waits, the CPU time it could have used is wasted.

The problem with synchronous IO

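# Typical synchronous IO scenario: file processing that stalls the thread
def process_file():
    # 1. The CPU races through preprocessing (a few milliseconds)
    do_some_preprocessing()
    
    # 2. The thread is suspended until the file has been read (possibly seconds or longer!)
    with open('/path/to/user/data.txt', 'r') as f:
        data = f.read()
    
    # 3. The data has finally arrived, so the CPU resumes processing
    process_data(data)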

Its shortcomings are obvious:

  1. The thread sits idle for the whole duration of the IO, so the CPU time it could have used goes to waste.
  2. Raising concurrency requires multiple threads or processes, but creating threads and switching between them has real overhead. With too many threads, performance drops because the scheduler spends more time switching than working.
  3. Multi-threading also easily introduces troublesome concurrency problems such as lock contention and deadlock.

Asynchronous IO model: a "take a number" system that eliminates the waste

The core idea of asynchronous IO maps directly onto the number-calling system of a coffee shop: a single thread plus an event loop. The event loop manages every IO task waiting in the "number queue". When a task makes progress (its number is called), the loop gives it the CPU so that it can carry on with its processing.
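As a rough illustration of the "take a number" idea, the sketch below runs two simulated orders on a single thread. The names order and run_orders are hypothetical, and asyncio.sleep merely stands in for real IO. It assumes Python 3.9 or later.

```python
import asyncio

async def order(name: str, wait_s: float, log: list[str]) -> None:
    """Hypothetical customer: place an order, then wait for the number to be called."""
    log.append(f"{name}: ordered")
    # Simulated IO wait: control goes back to the event loop here
    await asyncio.sleep(wait_s)
    log.append(f"{name}: served")

async def run_orders() -> list[str]:
    log: list[str] = []
    # Both orders share one thread; the loop switches between them at each await
    await asyncio.gather(
        order("A", 0.10, log),
        order("B", 0.05, log),
    )
    return log

if __name__ == "__main__":
    for line in asyncio.run(run_orders()):
        print(line)
```

Both customers get to order before either drink is ready, and B, whose simulated wait is shorter, is served first even though A ordered first. Nothing here uses a second thread.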

Core basic concepts

A few labels make the terms clear:

  • 💡 Event Loop: The "chief scheduler" of asynchronous IO. It checks the task queue in an endless loop and calls back to work whichever task has had its IO complete.
  • 💡 Callback: The old-fashioned asynchronous "reminder note": a function that runs automatically once the IO has completed.
  • 💡 Future/Promise: The "pickup ticket" of an asynchronous task. The result is not available yet, but you can already arrange what should happen once it is.
  • 💡 Coroutine: The "core executor" of modern asynchronous code: a lightweight function that can pause and resume at its await points. It is far cheaper than a thread; the often-quoted figure is roughly 1000 times smaller, though the exact ratio depends on the language and runtime.
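To see how these four concepts fit together, here is a minimal asyncio sketch. The function name demo_future and the "latte" value are made up for illustration, and loop.call_later stands in for an IO operation that finishes a little later.

```python
import asyncio

async def demo_future() -> list[str]:
    events: list[str] = []
    loop = asyncio.get_running_loop()       # the event loop ("chief scheduler")
    ticket = loop.create_future()           # the Future ("pickup ticket"), no result yet

    # Callback ("reminder note"): runs once the future has a result
    ticket.add_done_callback(
        lambda fut: events.append(f"callback saw {fut.result()}")
    )

    # Simulate IO that completes 10 ms later by setting the result from the loop
    loop.call_later(0.01, ticket.set_result, "latte")

    # Coroutine: pauses here until the result is set, then resumes
    drink = await ticket
    events.append(f"coroutine got {drink}")
    return events

if __name__ == "__main__":
    print(asyncio.run(demo_future()))
```

The callback and the awaiting coroutine both react to the same future. Application code normally sticks to await and leaves add_done_callback to libraries.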

The evolution and implementation of modern asynchronous IO

1. Callback mode (largely abandoned in mainstream development)

The earliest asynchronous implementations chained logic together with "reminder notes". Once the nesting gets deep, the result is Callback Hell, which is very hard to maintain:

# Pseudocode example: three levels of callback hell
def async_user_workflow():
    # Level 1: read the user ID file
    start_async_read('/user/id.txt',
        lambda user_id:
            # Level 2: query the database with the ID
            start_async_db_query(user_id,
                lambda user_info:
                    # Level 3: send the activation email
                    start_async_send_email(user_info['email'],
                        lambda status: print(f"Email sent: {status}!")
                    )
            )
    )

2. Promise/Future mode (transitional approach)

Replacing nested callbacks with chained calls makes the code somewhat more readable, but it still does not read naturally:

// Older Node.js style: Promise chaining
const fs = require('fs').promises;
const db = require('./async-db');
const mail = require('./async-mail');

fs.readFile('/user/id.txt', 'utf8')
  .then(user_id => db.query(user_id))
  .then(user_info => mail.send(user_info['email']))
  .then(status => console.log(`Email sent: ${status}!`))
  .catch(err => console.error(err));

3. Coroutine mode (the preferred modern approach)

With the async/await keywords, asynchronous code reads almost exactly like synchronous code, while the underlying execution is still scheduled asynchronously. This removes both the nesting of callback hell and the boilerplate of long chains!

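# Python asyncio coroutine style (natural and clear!)
import asyncio
import aiofiles  # third-party async file library; the built-in open() would block the event loop

async def process_file():
    do_some_preprocessing()
    
    # await pauses the current coroutine and hands control back to the event loop, which can run other tasks
    async with aiofiles.open('/path/to/user/data.txt', 'r') as f:
        data = await f.read()
    
    process_data(data)

# Start the event loop and run the coroutine
asyncio.run(process_file())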

Modern asynchronous IO frameworks in mainstream languages

Python asyncio

asyncio is the official asynchronous framework built into Python since 3.4 (the async/await syntax arrived in 3.5 and asyncio.run in 3.7). Together with ecosystem libraries such as aiohttp (networking), aiofiles (files), and aiomysql (databases), it covers almost every IO scenario:

import asyncio
import aiohttp

# Coroutine that fetches a web page asynchronously
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    # Create a batch of asynchronous tasks (take several coffee numbers at once!)
    tasks = [
        fetch('https://example.com'),
        fetch('https://example.org'),
        fetch('https://python.org')
    ]
    # Wait for all tasks to finish and collect the results
    results = await asyncio.gather(*tasks)
    print([len(res) for res in results])  # print the length of each page

asyncio.run(main())
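To make the benefit of gather concrete without touching the network, the sketch below replaces the HTTP call with asyncio.sleep and times both styles. The names fake_fetch, sequential, concurrent, and timed are hypothetical, and the delays are made up. It assumes Python 3.9 or later.

```python
import asyncio
import time

async def fake_fetch(delay: float) -> float:
    """Stand-in for a network request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return delay

async def sequential(delays: list[float]) -> list[float]:
    # Each request starts only after the previous one has finished
    return [await fake_fetch(d) for d in delays]

async def concurrent(delays: list[float]) -> list[float]:
    # All requests are in flight together; results keep the input order
    return await asyncio.gather(*(fake_fetch(d) for d in delays))

def timed(coro) -> tuple[list[float], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    delays = [0.05] * 4
    _, seq_time = timed(sequential(delays))
    _, con_time = timed(concurrent(delays))
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential version should take roughly the sum of the delays and the concurrent one roughly the longest single delay. Exact timings vary by machine.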

Node.js event loop

Node.js is a JavaScript runtime built around an event loop, so it is asynchronous by design. async/await has been a core language feature since ES2017 (ES8):

const fs = require('fs').promises;

async function processUserFile() {
    try {
        const data = await fs.readFile('/path/to/user/data.txt', 'utf8');
        console.log("File read succeeded:", data.slice(0, 50));
    } catch (err) {
        console.error("Something went wrong:", err);
    }
}

processUserFile();

Rust tokio

tokio is the most widely used asynchronous runtime in Rust. It offers strong performance with low memory usage, which makes it well suited to high-performance services:

use tokio::fs;

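#[tokio::main]  // starts the tokio runtime automatically
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = fs::read_to_string("/path/to/user/data.txt").await?;
    // Take the first 50 characters; slicing by byte index can panic inside a multi-byte character
    let preview: String = data.chars().take(50).collect();
    println!("First 50 characters of the file: {}", preview);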
    Ok(())
}

Asynchronous IO pitfalls and best practices

1. Never block the event loop!

The event loop is single-threaded. Once one task hogs the CPU (image decoding or a complex sort, for example), every other IO task is stuck. It is as if the chief scheduler had gone off to haul bricks and stopped calling numbers altogether.

Solution: hand CPU-intensive work to a thread pool or process pool:

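import asyncio

# Suppose there is a CPU-intensive image compression function
def compress_big_image(img_path):
    # Suppose this performs 10 seconds of pure CPU computation
    ...

async def async_compress(img_path):
    loop = asyncio.get_running_loop()
    # Passing None runs the compression in the loop's default thread pool,
    # so awaiting it does not block the event loop.
    # For pure-Python CPU work the GIL still limits threads; in that case
    # pass a concurrent.futures.ProcessPoolExecutor instead of None.
    return await loop.run_in_executor(None, compress_big_image, img_path)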
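The effect of blocking can be observed directly. In the sketch below, a hypothetical heartbeat coroutine records a tick every 20 ms while some blocking work runs either inline or in a worker thread through asyncio.to_thread (Python 3.9+). The names blocking_work, heartbeat, and count_ticks are made up. time.sleep is only a stand-in for a blocking call: it releases the GIL, so truly CPU-bound pure-Python code would behave less favorably in a thread.

```python
import asyncio
import time

def blocking_work(seconds: float) -> str:
    """Hypothetical blocking call; time.sleep stands in for real work."""
    time.sleep(seconds)
    return "done"

async def heartbeat(ticks: list[float], interval: float, stop: asyncio.Event) -> None:
    # Records a timestamp each time the event loop gives this coroutine a turn
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)

async def count_ticks(offload: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, 0.02, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick

    if offload:
        # Runs in a worker thread; the event loop keeps serving the heartbeat
        await asyncio.to_thread(blocking_work, 0.2)
    else:
        # Runs on the event loop thread; nothing else can run meanwhile
        blocking_work(0.2)

    stop.set()
    await hb
    return len(ticks)

if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(count_ticks(False)))
    print("ticks while offloading:       ", asyncio.run(count_ticks(True)))
```

When the work runs inline, the heartbeat only manages its initial tick. When it is offloaded, the heartbeat keeps ticking throughout.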

2. Limit concurrency sensibly

Asynchronous IO can run thousands of coroutines, but if the concurrency exceeds what the downstream resource can handle (for example, sending 10,000 requests to a database at once), requests will time out or fail. A Semaphore keeps this under control:

import asyncio
import aiohttp

# Run at most 10 fetch tasks at the same time
semaphore = asyncio.Semaphore(10)

async def limited_fetch(url):
    async with semaphore:  # a task must acquire the semaphore before it runs
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.text()
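To check that a semaphore really caps the number of tasks in flight, the sketch below launches many simulated jobs and records the peak number running at once. bounded_gather and job are hypothetical names, and asyncio.sleep stands in for the real request.

```python
import asyncio

async def bounded_gather(n_jobs: int, limit: int) -> int:
    """Run n_jobs simulated IO jobs, at most `limit` at a time; return the observed peak."""
    # Created inside the running loop, which also avoids the
    # "attached to a different loop" error seen before Python 3.10
    sem = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with sem:                # wait here if `limit` jobs are already running
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # simulated IO
            running -= 1

    await asyncio.gather(*(job() for _ in range(n_jobs)))
    return peak

if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(bounded_gather(50, 10)))
```

All 50 coroutines are created up front, but only 10 are ever inside the guarded section at the same moment.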

3. Handle timeouts and errors well

Asynchronous tasks are prone to timeouts and other exceptions caused by network fluctuations. Be sure to add try/except and wait_for:

import asyncio
import aiohttp

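async def fetch_text(url):
    # Open a session, send the request, and read the body as text
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def safe_fetch(url):
    try:
        # Allow 3 seconds for the whole request, including reading the body
        return await asyncio.wait_for(fetch_text(url), timeout=3.0)
    except asyncio.TimeoutError:
        print(f"Request to {url} timed out!")
        return None
    except aiohttp.ClientError as e:
        print(f"Request to {url} failed: {e}")
        return None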
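It can help to see what wait_for does to the task it wraps. The sketch below uses a simulated slow reply instead of a real request. The names slow_reply and call_with_timeout are hypothetical, and the delays are arbitrary.

```python
import asyncio

async def slow_reply(delay: float, log: list) -> str:
    """Stand-in for a network call that answers after `delay` seconds."""
    try:
        await asyncio.sleep(delay)
        return "ok"
    except asyncio.CancelledError:
        # wait_for cancels the wrapped task when the timeout expires
        log.append("cancelled")
        raise

async def call_with_timeout(delay: float, timeout: float):
    log: list = []
    try:
        result = await asyncio.wait_for(slow_reply(delay, log), timeout=timeout)
    except asyncio.TimeoutError:
        result = None
    return result, log

if __name__ == "__main__":
    print(asyncio.run(call_with_timeout(0.01, 1.0)))  # fast reply
    print(asyncio.run(call_with_timeout(1.0, 0.01)))  # reply slower than the timeout
```

On timeout, the inner coroutine receives a CancelledError before the caller sees TimeoutError, so any cleanup in the inner coroutine (closing connections, releasing a semaphore) still gets a chance to run.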

When to use asynchronous IO? When not to use it?

✅ Suitable scenarios (mainly IO-intensive)

  • High-concurrency network services: web servers, API gateways, real-time chat/push systems
  • Batch IO operations: downloading 1,000 files at once, querying 500 database records at once
  • Calls between microservices: services avoid blocking each other while they wait

❌ Unsuitable scenarios

  • Purely CPU-intensive tasks: image/video decoding, machine learning inference, complex mathematical calculations (using multiple threads or processes directly is more efficient)
  • Simple single-threaded scripts: with no concurrency requirement, asynchrony only adds code complexity.

Summary

Modern asynchronous IO programming addresses the resource waste of synchronous IO by combining a single-threaded event loop with lightweight coroutines. Compared with the traditional multi-threaded model, it uses less memory, incurs less context-switching overhead, and largely avoids lock contention, because tasks on the same loop only switch at await points. It has become an essential skill for back-end development!

Just remember these three points: "Don't block the chief scheduler", "Keep concurrency under control", and "Handle timeouts and errors", and you can quickly start writing efficient asynchronous code!