IO Programming

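1. IO Basics

IO (Input/Output) is the transfer of data between a computer and external devices such as disks, networks, and terminals. Because the CPU and memory are far faster than these devices, IO is often the main performance bottleneck of a program.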

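1.1 Basic Types of IO

Input: data flows from an external source into memory (for example, reading a file or receiving network data)
Output: data flows from memory to an external destination (for example, writing a file or sending network data)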

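1.2 The Concept of Streams

A stream is the core abstraction in IO. Think of it as a pipe through which data flows:

Input stream: data flows from an external source into the program
Output stream: data flows from the program to an external destination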
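
To make the stream abstraction concrete, here is a minimal sketch. The helper name copy_upper is hypothetical and not part of any library. It accepts any text input stream and any text output stream, so the same code works with in-memory streams, open files, or the terminal.

```python
import io
import sys

def copy_upper(source, target):
    """Read lines from an input stream and write them upper-cased to an output stream."""
    count = 0
    for line in source:              # input stream: data flows into the program
        target.write(line.upper())   # output stream: data flows out of the program
        count += 1
    return count

# An in-memory stream stands in for a file or a socket.
source = io.StringIO("first line\nsecond line\n")
target = io.StringIO()
lines_copied = copy_upper(source, target)
result = target.getvalue()

# sys.stdout is also a text stream, so the same function can write to the terminal.
copy_upper(io.StringIO(result), sys.stdout)
```

Because copy_upper only relies on iteration and write(), it does not need to know where the data comes from or where it goes.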

2. IO Processing Models

2.1 Synchronous IO (Blocking IO)

# Synchronous IO example: reading a file
with open('example.txt', 'r') as file:
    content = file.read()  # the program blocks here until the read completes
    print(content)

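Characteristics:

The program must wait for the IO operation to complete before it continues
The programming model is simple and direct
Throughput can suffer, because the thread sits idle while it waits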

2.2 Asynchronous IO (Non-blocking IO)

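# Asynchronous IO example using asyncio
# Regular file objects cannot be awaited, and the standard library has no
# native async file API, so the blocking read runs in a worker thread
# via asyncio.to_thread (Python 3.9+).
import asyncio

def read_text(path):
    with open(path, 'r') as file:
        return file.read()

async def read_file():
    content = await asyncio.to_thread(read_text, 'example.txt')  # does not block the event loop
    print(content)

asyncio.run(read_file())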

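Characteristics:

After starting an IO operation, the program can continue with other tasks
When the IO completes, the program is notified through a callback or an event
The programming model is more complex, but throughput is usually higher when many IO operations overlap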
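
The difference between the two models shows up when several waits overlap. The following sketch is illustrative only: time.sleep and asyncio.sleep stand in for real blocking and non-blocking IO calls, and the function names are made up for this example.

```python
import asyncio
import time

def blocking_task(delay):
    time.sleep(delay)            # stand-in for a blocking IO call
    return delay

async def async_task(delay):
    await asyncio.sleep(delay)   # stand-in for a non-blocking IO call
    return delay

def run_sync(delays):
    """Run the waits one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [blocking_task(d) for d in delays]
    return results, time.perf_counter() - start

async def _gather(delays):
    return await asyncio.gather(*(async_task(d) for d in delays))

def run_async(delays):
    """Run the waits concurrently on one event loop and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(_gather(delays))
    return list(results), time.perf_counter() - start

delays = [0.1, 0.1, 0.1]
sync_results, sync_elapsed = run_sync(delays)
async_results, async_elapsed = run_async(delays)
print(f"sync:  {sync_elapsed:.2f}s")
print(f"async: {async_elapsed:.2f}s")
```

The synchronous version takes roughly the sum of the delays, while the asynchronous version takes roughly the longest single delay, because the waits overlap.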

3. IO Operations in Python

3.1 File IO

Basic file operations

# Write to a file
with open('output.txt', 'w') as f:
    f.write('Hello, World!')

# Read from a file
with open('output.txt', 'r') as f:
    content = f.read()
    print(content)

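File modes

Mode	Description
'r'	Read only (default)
'w'	Write; truncates an existing file
'x'	Exclusive creation; fails if the file already exists
'a'	Append; writes go to the end of the file
'b'	Binary mode
't'	Text mode (default)
'+'	Update (read and write)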
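
The modes in the table can be combined, and their differences are easiest to see side by side. This is a small sketch that works in a temporary directory; the file name modes.txt is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes.txt")

    # 'x' creates the file and fails if it already exists.
    with open(path, "x") as f:
        f.write("one\n")
    try:
        with open(path, "x") as f:
            f.write("never written\n")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("two\n")

    # 'r+' allows reading and writing without truncating the file.
    with open(path, "r+") as f:
        before = f.read()
        f.seek(0)
        f.write("ONE")

    # 'rb' returns bytes instead of str.
    with open(path, "rb") as f:
        raw = f.read()

    # 'w' truncates the file as soon as it is opened.
    with open(path, "w") as f:
        pass
    size_after_w = os.path.getsize(path)

print(exclusive_failed, repr(before), raw[:3], size_after_w)
```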

3.2 In-memory IO

from io import StringIO, BytesIO

# String IO
string_io = StringIO()
string_io.write('Hello')
string_io.write(' World!')
print(string_io.getvalue())  # prints: Hello World!

# Bytes IO
bytes_io = BytesIO()
bytes_io.write(b'binary data')
print(bytes_io.getvalue())  # prints: b'binary data'

3.3 Network IO

import urllib.request

# Synchronous HTTP request
with urllib.request.urlopen('https://www.python.org') as response:
    html = response.read().decode('utf-8')
    print(html[:200])  # print the first 200 characters

4. Advanced IO Techniques

4.1 Context Managers

Python's with statement manages resources automatically:

with open('file.txt', 'r') as f:
    data = f.read()
# The file is closed automatically, even if an exception is raised
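
The same guarantee can be given to any resource by writing a context manager. The sketch below uses contextlib.contextmanager; the name managed_stream and the events list are illustrative, and an in-memory stream stands in for a real resource.

```python
from contextlib import contextmanager
import io

events = []

@contextmanager
def managed_stream(name):
    """Open an in-memory stream and guarantee it is closed on exit."""
    stream = io.StringIO()
    events.append(f"open {name}")
    try:
        yield stream
    finally:
        # Runs on normal exit and when the with-block raises.
        stream.close()
        events.append(f"close {name}")

try:
    with managed_stream("demo") as s:
        s.write("partial data")
        raise RuntimeError("simulated failure")
except RuntimeError:
    events.append("error handled")

print(events)
print(s.closed)
```

The cleanup in the finally clause runs before the exception reaches the outer except block, which is why the stream is already closed when the error is handled.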

4.2 Buffered IO

By default, Python uses buffered IO to improve performance:

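# Unbuffered (binary mode only; text mode raises ValueError with buffering=0)
with open('file.txt', 'wb', buffering=0) as f:
    f.write(b'no buffer')

# Line buffered (text mode only)
with open('file.txt', 'w', buffering=1) as f:
    f.write('line buffer\n')

# Explicit buffer size in bytes
with open('file.txt', 'w', buffering=4096) as f:
    f.write('4KB buffer')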
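
The effect of the buffer can be observed by checking the file size before and after a flush. This is a small sketch using a temporary file. It uses binary mode throughout because buffering=0 is only allowed there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.bin")

    with open(path, "wb", buffering=4096) as f:
        f.write(b"hello")
        # The 5 bytes still sit in Python's buffer, not in the file.
        size_before_flush = os.path.getsize(path)
        f.flush()
        size_after_flush = os.path.getsize(path)

    with open(path, "wb", buffering=0) as f:
        f.write(b"hello")
        # Without a buffer, every write goes straight to the operating system.
        size_unbuffered = os.path.getsize(path)

print(size_before_flush, size_after_flush, size_unbuffered)
```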

4.3 Memory-mapped Files

An efficient way to work with large files:

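import mmap

with open('large_file.bin', 'r+b') as f:
    # Map the whole file into memory (a length of 0 means the entire file)
    mm = mmap.mmap(f.fileno(), 0)
    # Access the file contents like an in-memory byte sequence
    print(mm[10:20])
    mm.close()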
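
A memory map can also be written to and searched. The following sketch creates its own small file first, since an empty file cannot be mapped; the file name and contents are arbitrary.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789" * 100)

    with open(path, "r+b") as f:
        # mmap objects are context managers, so the mapping is closed automatically.
        with mmap.mmap(f.fileno(), 0) as mm:
            first = mm[:10]               # slicing returns bytes
            mm[0:5] = b"HELLO"            # slice assignment must keep the same length
            position = mm.find(b"HELLO")  # search without reading the file into a str
            mm.flush()                    # push changes back to the file

    with open(path, "rb") as f:
        head = f.read(10)

print(first, position, head)
```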

5. Modern IO Programming Practices

5.1 Using pathlib Instead of os.path

from pathlib import Path

# A more object-oriented way to work with file paths
path = Path('example.txt')
content = path.read_text()
path.write_text('New content')
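
Beyond reading and writing, Path objects cover most of what os.path and os offer for path handling. Here is a short sketch in a temporary directory; the directory and file names are made up for illustration.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # The / operator joins path components.
    log_dir = base / "logs" / "archive"
    log_dir.mkdir(parents=True, exist_ok=True)

    log_file = log_dir / "app.log"
    log_file.write_text("started\n", encoding="utf-8")
    (log_dir / "notes.txt").write_text("n", encoding="utf-8")

    name = log_file.name      # final path component
    stem = log_file.stem      # name without the suffix
    suffix = log_file.suffix  # file extension, including the dot
    exists = log_file.exists()

    # rglob searches the directory tree recursively.
    found = sorted(p.name for p in base.rglob("*.log"))
    text = log_file.read_text(encoding="utf-8")

print(name, stem, suffix, exists, found)
```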

5.2 Asynchronous IO (asyncio)

import asyncio

async def fetch_data():
    reader, writer = await asyncio.open_connection('example.com', 80)
    writer.write(b'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n')
    await writer.drain()
    data = await reader.read(100)
    print(data.decode())
    writer.close()
    await writer.wait_closed()

asyncio.run(fetch_data())
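
The stream API used above can be tried without network access by running a server and several clients in the same event loop. This is a minimal sketch, not a production server: the handler name handle_upper, the helper ask, and the one-line protocol (the server replies with the upper-cased line) are all assumptions made for the example.

```python
import asyncio

async def handle_upper(reader, writer):
    """Server side: read one line, reply with it upper-cased, then close."""
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def ask(port, message):
    """Client side: send one line and return the reply without the newline."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.strip()

async def main():
    # Port 0 lets the operating system pick a free port.
    server = await asyncio.start_server(handle_upper, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # The three clients run concurrently on a single thread.
        return await asyncio.gather(
            *(ask(port, m) for m in [b"alpha", b"beta", b"gamma"])
        )

replies = asyncio.run(main())
print(replies)
```

asyncio.gather returns results in the order the awaitables were passed in, regardless of which connection finishes first.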

5.3 Concurrent File Processing

from concurrent.futures import ThreadPoolExecutor
import glob

def process_file(filename):
    with open(filename) as f:
        return len(f.read())

# Process multiple files concurrently with a thread pool
with ThreadPoolExecutor() as executor:
    files = glob.glob('*.txt')
    results = executor.map(process_file, files)
    print(list(results))

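6. Performance Tips

Batch operations: reduce the number of IO calls by reading or writing more data per call
Use buffering: choose a sensible buffer size
Asynchronous processing: consider async IO for applications dominated by network or disk IO
Memory mapping: consider memory-mapped files for large files
Fewer small files: merge small files to reduce the number of IO operations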
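
The first two tips can be made visible by counting how many low-level writes actually happen. In the sketch below, CountingRaw is a hypothetical raw stream written for this example; it stores data in memory and counts calls to write(). Wrapping it in io.BufferedWriter batches many small writes into a few large ones.

```python
import io

class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts the low-level writes it receives."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += bytes(b)
        return len(b)

def write_records(stream, records):
    for record in records:
        stream.write(record)

records = [b"0123456789"] * 1000

# Every small write reaches the raw stream directly.
unbuffered = CountingRaw()
write_records(unbuffered, records)

# The buffered writer collects small writes and passes them on in batches.
raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=4096)
write_records(buffered, records)
buffered.flush()

print(unbuffered.calls, raw.calls)
```

Both streams end up with identical data, but the buffered one needs far fewer low-level writes. With a real file, each of those writes would be a system call.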

7. Common Problems and Solutions

Q: How can I process a large file without running out of memory? A: Read it line by line or in chunks:

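# process() is a placeholder for your own handling code

# Read line by line
with open('large_file.txt') as f:
    for line in f:
        process(line)

# Read in fixed-size chunks (the := operator requires Python 3.8+)
CHUNK_SIZE = 4096
with open('large_file.bin', 'rb') as f:
    while chunk := f.read(CHUNK_SIZE):
        process(chunk)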
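
As a concrete use of chunked reading, here is a sketch that computes a SHA-256 checksum while holding only one chunk in memory at a time. The function name sha256_of_file and the chunk sizes are choices made for this example.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file incrementally so memory use stays bounded by chunk_size."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    payload = os.urandom(200_000)
    with open(path, "wb") as f:
        f.write(payload)

    chunked = sha256_of_file(path, chunk_size=4096)
    whole = hashlib.sha256(payload).hexdigest()

print(chunked == whole)
```

The incremental digest matches the digest of the whole payload, so the chunk size only affects memory use and the number of read calls, not the result.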

Q: How can I make sure written data has reached the disk? A: Use flush() and os.fsync():

import os

with open('important.dat', 'w') as f:
    f.write('critical data')
    f.flush()  # push Python's buffer to the operating system
    os.fsync(f.fileno())  # ask the OS to write its buffers to the physical disk
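
flush() and os.fsync() are often combined with a write-then-rename pattern, so that readers never see a half-written file. This pattern is not described in the text above; it is a common technique sketched here under the assumption that the temporary file and the target are on the same filesystem. The helper name atomic_write_text is hypothetical.

```python
import os
import tempfile

def atomic_write_text(path, text, encoding="utf-8"):
    """Write text to a temporary file, sync it, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()             # Python buffer -> operating system
            os.fsync(f.fileno())  # operating system -> disk
        os.replace(tmp_path, path)  # replaces the target in a single step
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "important.dat")
    atomic_write_text(target, "version 1")
    atomic_write_text(target, "version 2")
    with open(target, encoding="utf-8") as f:
        final_content = f.read()
    leftover = sorted(os.listdir(tmp))

print(final_content, leftover)
```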

Q: How can I process a large number of small files efficiently? A: Use a thread pool or async IO:

from concurrent.futures import ThreadPoolExecutor

def process_file(filename):
    with open(filename) as f:
        return f.read()

# many_files is a placeholder for an iterable of file paths
with ThreadPoolExecutor(max_workers=8) as executor:
    results = executor.map(process_file, many_files)

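After working through this tutorial, you should have a solid overview of IO programming in Python. Choosing an IO strategy that fits your use case can significantly improve program performance.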
