File reading and writing

File reading and writing is one of the most common I/O tasks in everyday Python development. It looks simple, but the details decide how robust, efficient, and even secure your code is. This short article walks through it, from the basic APIs to the usual pitfalls.

Basics of file operations

Python performs its core file operations through the built-in open() function, which returns a file object. That object wraps the file descriptor issued by the operating system and acts as a "bridge" between your Python code and the file on disk.

For security and stability reasons, modern operating systems do not let programs read or write disk sectors directly; every file operation goes through the unified API that the system provides. That is also why we cannot simply read a file through a memory address: the operating system has to do the actual I/O for us.
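
To make the "bridge" idea concrete, here is a minimal sketch that reads the same file twice: once through a file object and once through the low-level os interface. The temporary file and the variable names are assumptions made up for this demo.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any real path.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('hello')
    demo_path = tmp.name  # hypothetical file used only for this demo

# High-level route: the file object hides the descriptor, but it is there.
with open(demo_path, 'r', encoding='utf-8') as f:
    fd = f.fileno()              # integer handle issued by the operating system
    via_file_object = f.read()   # decoded text (str)

# Low-level route: ask the OS for a descriptor and read raw bytes from it.
raw_fd = os.open(demo_path, os.O_RDONLY)
try:
    via_descriptor = os.read(raw_fd, 5)  # bytes, no decoding applied
finally:
    os.close(raw_fd)

os.remove(demo_path)
print(fd, via_file_object, via_descriptor)
```

In day-to-day code you would almost always stay with open(); the os calls are shown only to illustrate what the file object is wrapping.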


Reading files

Basic opening and exception-handling

When you read a file with open(), the default mode is 'r' (text reading). If the path you pass does not exist, Python raises FileNotFoundError and the program stops:

# Basic open for reading
f = open('/path/to/your/file.txt', 'r')

To make the program more robust, catch at least this basic exception:

try:
    f = open('/path/to/nonexistent.txt', 'r')
except FileNotFoundError as e:
    print(f"File not found, please check the path: {e}")

Four common ways to read content

After opening the file, pick a method based on the file size and on how you need to process the content:

  1. Read everything at once – suitable for small files that fit comfortably in memory

    content = f.read()  # read the whole file into a single string
  2. Read in fixed-size chunks – friendly to large files, because it avoids loading everything into memory at once

    chunk = f.read(1024)  # up to 1024 characters in text mode, up to 1024 bytes in binary mode
  3. Read one line at a time – common when each line needs its own processing

    line = f.readline()  # read one line, including the trailing \n
  4. Read all lines into a list – convenient when you need to go over the lines several times

    lines = f.readlines()  # each line (including \n) becomes one list element
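
A single f.read(1024) call returns only one chunk, so chunked reading is normally wrapped in a loop that stops when read() returns an empty result. The sketch below shows one way to do that; count_characters is a hypothetical helper, and an in-memory stream stands in for a real file.

```python
import io


def count_characters(stream, chunk_size=1024):
    """Count characters by reading fixed-size chunks until the stream is exhausted."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty string means end of file
            break
        total += len(chunk)
    return total


# io.StringIO stands in for a file opened in text mode.
sample = io.StringIO("line one\nline two\nline three\n")
print(count_characters(sample, chunk_size=8))  # → 29
```

The same loop works for a file opened in 'rb' mode, where each chunk is a bytes object and the end-of-file marker is b''.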

Two methods to safely release resources

An open file occupies a file descriptor, an operating-system resource that is limited per process (on Linux the default limit is commonly 1024), so close the file as soon as you are done with it!

  1. Manual close – easy to forget, and close() is skipped if reading or writing raises an exception

    f = open('/path/to/file.txt', 'r')
    content = f.read()
    f.close()  # only truly safe when placed in the finally block of a try statement
  2. Automatic management with the with statement (highly recommended) – the context manager calls close() automatically when the block ends, even if an exception is raised inside it

    with open('/path/to/file.txt', 'r') as f:
        content = f.read()
    # the file is already closed here; f still exists, but reading from it raises ValueError
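
For comparison, this is roughly what the manual version looks like once close() is moved into a finally block, which is what the with statement does for you. The temporary directory and the file name demo.txt are assumptions for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical demo file
    with open(demo_path, 'w', encoding='utf-8') as out:
        out.write('some text')

    # Manual management: close() runs whether or not read() raises.
    f = open(demo_path, 'r', encoding='utf-8')
    try:
        content = f.read()
    finally:
        f.close()

    # The name f still refers to the file object, but it is closed now.
    try:
        f.read()
        read_after_close_failed = False
    except ValueError:
        read_after_close_failed = True

print(content, f.closed, read_after_close_failed)
```

The with form is shorter and harder to get wrong, which is why it is the recommended default.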

File object types

Python has three core kinds of "file-like" objects. Knowing when to use each one saves a lot of detours:

  1. Text file on disk – the default way to open a file. Content is handled as str, and decoding and encoding happen automatically.

    with open('note.txt', 'rt') as f:  # 't' can be omitted
        pass
  2. Binary file on disk – handles non-text content such as images, videos, and archives as bytes.

    with open('cat.jpg', 'rb') as f:
        img_bytes = f.read()
  3. In-memory file – simulates file operations entirely in memory. No disk I/O is involved, so it is fast and well suited to temporary data.

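    from io import SEEK_END, BytesIO, StringIO
    
    # In-memory text stream
    text_stream = StringIO("A temporary blog draft")
    text_stream.seek(0, SEEK_END)  # an initial value leaves the position at 0; move to the end so write() appends instead of overwriting
    text_stream.write("\nOne more line of edits")
    print(text_stream.getvalue())  # getvalue() returns the whole buffer, whatever the current position
    text_stream.close()
    
    # In-memory binary stream
    binary_stream = BytesIO(b'\x48\x65\x6c\x6c\x6f')  # the bytes of "Hello"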
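
Because in-memory streams expose the same read methods as disk files, code written against "any readable stream" works with both. The following sketch assumes a made-up helper, count_lines, and shows the position-based behaviour of a BytesIO object.

```python
from io import BytesIO, StringIO


def count_lines(stream):
    """Count lines in any iterable text or binary stream."""
    return sum(1 for _ in stream)


# A StringIO object can be passed wherever a text file object is expected.
text_lines = count_lines(StringIO("a\nb\nc\n"))

# BytesIO keeps a read position, just like a file opened in 'rb' mode.
binary_stream = BytesIO(b'\x48\x65\x6c\x6c\x6f')
head = binary_stream.read(2)   # first two bytes
rest = binary_stream.read()    # everything after the current position
binary_stream.seek(0)          # rewind to the start
decoded = binary_stream.read().decode('ascii')

print(text_lines, head, rest, decoded)
```

This is also a handy pattern for unit tests: pass a StringIO or BytesIO to the function under test instead of creating files on disk.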

Process text files with different encodings

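When encoding is omitted, Python 3 falls back to a platform-dependent default: usually UTF-8 on Linux and macOS, but often a legacy code page on Windows. In the other direction, old GBK/GB2312 files are common on Windows, and opening them as UTF-8 raises UnicodeDecodeError. Either way, the fix is to name the encoding explicitly:

# Read a GBK file by specifying its encoding
with open('old_chinese_note.txt', 'r', encoding='gbk') as f:
    content = f.read()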

If the file contains a small number of invalid byte sequences (for example, stray garbled characters), the errors parameter controls how they are handled:

  • errors='ignore' – drop the invalid bytes silently
  • errors='replace' – substitute each invalid sequence with the replacement character U+FFFD (�)
  • errors='strict' – the default; raise an error immediately (the safest option, because it tells you the file has a problem)

# Replace anything that cannot be decoded
with open('mixed_encoding.txt', 'r', encoding='utf-8', errors='replace') as f:
    content = f.read()
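
The three strategies are easy to compare without touching the disk, because bytes.decode() accepts the same errors values as open(). The byte string below is an assumption built for the demo: valid UTF-8 followed by one byte that can never appear in UTF-8.

```python
# Valid UTF-8 text followed by a single invalid byte (0xFF).
raw = 'café'.encode('utf-8') + b'\xff'

ignored = raw.decode('utf-8', errors='ignore')     # invalid byte dropped
replaced = raw.decode('utf-8', errors='replace')   # invalid byte becomes U+FFFD

try:
    raw.decode('utf-8', errors='strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Text encoded as GBK is generally not valid UTF-8, which is what
# triggers UnicodeDecodeError when the wrong encoding is assumed.
gbk_bytes = '中文'.encode('gbk')
try:
    gbk_bytes.decode('utf-8')
    gbk_as_utf8_failed = False
except UnicodeDecodeError:
    gbk_as_utf8_failed = True

print(repr(ignored), repr(replaced), strict_failed, gbk_as_utf8_failed)
```

Note that 'ignore' and 'replace' both lose information, so they suit inspection and salvage work better than round-tripping data.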

Writing files

Two basic writing modes

The most important thing when writing is to avoid overwriting existing data by accident. Start with the two most common modes:

  1. Overwrite ('w')
    If the file does not exist, it is created; if it already exists, it is opened, truncated to empty, and written from the beginning.

    with open('new_diary.txt', 'w', encoding='utf-8') as f:
        f.write("Learned Python file reading and writing today!")
  2. Append ('a')
    If the file does not exist, it is created; if it already exists, new data is written at the end and the original content is kept.

    with open('new_diary.txt', 'a', encoding='utf-8') as f:
        f.write("\nTomorrow: CSV/JSON reading and writing!")
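
The difference between the two modes is easiest to see by reading the file back after each step. This is a small sketch using a temporary directory; the file name diary.txt and the variable names are assumptions for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'diary.txt')  # hypothetical demo file

    with open(path, 'w', encoding='utf-8') as f:   # creates the file
        f.write('first\n')
    with open(path, 'a', encoding='utf-8') as f:   # keeps 'first', adds to the end
        f.write('second\n')
    with open(path, 'r', encoding='utf-8') as f:
        after_append = f.read()

    with open(path, 'w', encoding='utf-8') as f:   # truncates everything written so far
        f.write('third\n')
    with open(path, 'r', encoding='utf-8') as f:
        after_overwrite = f.read()

print(repr(after_append), repr(after_overwrite))
```

Opening with 'w' empties the file at the moment of the open() call, before anything is written, so even an immediately failing write leaves the old content gone.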

Two ways to write multiple lines

You can call f.write() in a loop, or use the more concise f.writelines():

lines = ["Line one", "Line two", "Line three"]

# Option 1: loop and add the newline by hand
with open('multi_line.txt', 'w', encoding='utf-8') as f:
    for line in lines:
        f.write(f"{line}\n")

# Option 2: pass a generator expression to writelines() (more Pythonic)
with open('multi_line.txt', 'w', encoding='utf-8') as f:
    f.writelines(f"{line}\n" for line in lines)

⚠️ Note: f.writelines() does not add line breaks automatically! You have to append \n yourself.


File Mode Cheat Sheet

| Basic mode | Core function | Combination suffix | Supplementary notes |
| --- | --- | --- | --- |
| 'r' | Read-only (default) | 't' | Text mode (default, can be omitted) |
| 'w' | Write-only (truncates existing content, creates the file if missing) | 'b' | Binary mode |
| 'x' | Exclusive creation (raises an error if the file exists) | '+' | Read and write at the same time (the basic mode's behavior is kept) |
| 'a' | Write-only (appends at the end, creates the file if missing) | | |

Examples of common combinations:

  • 'rb' – read binary files such as images and archives
  • 'w+' – read and write; the file is truncated on open, so call f.seek(0) before reading back what you wrote
  • 'a+' – read and write; the position starts at the end of the file, so call f.seek(0) before reading, and every write still lands at the end
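
The pointer movement mentioned for 'w+' and 'a+' can be sketched as follows. The temporary file notes.txt is an assumption for the demo; the key call is f.seek(0), which moves the position back to the start of the file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')  # hypothetical demo file

    with open(path, 'w+', encoding='utf-8') as f:
        f.write('hello')
        at_end = f.read()        # position is after 'hello', so nothing is left to read
        f.seek(0)                # rewind to the beginning
        from_start = f.read()

    with open(path, 'a+', encoding='utf-8') as f:
        f.seek(0)                # rewinding affects reads only
        f.write(' world')        # append mode still writes at the end
        f.seek(0)
        combined = f.read()

print(repr(at_end), repr(from_start), repr(combined))
```

If you need to modify a file in place without truncating it, 'r+' is the mode to reach for; it requires the file to exist already.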

Pitfalls and best practices

  1. Always pass encoding='utf-8' explicitly
    This avoids encoding problems when files move between platforms (Windows ↔ Linux / macOS).
  2. Always use the with statement
    Forgetting to close a file by hand is one of the most common beginner mistakes; with solves it once and for all.
  3. Iterate over large files line by line or chunk by chunk
    For example, when processing a GB-scale log file, for line in f: keeps only the current line (plus a small buffer) in memory, while f.readlines() loads every line at once:
    with open('10GB_server.log', 'r', encoding='utf-8') as f:
        for line in f:  # line-by-line iteration, minimal memory use
            if "ERROR" in line:
                print(line.strip())
  4. Use 'x' mode to avoid overwriting critical data
    For example, when writing a user configuration file, this keeps an existing configuration from being wiped by accident:
    try:
        with open('user_config.json', 'x', encoding='utf-8') as f:
            f.write('{"theme": "dark"}')
    except FileExistsError:
        print("Configuration file already exists, nothing to create")

Quick exercise: Reading the system time zone file

Use what you have just learned to write a small script that reads /etc/timezone, the time zone file found on Debian-based Linux systems (other systems, such as macOS or Windows, typically do not have it):

import os
import sys

TIMEZONE_PATH = '/etc/timezone'

if __name__ == '__main__':
    if not os.path.exists(TIMEZONE_PATH):
        print("No /etc/timezone file found on this system (it may be Windows)")
        sys.exit(1)
    
    try:
        with open(TIMEZONE_PATH, 'r', encoding='utf-8') as f:
            print(f"Current system time zone: {f.read().strip()}")  # strip() removes the trailing newline
    except PermissionError:
        print("No permission to read /etc/timezone; try running with sudo")

Summary

Today we covered the core points of reading and writing files in Python:

  1. Use open() together with the with context manager to open and close files safely
  2. Know when to use each of the three file objects: text, binary, and in-memory
  3. Handle encoding issues with the encoding and errors parameters
  4. Three basic writing modes: overwrite, append, and exclusive creation
  5. Memory-friendly techniques for processing large files

Master these and you can handle the vast majority of file I/O needs in everyday development! In the next article, we will look at reading and writing structured files: CSV, JSON, and Excel.