Python file reading, writing and data persistence practice

In the back-end development or model training of Daoman Python AI, we often need to deal with the persistence of data. The so-called persistence refers to transferring data from volatile storage media (usually memory) to media that can be stored for a long time (usually hard disk, SSD). The most direct and standard way to achieve this goal is to save data to files through the File System.


1. Understand the abstraction layer of the file system

The computer's file system is a mature "data steward" solution. It uses the simple logical concepts of "file" and "tree directory" to completely shield the complex underlying physical details of hard disk track addressing, SSD wear balancing, and USB protocol interaction.

As an application layer developer, you don't need to find blank physical sectors of the hard disk in advance, nor do you need to worry about data verification after a power outage - the file system will automatically allocate space, manage indexes, and handle fault tolerance. You only need to use absolute path/relative path + file name to easily complete the "save and retrieve" operations, which greatly reduces the entry barrier for storage development.


2. Open and close files:open()Function core analysis

The official recommended entry point for operating files in Python is the built-in functionopen(). By passing in different operation mode string, you can control the file's read and write permissions, processing method (text/binary), update permissions, etc.

2.1 Quick check on common operating modes

The following table summarizes the mode combinations in high-frequency scenarios to avoid confusing the meanings of single characters:

Operation mode combinationCore permissionsKey detailsHigh-frequency scenarios
'r'/'rt'Read-only textDefault mode; if the file does not exist, an error will be reported directlyRead configuration, training data preview
'w'/'wt'Overwrite textIf the file does not exist, create it; If it exists, clear all contents and rewriteTemporarily generate a new configuration/log backup
'a'/'at'Append writing textIf the file does not exist, create it; If it exists, append it at the end, and the cursor will be at the end by defaultCrawler data storage, training real-time log
'x'/'xt'Exclusively write textIf the file does not exist, create it; If it exists, report an error directlyPrevent multiple people/multiple processes from accidentally overwriting key files
'rb'Read-only binaryRead the original byte stream directly; no encoding parameters are requiredRead images, model weights (.pth/.pt), and compressed packages
'wb'Overwrite binaryDirectly write the original byte streamSave pictures, models, and binary logs
'r+'/'w+'Update textSupports both reading and writing;w+Old files will be cleared firstSimple configuration that needs to be read and modified at the same time

🚩 Beginner’s guide to avoiding pitfalls (must read) When processing Chinese text, JSON/YAML and other structured text files, must be specified explicitlyencoding='utf-8'
Do not rely on the default encoding of the operating system (such as GBK under Windows, Latin-1 that may be used in Mac/Linux in the early days), otherwise there is a probability of more than 90% that you will encounter classicUnicodeDecodeErrororUnicodeEncodeError


3. Industrial-level practice of text reading and writing

Although Python providesfile.close()Manually close the file, but 100% recommended in production environment or team collaboration codewithKeyword (context manager)**.

It has two irreplaceable advantages:

  1. Automatically safe release of resources: even if the read and write process occursValueErrorIOErrorEven for system-level exceptions, the file handle will be released immediately without causing resource leakage (resource leakage in high-concurrency services will cause "Too many open files" errors and directly hang the service).
  2. Code is more readable: The indented block clarifies the "valid scope of file operations", so team members can understand at a glance where the file is being operated.

3.1 Three mainstream ways to read text files

According to the file size and business needs, choose the most suitable reading solution. Do not blindly read in full:

# 所有示例都统一使用 with 上下文管理器 + utf-8 编码
with open('致橡树.txt', 'r', encoding='utf-8') as file:
    # ========== 方式 A:逐行迭代(⭐️⭐️⭐️ 工业级首选) ==========
    # 每次只加载一行到内存,内存占用永远是 O(1)
    # 适合处理 GB 级的日志、千万条的爬虫数据
    print("=== 逐行迭代(strip去除换行/空格) ===")
    for line in file:
        # 记得用 strip() 或 rstrip('\n') 去掉自动带的换行符
        print(line.strip())

    # ========== 方式 B:全量读取(适合 <10MB 的文件) ==========
    # 一次性加载所有内容到内存,得到一个完整的字符串
    # 注意:上面的 for 循环已经把文件指针移到末尾了,需要重置
    file.seek(0)  # 把文件指针移回文件开头
    print("\n=== 全量读取 ===")
    full_content = file.read()
    print(full_content[:100])  # 只打印前100个字符演示

    # ========== 方式 C:列表读取(适合需要索引某一行的场景) ==========
    # 把每行作为一个元素存入列表,内存占用和全量读取差不多
    file.seek(0)
    print("\n=== 列表读取(打印第3行) ===")
    lines_list = file.readlines()
    if len(lines_list) >= 3:
        print(lines_list[2].strip())

Comparative summary of the three methods:

  • Line-by-line iteration (mode A): extremely low memory usage, suitable for very large files, and capable of handling production-level log streams.
  • Full read (method B): simple and direct, suitable for one-time processing of small files, such as loading JSON configuration.
  • List reading (mode C): It is convenient to access a certain row by index, but it will take up more memory, so be careful when using large files.

3.2 Writing and appending data

Depending on whether to keep old content, choose'w'(override) or'a'(Append) mode:

# ========== 场景 1:覆盖写(新建或清空) ==========
with open('致橡树-精简版.txt', 'w', encoding='utf-8') as file:
    # write() 不会自动加换行符,需要手动加 \n
    file.write("我如果爱你——\n")
    file.write("绝不像攀援的凌霄花,\n")
    file.write("借你的高枝炫耀自己;")

# ========== 场景 2:追加写(保留旧内容,末尾加新的) ==========
with open('致橡树.txt', 'a', encoding='utf-8') as file:
    # 追加两行赏析内容
    file.write("\n\n--- 诗歌赏析 ---")
    file.write("\n《致橡树》通过木棉与橡树的对话,展现了独立、平等、相互扶持的现代爱情观。")

Writing Tips:

  • overwrite'w'The original data will be cleared with one click. Please confirm the backup before operation.
  • append write'a'The cursor is automatically positioned at the end of the file, no need to manuallyseek
  • If you need to wrap the line at the end and add new content, remember to add it manually before the new content\n

Through this article, you should have mastered the core skills and industrial best practices of Python file reading and writing. Whether dealing with configurations, logs, training data, and model weights, make good use ofopen()withAnd appropriate read and write modes can make your data persistence code both efficient and robust.