Practical combat of reading and writing CSV files in Python
Practical practice of reading and writing CSV files in Python: "Universal Table Protocol" in the era of plain text
1. CSV file: lightweight cross-platform data transfer station
CSV (Comma Separated Values), the full name is "Comma Separated Values", is a plain text data format that has no official mandatory standard, but has wide consensus in the industry. In Daoman Python AI's actual projects, CSV is almost everywhere:
- Temporary import/export middleware for databases
- "Data Express" between front-end and back-end and cross-language systems
- Fast storage and exchange of small and medium-sized machine learning data sets
Its biggest advantage is plain text: whether you use Windows Notepad, macOS Numbers, Linux Vim or Excel, you can directly open and edit it without installing special parsing tools.
The CSV structure convention that everyone follows by default:
- Plain text, coding-friendly: UTF-8 is commonly used (preferred for cross-platform), and sometimes GBK is also encountered (such as files exported by some domestic Excels)
- Each line is a complete record: For example, one line is "Guan Yu's three courses results"
- Clearly separated fields: English commas are used by default
,, but tabs can also be used\t, vertical line|etc. as delimiters (especially useful when the data itself contains a large number of commas) - Table header is optional, but highly recommended: The first line states the meaning of each column, so that both humans and machines can understand it at a glance
- Use brackets for special content: If a field contains delimiters, newlines, or quotation marks themselves, use double quotation marks.
"Wrap the entire field to avoid confusion in parsing
2. Use nativecsvModule writing to files: from simple data to custom formats
Python standard library comes withcsvModule, without pip to install any third-party package, can cover about 80% of daily CSV reading and writing needs. Let’s start with the most basic scenario: save the random scores of Guan Yu, Zhang Fei and other five Shu Han generals.scores.csv。
Basic scenario: ordinary data + UTF-8 encoding
✨ After running, try opening it in Excel? Remember to select the "UTF-8 comma separated" format when importing, and Chinese will be displayed normally.
Advanced scenario: Custom separator + bracket
In actual work, you may encounter some "non-mainstream" CSV files:
- Use vertical lines
|delimited (because the data naturally contains a lot of commas) - Force double quotes for all fields (a safer, unambiguous common practice)
At this time, just givecsv.writerJust pass in additional parameters:
Looking at the generated file, you will find that each column uses|separated, and all fields are wrapped in double quotes, which is clear and safe.
3. Use nativecsvModule reading files: from line-by-line traversal to key-value pair access
When reading CSV, there are two most commonly used methods: list-based reading (simple and direct) and dictionary-based reading (strongly recommended, automatically associates headers with data).
Method 1: List-based reading (csv.reader)
Method 2: Dictionary reading (csv.DictReader, highly recommended! )
If you don't want to rely on indexingrow[0]Take a name, rely onrow[1]Get Chinese language scores,DictReaderwould be a better choice - it automatically uses the table header as the key of the dictionary, and converts each row of data into a dictionary:
📌 Tip: If you are reading the vertical bar-delimited file defined previously, remember to create
readerorDictReaderThe same is passed in whendelimiter='|', otherwise the analysis will fail!
4. Practical pit avoidance guide (a must-read for novices!)
csvThe module seems simple, but in actual use there are still several high-frequency pitfalls. Understanding it in advance can save a lot of debugging time.
Pit 1: Chinese garbled characters
- Cause: Inconsistent encoding. For example, writing in GBK but reading in UTF-8, or vice versa.
- SOLVED: When writing and reading, always explicitly specify the same
encoding. Domestic Excel uses GBK by default. In other scenarios, it is recommended to use UTF-8 first.
Pit 2: Extra blank lines on Windows platform
- Cause: Windows default newline character is
\r\n,andcsvThe module will automatically handle line breaks, no needopenThe function's own end-of-line conversion. - SOLVED: When writing to a file,
open()must be addednewline=''(It is usually added when reading to maintain consistency).
Pit 3: The iterator can only be traversed once
- reason:
csv.readerandcsv.DictReaderWhat is returned is an iterator and repeated traversal is not supported. - Solution: If you need to use data multiple times, you can use
list(reader)Convert everything into a list. But be aware that large files (more than hundreds of thousands of lines) are best processed line by line to avoid memory explosion.
5. Summary and Prospect: NativecsvIt's the basics, Pandas is the advanced weapon
NativecsvModule applicable scenarios
- Fast reading and writing of small and medium-sized CSV files (within tens of thousands of lines)
- Scenarios that require highly customized formats such as separators and brackets
- Lightweight projects that do not want to introduce third-party dependencies
The power of advanced artifact Pandas
In the large-scale data analysis and machine learning scenarios involved in Daoman Python AI, we almost no longer write nativecsv, instead use Pandas directly:
- Read and write CSV with one line of code:
pd.read_csv()anddf.to_csv(), concise and clear - Automatic conversion to DataFrame: A two-dimensional table structure similar to Excel that supports advanced operations such as filtering, sorting, grouping aggregation, and missing value processing. One line of API can replace dozens of lines of native code.
- Significant performance advantages: For files with hundreds of thousands or even millions of lines, Pandas reads faster and has better memory management.
In the next article, we will officially enter the world of Pandas reading and writing CSV and experience "nuclear weapon" level data processing efficiency. Stay tuned!

