StringIO and BytesIO: reading and writing memory like a file
Have you ever run into this: you want to quickly test some code that reads and writes files, but you don't want to leave a pile of temporary files in your project? Or, while handling API messages or data pipelines, you need to briefly generate or parse a string or binary stream that never needs to touch the disk? That is exactly what io.StringIO and io.BytesIO in the Python standard library are for. They are in-memory stand-ins for file objects: they implement almost the same interface as real files, but the data lives in a memory buffer. This avoids disk I/O overhead and lets your code switch between in-memory streams and disk files without changes.
In short: if you want the file interface but don't want to touch the disk, use them.
First things first: which one to use when?
The fundamental difference between the two classes is the type of data they handle. Pick the wrong one and you get a TypeError right away:
- StringIO: a plain-text stream. It only accepts Python str (Unicode) objects, and reading returns str.
- BytesIO: a binary stream. It only accepts bytes-like objects such as bytes or bytearray, and reading returns bytes.
Keep this rule in mind; everything that follows maps onto it directly.
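A minimal sketch of the type rule above: each buffer accepts its own type and raises TypeError for the other one. The exact error messages vary by Python version, so the code only collects the exceptions rather than relying on their wording.

```python
import io

text_buf = io.StringIO()
bin_buf = io.BytesIO()

text_buf.write("text is fine")      # str into StringIO: OK
bin_buf.write(b"bytes are fine")    # bytes into BytesIO: OK

errors = []
try:
    text_buf.write(b"bytes")        # bytes into StringIO: wrong type
except TypeError as exc:
    errors.append(("StringIO", exc))

try:
    bin_buf.write("text")           # str into BytesIO: wrong type
except TypeError as exc:
    errors.append(("BytesIO", exc))

for name, exc in errors:
    print(name, "raised TypeError:", exc)
```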
Getting started with StringIO: In-memory text buffer
Basics: create, write, read
The most direct approach is to create an empty StringIO object and write text into it as if it were a file:
You can also pass the initial content when creating it. This is equivalent to writing the content first and then moving the position back to the beginning automatically:
When reading, it behaves exactly like an opened real file:
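A short sketch of writing to an empty StringIO. Note that getvalue() always returns the whole buffer, while read() starts from the current position, which sits at the end right after writing.

```python
import io

buf = io.StringIO()
buf.write("Hello, ")
buf.write("StringIO!\n")            # write() returns the number of characters written

whole = buf.getvalue()              # entire buffer, regardless of position
at_end = buf.read()                 # '' because the position is at the end
buf.seek(0)                         # rewind, just like a real file
from_start = buf.read()

print(repr(whole), repr(at_end), repr(from_start))
```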
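A sketch showing that a StringIO created with initial content starts at position 0, so it can be read immediately without calling seek():

```python
import io

buf = io.StringIO("first line\nsecond line\n")

start_pos = buf.tell()              # 0: positioned at the beginning
first = buf.readline()              # reads from the start
print(start_pos, repr(first))
```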
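The usual file reading methods all work; a quick sketch with read(n), readline(), readlines() and line iteration:

```python
import io

buf = io.StringIO("alpha\nbeta\ngamma\n")

chunk = buf.read(5)                 # 'alpha': reads 5 characters
rest_of_line = buf.readline()       # '\n': the remainder of the first line
second = buf.readline()             # 'beta\n'
remaining = buf.readlines()         # ['gamma\n']: all lines left

buf.seek(0)                         # rewind and iterate line by line
stripped = [ln.rstrip("\n") for ln in buf]
print(stripped)
```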
Other useful methods (mirroring real files)
StringIO also provides tell(), truncate() and other methods that behave just like their file-object counterparts. The following example shows that overwriting content does not automatically truncate what comes after it, and how to truncate manually:
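A sketch of the overwrite-then-truncate behavior: writing at position 0 replaces only as many characters as you write, and truncate() with no argument cuts the buffer at the current position.

```python
import io

buf = io.StringIO("Hello, World!")

buf.write("Hi")                     # overwrites the first 2 characters only
after_write = buf.getvalue()        # 'Hillo, World!': the tail is still there
pos = buf.tell()                    # 2

buf.truncate()                      # cut everything after the current position
after_truncate = buf.getvalue()     # 'Hi'
print(repr(after_write), pos, repr(after_truncate))
```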
This feature is very useful when buffers need to be written and reused repeatedly.
Getting started with BytesIO: In-memory binary buffers
BytesIO is used almost exactly like StringIO; the only difference is that everything must be bytes. When writing, use a b'...' literal or str.encode() to produce bytes; to turn what you read back into text, call decode().
Basics
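A minimal sketch of the BytesIO round trip: write bytes literals and encoded text, rewind, read, then decode. UTF-8 is assumed as the encoding here.

```python
import io

buf = io.BytesIO()
buf.write(b"header:")                        # bytes literal
n = buf.write("café".encode("utf-8"))        # 5 bytes: 'é' takes 2 bytes in UTF-8
buf.write(bytearray(b"!"))                   # bytearray is accepted too

buf.seek(0)
raw = buf.read()                             # bytes
text = raw.decode("utf-8")                   # back to str
print(n, raw, text)
```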
Reading and high-performance memory views
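getbuffer() returns a readable and writable memoryview of the buffer's contents without copying them. When processing binary streams of hundreds of MB or more, this can significantly reduce the cost of copying data. Note that as long as the view exists, the BytesIO object cannot be resized or closed, so release the view when you are done.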
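A sketch of getbuffer() in practice. Using the memoryview as a context manager releases it at the end of the block, after which the BytesIO can grow again. Only the explicit bytes(...) call makes a copy.

```python
import io

buf = io.BytesIO(b"hello world")

with buf.getbuffer() as view:        # zero-copy view of the internal buffer
    size = view.nbytes               # 11
    first = bytes(view[:5])          # b'hello' (bytes() copies just this slice)
    view[0:5] = b"HELLO"             # in-place write goes straight into the buffer

modified = buf.getvalue()            # b'HELLO world'

# The view has been released, so resizing works again
buf.seek(0, io.SEEK_END)
buf.write(b"!")
print(size, first, modified, buf.getvalue())
```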
High-frequency scenarios in real projects
- Simulating files in unit tests: when testing a function that generates an Excel report, pass in a BytesIO object instead of creating a real temporary file.
- Network API interaction: when uploading files with the requests library, you can pass a BytesIO object directly as the file object.
- Temporary data handoff: after a crawler downloads a compressed archive, there is no need to save it locally; it can be decompressed straight from a BytesIO.
- Generating or parsing CSV/JSON/XML on the fly: for example, generate a CSV string and return it directly to the front end without touching the disk at all.
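Two of these scenarios sketched with the standard library only. `write_report` is a hypothetical function that accepts any text file-like object, which is what makes it testable with a StringIO. The zip archive is built in memory as a stand-in for bytes downloaded from the network; the requests upload case is left out because it needs a third-party package.

```python
import csv
import io
import zipfile


def write_report(rows, fileobj):
    """Hypothetical report writer: works with any text file-like object."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "score"])
    writer.writerows(rows)


# 1) Test / ad hoc CSV: write into a StringIO instead of a temp file
out = io.StringIO(newline="")
write_report([("alice", 90), ("bob", 85)], out)
csv_text = out.getvalue()
print(csv_text.splitlines())

# Parse it back without touching the disk
rows = list(csv.reader(io.StringIO(csv_text, newline="")))

# 2) Build a zip archive in memory (stand-in for downloaded bytes)
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("report.csv", csv_text)
archive_bytes = zbuf.getvalue()

# Decompress directly from memory, no local file needed
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    names = zf.namelist()
    content = zf.read("report.csv").decode("utf-8")
```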
Performance and Best Practices
Performance Tips
- Small to medium data (up to tens of MB): getvalue() and read() are the simplest options.
- Large binary streams (hundreds of MB and up): prefer getbuffer() to obtain a memory view and slice it, avoiding unnecessary data copies.
- In Python 3, io.StringIO and io.BytesIO are built in and implemented in C; they replace Python 2's StringIO and cStringIO modules, which no longer exist, so stop reaching for the old modules.
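A sketch of the large-stream tip: walk a BytesIO in fixed-size memoryview slices so no chunk is copied. Hashing with hashlib is just an assumed example consumer; `sha256_in_chunks` and its default chunk size are hypothetical choices, not recommendations from the original text.

```python
import hashlib
import io


def sha256_in_chunks(buf: io.BytesIO, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hash a BytesIO via zero-copy memoryview slices."""
    h = hashlib.sha256()
    with buf.getbuffer() as view:
        for start in range(0, view.nbytes, chunk_size):
            # Slicing a memoryview does not copy the underlying bytes
            with view[start:start + chunk_size] as chunk:
                h.update(chunk)
    # Both views are released here, so buf can be resized again
    return h.hexdigest()


data = bytes(range(256)) * 10_000   # about 2.5 MB of sample data
buf = io.BytesIO(data)
digest = sha256_in_chunks(buf, chunk_size=1 << 16)
print(digest)
```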
Best Practices
Automatically close using context manager
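Even if you never close it, the memory buffer will eventually be garbage collected, but using with is still a good habit: the buffer is discarded as soon as the block ends rather than whenever the object happens to be collected, and the code reads more cleanly. Just remember to read the contents before the block exits: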
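A sketch of the with pattern. Leaving the block calls close(), which discards the buffer, so getvalue() must be called inside the block; afterwards it raises ValueError.

```python
import io

with io.StringIO() as buf:
    buf.write("line 1\n")
    buf.write("line 2\n")
    result = buf.getvalue()          # grab the contents before the block exits

print(buf.closed)                    # True: close() was called on exit
try:
    buf.getvalue()                   # the buffer was discarded on close
except ValueError:
    print("cannot read a closed StringIO")
```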
Reuse buffer
When you need to use the same buffer repeatedly in a loop, you can clear the content first and then reuse it to avoid frequent creation and destruction of large objects:
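A sketch of the reuse pattern. Both calls are needed: truncate(0) empties the buffer but does not move the stream position, so seek(0) rewinds it as well.

```python
import io

buf = io.StringIO()
results = []
for i in range(3):
    buf.seek(0)                      # move back to the start
    buf.truncate(0)                  # drop the previous contents
    buf.write(f"record {i}")
    results.append(buf.getvalue())

print(results)
```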
Summary
io.StringIO and io.BytesIO are two small but extremely useful tools in the Python standard library:
- They mimic the file-object interface closely (though with no real file descriptor behind them, fileno() is not supported)
- Data lives in memory, so access is fast
- They avoid unnecessary disk I/O entirely
- Code can switch between in-memory streams and real files without changes
Next time you need a file interface but don't want to read or write the disk, reach for them without hesitation!

