Python Serialization and Deserialization Guide
1. Serialization overview
When the program is running, all variables—whether they are simple dictionaries, lists, or complex custom class instances—are temporarily stored in the memory stack. Once the program ends, the operating system will immediately reclaim this memory, and the data will disappear.
But in actual development, we often need to "retain data":
- Save the game progress as an archive and play again next time;
- Cache the data captured by the crawler locally to avoid repeated requests;
- Transfer structured information through API between front-end and back-end and microservices...
At this time, it is necessary to convert those "living, three-dimensional objects" in the memory into a format that can be stored (such as writing to files, databases) or transmitted (such as sending through the network). This process is called serialization. (In Python's binary-specific tools, this process is also vividly called pickling - "pickling" the data and saving it.)
In turn, restoring these flattened data into objects that can be directly manipulated in memory is deserialization (unpickling, unpickling).
Next, let’s take a look at the two most commonly used serialization schemes in Python.
2. Python pickle module
pickleIt is the simplest and most direct binary serialization tool that comes with Python.
It handles almost all of Python's built-in types, even custom class instances with methods.
2.1 Basic usage
There are only four core APIs, which are very easy to remember:
dumps(obj): Serialize the object tobytesA binary string of type;dump(obj, file): Directly serialize and write file objects;loads(bytes):Deserialize back to object from binary string;load(file):Deserialize from file object back to object.
2.2 Limitations of pickle
pickleAlthough convenient, the applicable scenarios are very narrow and have three main flaws:
-
Absolutely exclusive to Python The binary data generated by pickle is completely incomprehensible to other languages (Java, Go, JavaScript, etc.), so it can only be used for data exchange within the Python environment.
-
Poor version compatibility Pickle files generated between different Python major versions (such as 2.x and 3.x) or even minor versions (such as 3.8 and 3.12) are likely to be incompatible, and old archives may not be read properly after upgrading the interpreter.
-
High-risk security vulnerabilities **Never deserialize pickle data from untrusted sources! ** The restore process of pickle is essentially executing a piece of Python bytecode. A maliciously constructed pickle file can directly run any system command to delete your important files, steal privacy, and even control your computer. :::
3. JSON serialization
JSON (JavaScript Object Notation) is currently the most common cross-language text serialization format. Not only are Python, JavaScript, Java, Go and other mainstream languages supported natively, but JSON itself is plain text and is very clear for humans to read.
3.1 Data type correspondence table
JSON is a "lightweight" format that supports only six basic types.
Python's standard libraryjsonType mapping is performed automatically during serialization and deserialization:
3.2 Basic usage
jsonModule API design andpickleVery similar, there are still four core methods, but the object processed is UTF-8 text or string in text form:
3.3 Handle Chinese characters well (a practical tip)
::: tip Chinese display optimization
By default,json.dumpswill escape non-ASCII characters (such as Chinese) into\uXXXXform.
For the program, the front and back ends can parse it normally, but it is very unfriendly for human eyes to read.
Just addensure_ascii=False, you can retain the native Chinese characters.
At the same time, be sure to remember to specify it explicitly when reading and writing files.encoding="utf-8", to avoid garbled characters.
4. Serialize custom objects
jsonBy default, the module cannot directly handle instances of custom classes, and we need to provide our own "object → dictionary" conversion logic.
There are two commonly used methods.
4.1 Simple method: use directly__dict__
If your class is just a pure data container with no private attributes and no complex inheritance relationships, you can directly use the Python object that comes with it.__dict__Properties - It automatically packs all the public properties of the instance into a dictionary.
4.2 A more flexible method: achieving exclusiveto_dictandfrom_dict
When your class has private attributes, attributes inherited from the parent class, or you want to mark the class information during serialization to facilitate global deserialization, it is more recommended to specifically implement the conversion method inside the class.
5. Security and Best Practices
No matter which serialization scheme you choose, keep the following points in mind:
-
pickle never touch untrusted data This cannot be emphasized enough. Only use pickle in scripts, local caches, or internal pipelines that you have complete control over.
-
Perform strict structure verification on JSON input The deserialized data is likely to have missing fields or unexpected types. You need to check manually, or use
pydanticWait for the verification library to ensure the reliability of the data structure. -
Always do exception-handling Whether it's a file that doesn't exist, insufficient permissions, or a malformed JSON, it can happen at any time. Example:
6. Performance optimization (big data scenario)
When the amount of data processed reaches the GB level, or JSON is used frequently in high-concurrency network requests, the standard libraryjsonMay become a performance bottleneck. At this point you may wish to consider the following alternatives.
6.1 Faster JSON library
-
orjson
Currently the fastest JSON library in the Python ecosystem, installation method:pip install orjson。
What it returns isbytesinstead ofstr, and can be automatically serializeddatetime、UUIDand other common types. -
ujson
It is also much faster than the standard library and has slightly better compatibility thanorjson, installation method:pip install ujson。
6.2 Binary cross-language format
If you have higher requirements for parsing speed and data compression rate, you can abandon plain text JSON and use binary format instead:
-
MessagePack
Similar structure to JSON, but smaller and faster. Installation method:pip install msgpack。 -
Protocol Buffers(protobuf)
The structured binary serialization format produced by Google has the strongest performance and highest compression rate, but it needs to be written in advance..protofile to define the data structure, the cost of getting started is slightly higher.
7. Summary
When choosing a serialization solution, make trade-offs based on your core needs:
Finally, I would like to emphasize again: **Safety first, pickle only believes in yourself! **

