💡 Do you often need to save temporary configuration, crawler results, or cached API responses during development? A database is overkill, and parsing plain text by hand is tedious. Today's protagonist is Python's standard-library json module, which, together with a few third-party tools, makes lightweight JSON storage and exchange easy.

JSON processing tutorial in Python

JSON (JavaScript Object Notation) is a lightweight data exchange format that is easy for both humans and machines to read and parse. It has become a de facto common language for web APIs, configuration files, and data exchange between programs. This tutorial takes you from the basics to more advanced usage of JSON in Python.


📌 JSON Basics

JSON data structure

JSON has only six value types, two of which are containers, so it is very concise:

  • Object 🗂️: key-value pairs wrapped in curly braces {}; every key must be a double-quoted string
  • Array 📊: an ordered collection of values wrapped in square brackets []; mixed types are allowed (but keeping element types consistent is recommended)
  • Value: a string (in double quotes), a number, a Boolean (true/false), null, an object, or an array

🔄 Correspondence between JSON and Python types

Python's json module maps these types in both directions automatically. Just remember this table:

JSON type    Python type
object       dict
array        list
string       str
number       int / float
true         True
false        False
null         None
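
A quick round trip makes the mapping concrete. The sketch below uses only the standard library, and the sample values are made up for illustration. It also shows two conversions the table does not list: tuples are written as arrays (and come back as lists), and non-string dict keys are converted to strings.

```python
import json

# Illustrative sample data (not from any real API)
original = {
    "count": 3,            # int   -> number
    "ratio": 0.5,          # float -> number
    "active": True,        # True  -> true
    "nickname": None,      # None  -> null
    "tags": ("a", "b"),    # tuple -> array (JSON has no tuple type)
    1: "integer key",      # non-string keys are converted to strings
}

text = json.dumps(original)
restored = json.loads(text)

print(text)
print(type(restored["count"]).__name__)   # int
print(type(restored["ratio"]).__name__)   # float
print(restored["tags"])                   # ['a', 'b']
print("1" in restored, 1 in restored)     # True False
```

Because of these two conversions, a value that goes through JSON and back is not always equal to the original, so it is worth checking when tuples or integer keys matter.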

📖 Read JSON data

Load from a string (json.loads)

loads = "load string". It suits strings returned by an API, multi-line configuration text, and so on.

import json

# A multi-line JSON string can use triple quotes; valid JSON does not require indentation, but it reads better
json_str = '''
{
    "name": "John",
    "age": 30,
    "city": "New York",
    "hobbies": ["reading", "traveling"],
    "married": false
}
'''

# Convert to a Python object
data = json.loads(json_str)

print(type(data))          # <class 'dict'>
print(data["name"])        # John
print(data["hobbies"][0])  # reading
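
Strings from the outside world are not always valid JSON. A minimal sketch of defensive parsing could look like the following; `parse_or_none` is a hypothetical helper name chosen for this example, not part of the json module.

```python
import json

def parse_or_none(text):
    """Return the parsed value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.lineno / exc.colno point at where parsing failed
        print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return None

good = parse_or_none('{"name": "John", "married": false}')
bad = parse_or_none("{'name': 'John'}")  # single quotes are not valid JSON

print(good)  # {'name': 'John', 'married': False}
print(bad)   # None
```

Note that the JSON text `null` also parses to None, so real code may prefer to re-raise or return a dedicated sentinel instead.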

Load from a file (json.load)

load = "load from a file object". Remember to use a with statement so the file is closed automatically.

import json

# Assume a data.json file exists in the same directory
try:
    with open('data.json', 'r', encoding='utf-8') as f:
        data = json.load(f)  # Pass the file object directly
except FileNotFoundError:
    print("File not found, creating default data...")
    data = {"default": True}

print(data)

✍️ Write JSON data

Convert to a JSON string (json.dumps)

dumps = "dump to string". Three practical parameters are worth remembering:

  1. ensure_ascii=False: keeps Chinese and other non-ASCII characters as they are (without it they are written as \uXXXX escapes, which are still valid JSON but hard to read)
  2. indent=2 or 4: sets the indentation level so the output is easier for people to read
  3. sort_keys=True: sorts by dictionary key so the output is more stable (easy to diff)

import json

data = {
    "name": "张三",
    "age": 28,
    "city": "北京",
    "hobbies": ["摄影", "编程"],
    "married": True
}

# Convert to an indented, key-sorted JSON string that keeps the Chinese characters
json_str = json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True)
print(json_str)
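
To see what each option changes, it can help to print the variants side by side. This is a small sketch with made-up data; `separators` is an extra json.dumps parameter not covered above, shown here only because it produces the most compact output.

```python
import json

data = {"city": "北京", "age": 28}

default_out = json.dumps(data)
readable_out = json.dumps(data, ensure_ascii=False, sort_keys=True)
compact_out = json.dumps(data, ensure_ascii=False, separators=(",", ":"))

print(default_out)   # {"city": "\u5317\u4eac", "age": 28}
print(readable_out)  # {"age": 28, "city": "北京"}
print(compact_out)   # {"city":"北京","age":28}

# All three strings decode to the same Python object
same = json.loads(default_out) == json.loads(readable_out) == json.loads(compact_out)
print(same)  # True
```

The escaped form is not corrupted data: any JSON parser turns \u5317\u4eac back into 北京.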

Write to a JSON file (json.dump)

dump = "dump to a file object". Its parameters are the same as those of dumps.

import json

data = {
    "name": "李四",
    "age": 35,
    "city": "上海",
    "hobbies": ["游泳", "音乐"],
    "married": False
}

with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
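
A write followed by a read is the typical pattern for temporary configuration or cache files. The sketch below runs in a temporary directory so it leaves nothing behind; the file name settings.json and the sample settings are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

settings = {"theme": "dark", "recent": ["a.txt", "b.txt"], "autosave": True}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"   # hypothetical file name

    # Write the settings out
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f, ensure_ascii=False, indent=2)

    # Read them back
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == settings)  # True
```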

🚀 Advanced usage

Processing custom classes and datetime objects

The standard json module cannot directly serialize custom classes (such as User) or datetime objects. There are two ways to handle them:

Method 1: A custom default function (simple and flexible)

Pass a default function to dumps/dump to tell the module how to handle objects it does not know:

import json
from datetime import datetime

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.created_at = datetime.now()

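def custom_encoder(obj):
    # Handle the special types first
    if isinstance(obj, datetime):
        # Convert to an ISO-format string: portable and readable
        return obj.isoformat()
    if isinstance(obj, User):
        # Explicitly list the fields to serialize (safer than using __dict__
        # directly, which may include temporary attributes)
        return {
            "name": obj.name,
            "age": obj.age,
            "created_at": obj.created_at  # a datetime; the encoder calls this function again for it
        }
    # Anything else is unsupported: raise TypeError, as the json module expects
    raise TypeError(f"Object of type {obj.__class__.__name__} not serializable")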

user = User("王五", 40)
json_str = json.dumps(user, default=custom_encoder, ensure_ascii=False, indent=2)
print(json_str)

Method 2: Subclass json.JSONEncoder (suitable for reuse)

If the same serialization logic is needed in several places, encapsulate it in a subclass and pass it through the cls parameter:

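import json
from datetime import datetime

# Reuses the User class and the user instance defined in Method 1
class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, User):
            return {
                "name": obj.name,
                "age": obj.age,
                "created_at": obj.created_at
            }
        # Let the base class raise TypeError for unsupported types
        return super().default(obj)

# Pass the class through cls when calling
json_str = json.dumps(user, cls=CustomJSONEncoder, ensure_ascii=False, indent=2)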
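
The two methods are interchangeable, which the following self-contained sketch tries to show. To make the output deterministic it uses a variation of User whose created_at is passed in explicitly; the names `encode` and `UserEncoder` are chosen for this example only.

```python
import json
from datetime import datetime

class User:
    def __init__(self, name, age, created_at):
        self.name = name
        self.age = age
        self.created_at = created_at

def encode(obj):
    """default= function: describe how to serialize unknown objects."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, User):
        return {"name": obj.name, "age": obj.age, "created_at": obj.created_at}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

class UserEncoder(json.JSONEncoder):
    def default(self, obj):
        return encode(obj)  # reuse the same logic inside the subclass

user = User("Wang Wu", 40, datetime(2023, 1, 15, 10, 30))

via_function = json.dumps(user, default=encode)
via_class = json.dumps(user, cls=UserEncoder)

print(via_function)  # {"name": "Wang Wu", "age": 40, "created_at": "2023-01-15T10:30:00"}
print(via_function == via_class)  # True

# Types the function does not know still fail loudly
try:
    json.dumps({"values": {1, 2, 3}}, default=encode)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # Object of type set is not JSON serializable
```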

Automatically restore special types when parsing (object_hook)

While reading JSON, the parser calls the object_hook function each time it finishes decoding a JSON object (that is, a Python dict). This is the place to convert an ISO-format time string back into a datetime, or to turn a plain dict back into a User:

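import json
from datetime import datetime

class User:
    # created_at is optional so a restored User can keep its original timestamp
    def __init__(self, name, age, created_at=None):
        self.name = name
        self.age = age
        self.created_at = created_at or datetime.now()

def custom_decoder(dct):
    # Called for every dict the parser finishes decoding
    if "created_at" in dct:
        try:
            dct["created_at"] = datetime.fromisoformat(dct["created_at"])
        except (TypeError, ValueError):
            pass  # Not an ISO-format string: keep the original value
    # If name, age and created_at are all present, rebuild a User (optional)
    if "name" in dct and "age" in dct and "created_at" in dct:
        return User(dct["name"], dct["age"], dct["created_at"])
    return dct

json_str = '''
{
    "name": "赵六",
    "age": 45,
    "created_at": "2023-01-15T10:30:00"
}
'''

data = json.loads(json_str, object_hook=custom_decoder)
print(type(data))             # <class '__main__.User'>
print(type(data.created_at))  # <class 'datetime.datetime'>
print(data.created_at)        # 2023-01-15 10:30:00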
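
Guessing the type from key names can misfire when an unrelated object happens to have the same keys. One common alternative is to write an explicit type tag when encoding and look for it when decoding. The sketch below assumes a made-up convention, a "__type__" key; the json module itself defines no such key.

```python
import json
from datetime import datetime

def encode(obj):
    # Wrap datetimes in a tagged dict so the decoder can recognize them
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode(dct):
    # Only dicts carrying our tag are converted; everything else passes through
    if dct.get("__type__") == "datetime":
        return datetime.fromisoformat(dct["value"])
    return dct

event = {"title": "release", "when": datetime(2023, 1, 15, 10, 30)}

text = json.dumps(event, default=encode)
restored = json.loads(text, object_hook=decode)

print(text)               # {"title": "release", "when": {"__type__": "datetime", "value": "2023-01-15T10:30:00"}}
print(restored == event)  # True
```

The trade-off is a slightly more verbose file in exchange for a round trip that does not depend on field names.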

Streaming large JSON files (using ijson)

If a JSON file is too large to fit in memory (for example, several gigabytes of logs or crawler data), the third-party library ijson can read it incrementally instead of loading the whole file at once:

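# Install ijson first (run this in a shell, not in Python):
#   pip install ijson
import ijson

# Assume large_users.json is one big array of users: [{"name": ...}, {"name": ...}, ...]
with open('large_users.json', 'rb') as f:
    # Stream only the user names without loading the whole array
    for name in ijson.items(f, 'item.name'):
        print(name)
        # Insert into a database, update statistics, etc. here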
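
If you control how the data is written, a standard-library-only alternative is to store one JSON object per line (often called JSON Lines) and parse the file line by line. This is a sketch of that different technique, not of ijson; the helper name `iter_json_lines` and the file name users.jsonl are assumptions for the example.

```python
import json
import tempfile
from pathlib import Path

def iter_json_lines(path):
    """Yield one parsed record per non-empty line, without loading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.jsonl"   # hypothetical file name

    # Write three small records, one JSON object per line
    with open(path, "w", encoding="utf-8") as f:
        for name in ["Ann", "Bob", "Cid"]:
            f.write(json.dumps({"name": name}) + "\n")

    # Only one line is held in memory at a time while reading
    names = [record["name"] for record in iter_json_lines(path)]

print(names)  # ['Ann', 'Bob', 'Cid']
```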

✅ Best Practices

  1. Pay attention to encoding: pass encoding='utf-8' when reading and writing files, and ensure_ascii=False when serializing non-ASCII text that should stay readable
  2. Use with for file operations: it manages file handles automatically and prevents leaks
  3. Handle errors: at minimum catch FileNotFoundError (file does not exist), json.JSONDecodeError (parse error, Python 3.5+), and TypeError (serialization error)
  4. Stream large data: ijson and pandas.read_json(..., lines=True, chunksize=...) are both good choices (pandas only accepts chunksize for line-delimited JSON)
  5. Keep security in mind: treat JSON from untrusted sources with care. Python's json module does not execute code, so it is far safer than eval, but a very large or deeply nested payload can still consume a lot of CPU and memory, so limit the input size and validate the parsed data before using it
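
Several of these practices can be combined into a pair of small helpers. The following is a sketch under stated assumptions: `load_json` and `save_json` are hypothetical names, and returning a default value on failure is one possible policy, not the only reasonable one.

```python
import json
import tempfile
from pathlib import Path

def load_json(path, default=None):
    """Read a JSON file; return default if it is missing or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default
    except json.JSONDecodeError as exc:
        print(f"{path}: invalid JSON ({exc.msg}, line {exc.lineno})")
        return default

def save_json(path, data):
    """Write data as JSON; return False if it cannot be serialized."""
    try:
        # Serialize first so a failure does not leave a half-written file
        text = json.dumps(data, ensure_ascii=False, indent=2)
    except TypeError as exc:
        print(f"Cannot serialize: {exc}")
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return True

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cache.json"                 # hypothetical file name
    missing = load_json(path, default={})           # file does not exist yet
    ok = save_json(path, {"城市": "北京"})
    rejected = save_json(path, {"bad": {1, 2}})     # a set is not serializable
    reloaded = load_json(path)                      # still the last good write

print(missing, ok, rejected, reloaded)  # {} True False {'城市': '北京'}
```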

🎉 Summary

Python's standard-library json module already covers most day-to-day JSON needs. Combined with tools such as ijson, it easily handles lightweight text storage, API interaction, and temporary data caching. Next time you run into needs like these, don't rush to set up a database!