Python Generator Tutorial

What is a generator

A generator is a special kind of lazy iterator in Python. Unlike a list, it does not compute all of its values up front and hold them in memory. Instead, it computes each value only when you ask for the next one. This on-demand evaluation can cut memory usage dramatically when working with very large data sets, infinite sequences, and similar scenarios.

The core advantages of generators can be condensed into three points:

  • Very low memory use: only the current execution context is kept; past values are not stored and future values are not precomputed.
  • State is preserved: execution pauses automatically at each yield and later resumes from that point, with all local variables intact.
  • Concise code: compared with writing a full iterator class by hand (implementing __iter__ and __next__), a generator usually needs far less code.
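
The memory point can be checked with a rough sketch like the one below. It assumes CPython and uses sys.getsizeof, which reports only the size of the container object itself (not the integers a list refers to), so the numbers are indicative rather than a full memory profile. The size n is an arbitrary value chosen for illustration.

```python
import sys

n = 1_000_000  # arbitrary illustrative size, not from the original text

squares_list = [x * x for x in range(n)]   # every element is built up front
squares_gen = (x * x for x in range(n))    # only the "recipe" is stored

list_size = sys.getsizeof(squares_list)    # size of the list object (excluding its elements)
gen_size = sys.getsizeof(squares_gen)      # size of the generator object

print(f"list: {list_size} bytes, generator: {gen_size} bytes")
```

The generator's size does not depend on n, while the list grows with it.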

Two ways to create a generator

1. Generator expressions: the lightest form of lazy evaluation

The syntax of a generator expression is almost identical to that of a list comprehension. The only difference is that the square brackets [] are replaced with parentheses ():

# List comprehension: builds all 10 squares at once, and all of them occupy memory
list_comp = [x**2 for x in range(10)]
print(type(list_comp))  # <class 'list'>
print(list_comp)        # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expression: only the computation logic is stored; nothing has been computed yet
gen_exp = (x**2 for x in range(10))
print(type(gen_exp))    # <class 'generator'>

💡 Tip: Generator expressions suit simple, single-use transformation and filtering. If the logic does not fit on one line, or needs several pause/resume points, use a generator function instead.
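
As a small illustration of how generator expressions are typically consumed, the sketch below passes one directly to sum() and then pulls values from another only as far as needed. The variable names are illustrative choices, not part of the original text.

```python
# A generator expression can be passed straight to a consuming function;
# no intermediate list is ever built.
total = sum(x**2 for x in range(10))
print(total)  # 285

# Laziness in action: stop as soon as the first square above 50 is found.
gen = (x**2 for x in range(10))
first_big = next(v for v in gen if v > 50)
print(first_big)  # 64

# Only the values not yet consumed are left in the generator.
remaining = list(gen)
print(remaining)  # [81]
```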

2. Generator functions: the yield keyword is the soul

Once an ordinary function executes return, it exits completely and all of its local variables are destroyed. But if the yield keyword appears anywhere in the function body, the Python interpreter treats the function as a generator function: calling it does not run the body immediately but returns a generator object instead.

def count_up_to(max_num):
    """A simple counting generator."""
    count = 1
    while count <= max_num:
        yield count      # pause, hand back count, and wait for the next wake-up
        count += 1
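
For comparison, here is a rough sketch of what the same counter might look like as a hand-written iterator class. The class name CountUpTo is a hypothetical name introduced only for this illustration; count_up_to is repeated so the block runs on its own.

```python
class CountUpTo:
    """Hand-written iterator that mirrors count_up_to (hypothetical class)."""

    def __init__(self, max_num):
        self.max_num = max_num
        self.count = 1          # state must be stored on the instance explicitly

    def __iter__(self):
        return self             # an iterator returns itself from __iter__

    def __next__(self):
        if self.count > self.max_num:
            raise StopIteration  # signal exhaustion by hand
        value = self.count
        self.count += 1
        return value


def count_up_to(max_num):
    """Generator version: the state lives in ordinary local variables."""
    count = 1
    while count <= max_num:
        yield count
        count += 1


print(list(CountUpTo(5)))    # [1, 2, 3, 4, 5]
print(list(count_up_to(5)))  # [1, 2, 3, 4, 5]
```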

Basic usage of generators

Python's for loop calls next() implicitly and handles the StopIteration exception automatically once the data is exhausted, so it is the most hassle-free way to consume a generator:

for num in count_up_to(5):
    print(num)   # prints 1 2 3 4 5 in turn
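
What the for loop does behind the scenes can be approximated by the sketch below. It is a simplified model of the iteration protocol, not the interpreter's actual implementation; the names iterator and collected are illustrative.

```python
def count_up_to(max_num):
    count = 1
    while count <= max_num:
        yield count
        count += 1


iterator = iter(count_up_to(5))   # a generator is its own iterator
collected = []
while True:
    try:
        num = next(iterator)      # the for loop calls next() for you
    except StopIteration:
        break                     # ...and swallows StopIteration at the end
    collected.append(num)

print(collected)  # [1, 2, 3, 4, 5]
```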

Calling next() manually (for fine-grained control)

If you need to control the pace at which values are fetched (for example, when debugging or coordinating with other logic), you can use the built-in function next() to get them one at a time:

gen = count_up_to(3)
print(next(gen))   # 1
print(next(gen))   # 2
print(next(gen))   # 3
# One more call would raise StopIteration
# next(gen)
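
If catching StopIteration by hand is inconvenient, the built-in next() also accepts an optional second argument that is returned once the iterator is exhausted. A minimal sketch, using None as the sentinel (an arbitrary choice for illustration):

```python
def count_up_to(max_num):
    count = 1
    while count <= max_num:
        yield count
        count += 1


gen = count_up_to(2)
# After the generator is exhausted, next() returns the default instead of raising.
values = [next(gen, None) for _ in range(4)]
print(values)  # [1, 2, None, None]
```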

Advanced features of generators

1. Watching state retention in action

Adding a few print statements to a generator function makes the switching between running and paused visible:

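def stateful_gen():
    print("🔵 Generator started for the first time")
    yield 1
    print("🟡 Resumed after the first yield")
    yield 2
    print("🔴 Resumed after the second yield; no more values")

gen = stateful_gen()
# Note: the function body is only entered on the first call to next()!
print(next(gen))   # prints 🔵 and 1
print(next(gen))   # prints 🟡 and 2
# One more call would print 🔴 and then raise StopIteration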

⚠️ Important: a generator starts executing only when next() is first called on it (directly, or implicitly by a for loop); it does not run automatically when it is created.
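
The lifecycle described above can also be observed with the standard library's inspect.getgeneratorstate(). The sketch below uses a hypothetical two_steps generator and records the state at three points.

```python
import inspect


def two_steps():
    yield 1
    yield 2


gen = two_steps()
states = [inspect.getgeneratorstate(gen)]     # created, body not yet entered

next(gen)
states.append(inspect.getgeneratorstate(gen)) # paused at the first yield

list(gen)                                     # consume the rest
states.append(inspect.getgeneratorstate(gen)) # finished

print(states)  # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```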

2. New in Python 3.3+: Generator with return value

An ordinary generator simply raises StopIteration when it reaches the end, with no return value. Starting with Python 3.3, you can use return to provide a "final result"; the value is attached to the exception's value attribute so that the caller can capture it:

def gen_with_final():
    yield "Result of step 1"
    yield "Result of step 2"
    return "🎉 All steps complete!"

gen = gen_with_final()
try:
    while True:
        print(next(gen))
except StopIteration as e:
    print(e.value)   # prints 🎉 All steps complete!

3. New in Python 3.3+: delegating to a subgenerator with yield from

When a generator needs to iterate over another iterable and yield its items one by one, you do not have to write for item in it: yield item by hand. Writing yield from it performs the delegation directly: it hands over every element of the inner iterable in turn and also takes care of chores such as propagating exceptions:

def chain_generators(*iterables):
    for it in iterables:
        yield from it   # equivalent to: for item in it: yield item, but more elegant

gen = chain_generators([1, 2], (3, 4), "ab")
print(list(gen))   # [1, 2, 3, 4, 'a', 'b']
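
One further property of yield from is that the expression itself evaluates to the subgenerator's return value, so the delegating generator can receive the "final result" without catching StopIteration. A minimal sketch, with worker and supervisor as hypothetical names:

```python
def worker():
    yield "step 1"
    yield "step 2"
    return "done"            # becomes the value of the yield from expression


def supervisor():
    result = yield from worker()   # re-yields both steps, then captures "done"
    yield f"worker returned: {result}"


output = list(supervisor())
print(output)  # ['step 1', 'step 2', 'worker returned: done']
```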

Three common practical applications

1. Infinite Fibonacci Sequence

Storing an infinite sequence in a list is impossible, but a generator can easily express "infinitely long" logic:

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take just the first 10 values
fib = fibonacci()
for _ in range(10):
    print(next(fib))
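
An alternative way to take a bounded slice of an infinite generator is itertools.islice from the standard library. The sketch below repeats fibonacci so that it runs on its own.

```python
from itertools import islice


def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice stops after 10 items, so the infinite loop is never a problem.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```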

2. Read very large files line by line

With log files of hundreds of MB or even several GB, reading everything into memory at once with file.readlines() can exhaust memory. Reading line by line with a generator instead loads only one line at a time, which is far more memory friendly:

def read_large_file(file_path):
    """Yield each line with surrounding whitespace stripped, without exhausting memory."""
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            yield line.strip()

# Process while reading: safe and efficient
# for line in read_large_file('huge_server_log.txt'):
#     if 'ERROR' in line:
#         print(line)
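
To try this without a real multi-gigabyte log, the sketch below writes a tiny temporary file as a stand-in. The file contents and the names log_path and error_lines are made up for illustration.

```python
import os
import tempfile


def read_large_file(file_path):
    """Yield each line with surrounding whitespace stripped."""
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            yield line.strip()


# Create a small stand-in log file (hypothetical contents).
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    log_path = tmp.name

try:
    # Lines are read and filtered one at a time; the file is never fully loaded.
    error_lines = [line for line in read_large_file(log_path) if 'ERROR' in line]
finally:
    os.remove(log_path)   # clean up the temporary file

print(error_lines)  # ['ERROR disk full', 'ERROR timeout']
```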

3. Data processing pipeline (chain combination)

Generators combine seamlessly with functional tools such as filter() and map() to build lazily evaluated data processing pipelines. None of the earlier steps actually run until the last step (for example, list() or a for loop) starts consuming data, at which point each item flows through the whole pipeline in sequence:

def get_even_squares(max_num):
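    # Step 1: produce all candidate numbers (generator expression, nothing computed yet)
    all_nums = (x for x in range(max_num))
    # Step 2: keep only the even numbers (also lazy)
    even_nums = filter(lambda x: x % 2 == 0, all_nums)
    # Step 3: map each to its square (still lazy)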
    even_squares = map(lambda x: x**2, even_nums)
    return even_squares

print(list(get_even_squares(20)))
# Output: [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
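
The same idea can be sketched with plain generator functions chained together. The version below appends to a log list purely to make the order of evaluation visible; the stage names numbers, evens, and squared are hypothetical.

```python
log = []  # records when each stage actually does work


def numbers(limit):
    for x in range(limit):
        log.append(f"produce {x}")
        yield x


def evens(source):
    for x in source:
        if x % 2 == 0:
            yield x


def squared(source):
    for x in source:
        log.append(f"square {x}")
        yield x ** 2


pipeline = squared(evens(numbers(5)))
log_after_build = list(log)    # still empty: building the pipeline runs nothing

first = next(pipeline)         # pulls exactly one item through every stage
log_after_first = list(log)

rest = list(pipeline)          # drains the remaining items

print(log_after_build)   # []
print(first)             # 0
print(log_after_first)   # ['produce 0', 'square 0']
print(rest)              # [4, 16]
```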

A quick note: generators are single-use

A generator can be iterated only once. Once all of its values have been produced it is exhausted, and you must create a new one to iterate again. If you really need repeated access to the data, you can convert the generator to a list first, but that gives up the memory savings:

gen = (x**2 for x in range(3))
print(list(gen))   # [0, 1, 4]
print(list(gen))   # [] (no data left!)
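
One possible way to get repeatable iteration while staying lazy is to wrap the generator in an object whose __iter__ creates a fresh generator each time. The Squares class below is a hypothetical example of that pattern, not something from the original text.

```python
class Squares:
    """Re-iterable container: every iteration gets a new generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # A new generator is created on each call, so nothing is ever "used up".
        return (x**2 for x in range(self.n))


squares = Squares(3)
first_pass = list(squares)
second_pass = list(squares)

print(first_pass)   # [0, 1, 4]
print(second_pass)  # [0, 1, 4]
```

The trade-off is that the values are recomputed on every pass instead of being stored.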

Small exercise: generating Yang Hui's triangle (Pascal's triangle) with a generator

Try to implement a generator that yields the rows of Yang Hui's triangle indefinitely. A reference solution follows:

def triangles():
    """Yield the rows of Yang Hui's triangle indefinitely."""
    row = [1]
    while True:
        yield row
        # Next row: 1 at both ends, each inner value is the sum of two adjacent numbers
        row = [1] + [row[i] + row[i+1] for i in range(len(row)-1)] + [1]

# Print the first 10 rows
results = []
for n, t in enumerate(triangles()):
    results.append(t)
    if n == 9:
        break

for row in results:
    print(row)
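
As an alternative sketch of the same exercise, the next row can be computed by padding the current row with a zero on each side and adding the two shifted copies pairwise. The name triangles_zip is hypothetical; this is one possible variation, not the only correct answer.

```python
from itertools import islice


def triangles_zip():
    """Yield rows of Yang Hui's triangle using the shifted-sum trick."""
    row = [1]
    while True:
        yield row
        # [0] + row and row + [0] are the row shifted right and left;
        # adding them element by element gives the next row.
        row = [a + b for a, b in zip([0] + row, row + [0])]


first_five = list(islice(triangles_zip(), 5))
for r in first_five:
    print(r)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
```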

Summary

Generators are a very "Pythonic" tool. Consider them first in the following scenarios:

  1. Processing very large data sets or large files (avoiding memory blow-ups)
  2. Representing infinite sequences (such as Fibonacci numbers or endless data streams)
  3. Building lazily evaluated data processing pipelines (better performance and readability)
  4. Learning the basic concepts of coroutines (understanding what yield does lays the groundwork for async/await)

Although generators may be slightly slower than list comprehensions in raw computation speed, their large memory advantage usually far outweighs that small penalty, especially when you need to process massive amounts of data.