Detailed explanation of Python iterators and iterable objects

In Python development, "traversal" is one of the most common operations: whether we are scanning a list, walking through a dictionary's key-value pairs, or reading a file line by line, we habitually write for ... in .... But have you ever wondered: **why can these objects be used in a for loop, and why can some objects be passed directly to next() to get their elements?** The answer lies in two core concepts: "iterable" and "iterator".

Iterable object (Iterable)

Simply put, an iterable is **an object that can be traversed with a for loop**. It cannot produce a "next value" by itself, but it can hand out a tool that produces its elements one by one: an iterator (iter() obtains it by calling the object's __iter__() method).

Common iterable objects in Python can be divided into two categories:

  • Basic collection types: list, tuple, dict, set, str, bytes, etc.
  • Lazy objects: generators, file objects, the return values of zip(), map(), filter(), etc.

How to determine whether it is iterable?

Use isinstance() together with collections.abc.Iterable from the standard library:

from collections.abc import Iterable

# Basic collection types
print(isinstance([1, 2], Iterable))        # True
print(isinstance({'a': 1}, Iterable))      # True
print(isinstance('hello', Iterable))       # True

# Lazy objects
print(isinstance((x for x in range(3)), Iterable))  # True
with open('test.txt', 'r') as f:          # assumes test.txt exists; "with" closes the file for us
    print(isinstance(f, Iterable))        # True

# Non-iterable types
print(isinstance(100, Iterable))           # False
print(isinstance(True, Iterable))          # False
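One caveat, based on the documented behavior of collections.abc.Iterable: isinstance() only recognizes classes that define __iter__(), while iter() also accepts objects that implement the older sequence protocol through __getitem__(). The Python docs recommend calling iter() when you need a reliable answer. A minimal sketch of that check follows; the Squares class and the is_iterable() helper are hypothetical names used only for illustration.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical sequence-like class that defines only __getitem__."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)  # IndexError tells iter() the sequence has ended
        return index * index


def is_iterable(obj):
    """Return True if iter() accepts obj (covers __iter__ and __getitem__)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(isinstance(Squares(), Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(Squares()))           # True
print(list(Squares()))                  # [0, 1, 4, 9, 16]
print(is_iterable(100))                 # False
```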

Iterator

An iterator is an "upgraded" iterable: it can be used in a for loop, **and it can also be passed to next() to take out elements one at a time; once the elements are exhausted, it raises a StopIteration exception**. You can think of it as "a tool that controls the flow of data."

How to determine whether it is an iterator?

Again use isinstance(), this time with collections.abc.Iterator:

from collections.abc import Iterator

# Generators are iterators!
print(isinstance((x for x in range(3)), Iterator))  # True

# The object returned by iter() is also an iterator
print(isinstance(iter([]), Iterator))               # True

# Basic collections themselves are not iterators!
print(isinstance([1, 2], Iterator))                 # False
print(isinstance('hello', Iterator))                # False
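Because an iterator's __iter__() returns the iterator itself, every iterator is also an iterable, while the reverse does not hold. A quick check of that relationship:

```python
from collections.abc import Iterable

nums = [1, 2, 3]
it = iter(nums)

print(isinstance(it, Iterable))    # True: every iterator is also iterable
print(iter(it) is it)              # True: iter() on an iterator returns it unchanged
print(iter(nums) is iter(nums))    # False: a list hands out a new iterator each time
```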

Get iterator from iterable object

The built-in function iter() makes the conversion easy:

nums = [1, 2, 3]
it_nums = iter(nums)  # get an iterator

print(next(it_nums))  # 1
print(next(it_nums))  # 2
print(next(it_nums))  # 3
print(next(it_nums))  # raises a StopIteration exception!
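If an uncaught StopIteration is not what you want, the built-in next() also accepts a second argument: a default value that is returned instead of raising once the iterator is exhausted. A short sketch (the "done" sentinel is an arbitrary choice for illustration):

```python
it_nums = iter([1, 2, 3])

# The fourth call would normally raise; with a default it returns "done" instead
values = [next(it_nums, "done") for _ in range(4)]
print(values)  # [1, 2, 3, 'done']

# Catching the exception explicitly also works
try:
    next(it_nums)
except StopIteration:
    print("iterator exhausted")
```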

Note: iterators are "lazy". An iterator produces the next value only when you call next() (for generators, that is also the moment the value is computed), which is what makes it possible to handle large amounts of data or even infinite sequences.
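Laziness is easiest to observe with a generator. In this sketch the hypothetical numbers() generator appends to a log list every time it computes a value, so you can see that nothing runs until next() asks for it:

```python
log = []  # records when each value is actually computed


def numbers():
    """A generator: its body runs only as values are requested."""
    for n in range(3):
        log.append(f"computing {n}")
        yield n


gen = numbers()
print(log)        # []: creating the generator computes nothing
print(next(gen))  # 0
print(log)        # ['computing 0']
print(next(gen))  # 1
print(log)        # ['computing 0', 'computing 1']
```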

Iterators vs. Iterable objects: What’s the difference?

The definitions alone can be confusing, so let's use a comparison table and concrete scenarios to spell out the differences:

| Core feature | Iterable (e.g. list, dict) | Iterator |
| --- | --- | --- |
| Can be used directly in a for loop | ✅ | ✅ |
| Can be passed to next() | ❌ | ✅ |
| Lazy evaluation | Mostly not (except special objects) | ✅ Yes |
| Can represent an infinite sequence | ❌ (it could never fit in memory) | ✅ |
| Memory usage | Grows with the number of elements (all stored up front) | Usually very low (only the current state is kept) |

Take the most intuitive example, the infinite sequence of natural numbers:

  • With a list? Impossible: [0, 1, 2, …] would eventually exhaust memory.
  • With an iterator? Easy (we will write a custom one later).
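The standard library already ships such an infinite iterator: itertools.count(). Combined with itertools.islice(), you can take a finite slice from it without ever building the whole sequence:

```python
from itertools import count, islice

naturals = count()  # 0, 1, 2, ... produced on demand, never stored

first_five = list(islice(naturals, 5))  # take only the first five values
print(first_five)       # [0, 1, 2, 3, 4]
print(next(naturals))   # 5: the iterator resumes where islice stopped
```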

Here is another example comparing memory usage (in CPython the memory cost of integers varies with factors such as the small-integer cache, so focus on the trend):

# A list of 1 million squares: a large block of memory is allocated up front
big_list = [x*x for x in range(1_000_000)]

# A generator of 1 million squares: occupies only a small, fixed amount of memory!
big_iterator = (x*x for x in range(1_000_000))
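To check the trend yourself, you can use sys.getsizeof(). Note that it is shallow: for a list it counts the list object and its array of references, not the integer objects they point to, so the real gap is even larger. Exact byte counts vary by Python version and platform, so this sketch only prints comparisons:

```python
import sys

big_list = [x * x for x in range(1_000_000)]
big_iterator = (x * x for x in range(1_000_000))

list_bytes = sys.getsizeof(big_list)      # grows with the number of elements
gen_bytes = sys.getsizeof(big_iterator)   # constant, independent of length

print(list_bytes > 1_000_000)  # True: megabytes just for the references
print(gen_bytes < 1_000)       # True: well under a kilobyte
print(next(big_iterator))      # 0: values are computed only when requested
```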

You can also start pulling values on demand with next() right after creating the iterator, without waiting for all of them to be generated.

The "true face" of the for loop

For every for x in xxx you write, Python automatically does three things under the hood:

  1. Calls iter(xxx) to get the corresponding iterator object
  2. Repeatedly calls next(iterator) and assigns each result to x
  3. Exits the loop cleanly as soon as it catches StopIteration

For example, this simple code:

for num in [1, 2, 3]:
    print(num)

Internally, Python effectively executes the following logic:

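# Step 1: get the iterator
it = iter([1, 2, 3])

# Steps 2 and 3: fetch values in a loop, exit on StopIteration
while True:
    try:
        num = next(it)
    except StopIteration:
        break
    print(num)  # the loop body stays outside try, so its own errors are not swallowed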

Doesn't the for loop feel a little less "mysterious" now?
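One consequence of step 1: calling iter() on an iterator returns the iterator itself, so a for loop over a partly consumed iterator simply continues where next() left off instead of starting over:

```python
it = iter([1, 2, 3, 4])

first = next(it)     # consume the first element by hand
rest = []
for x in it:         # for calls iter(it), which returns it itself
    rest.append(x)

print(first, rest)   # 1 [2, 3, 4]
```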

Practical Development Suggestions

1. When dealing with big data or infinite sequences, prefer iterators or generators

When you need to process files with hundreds of thousands or millions of lines, or traverse an infinite sequence, avoid list comprehensions and use generator expressions or custom iterators instead; otherwise memory can quickly run out.

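# ❌ Reads the whole file into memory at once (risky for huge files)
with open('huge_file.log') as f:
    lines = [line.strip() for line in f]

# ✅ Processes the file line by line with a generator expression (memory-safe)
with open('huge_file.log') as f:      # the file must stay open while the generator is consumed
    lines_safe = (line.strip() for line in f)
    for line in lines_safe:
        # handle one line at a time
        pass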

2. A custom iterator needs only two methods

If you want to write an iterator class yourself (for example, to implement the infinite natural numbers just mentioned), you must implement two special methods:

  • __iter__(): must return the iterator itself (self). This is what keeps the protocol uniform, letting an iterator be used anywhere an iterable is expected, such as in a for loop
  • __next__(): returns the current element and advances to the next one, or raises StopIteration when there are no elements left

Here is the code for an infinite natural-number iterator that starts from a given value:

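from collections.abc import Iterator

# Subclassing Iterator is optional; it makes isinstance(obj, Iterator) return True
class InfiniteNaturals(Iterator):
    def __init__(self, start=0):
        self.current = start
    
    # Return the iterator itself
    def __iter__(self):
        return self
    
    # The actual "get the next value" logic
    def __next__(self):
        # No termination condition here: this really is an infinite sequence!
        result = self.current
        self.current += 1
        return result

# Test
nat = InfiniteNaturals(10)
print(next(nat))  # 10
print(next(nat))  # 11
# Note: looping over nat directly with for would never end!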
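For comparison, the same infinite sequence can be written more compactly as a generator function: calling it returns a generator object, which already implements __iter__() and __next__(). In this sketch infinite_naturals() is a hypothetical name, and itertools.islice() is used to take a safe, finite slice:

```python
from collections.abc import Iterator
from itertools import islice


def infinite_naturals(start=0):
    """Yield start, start + 1, start + 2, ... forever."""
    current = start
    while True:
        yield current
        current += 1


gen = infinite_naturals(10)
print(isinstance(gen, Iterator))  # True: generators are iterators
print(list(islice(gen, 3)))       # [10, 11, 12]
```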

If your sequence has an end, just add a termination condition to __next__():

from collections.abc import Iterator

class RangeCustom(Iterator):
    def __init__(self, start, end):
        self.current = start
        self.end = end
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.current >= self.end:
            raise StopIteration     # reached the end: stop explicitly
        result = self.current
        self.current += 1
        return result

# Test
for num in RangeCustom(2, 5):
    print(num)   # prints 2, 3, 4 in turn

3. Iterators are single-use!

This property trips people up easily: once an iterator has been traversed, it is exhausted. Further calls to next() raise StopIteration, and for loops over it produce nothing.

it = iter([1, 2, 3])
print(list(it))  # [1, 2, 3]
print(list(it))  # []  (empty!)

Tip: if you need to traverse the data repeatedly, get a new iterator each time, or keep the original iterable (such as a list) and call iter() on it again when needed.
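A common way to make an object traversable many times is to keep the iterable and the iterator separate: the iterable's __iter__() returns a fresh iterator on every call. In this sketch the hypothetical Countdown class writes __iter__() as a generator function, so each for loop or list() call gets its own new generator:

```python
class Countdown:
    """Reusable iterable: each call to iter() produces a brand-new iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Because this method contains yield, each call returns a new generator
        n = self.start
        while n > 0:
            yield n
            n -= 1


cd = Countdown(3)
print(list(cd))  # [3, 2, 1]
print(list(cd))  # [3, 2, 1]: a new iterator is created each time
```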

Summary

A thorough understanding of iterators and iterables is an important foundation for writing efficient, Pythonic, memory-friendly code. To recap:

  • An iterable is the "raw material/container" that provides iterators (through iter())
  • An iterator is the "tool/worker" that holds the lazy-evaluation logic and produces the next value on demand
  • A for loop is essentially syntactic sugar over the iterator protocol: one iter() call, then next() until StopIteration
  • A custom iterator must implement __iter__() (returning self) and __next__()

Next time you write traversal code, take a moment to ask: is a list the right choice here, or would a generator or iterator fit better? Your programs may become more elegant and robust as a result.