Iterators and Generators¶
★★★★★ Intermediate
Iterators implement the iteration protocol (__iter__/__next__), while generators provide a concise way to create iterators using yield. Generator expressions offer memory-efficient lazy evaluation. The itertools module provides a toolkit of composable iterator building blocks.
Key Facts¶
- Iterable: has
__iter__(). Iterator: has__next__()(raisesStopIterationwhen done) - All iterators are iterables, but not all iterables are iterators (lists are iterable, not iterators)
- Generators are one-shot - exhausted after single iteration
- Generator expressions use
()instead of[]- constant memory regardless of size yieldpauses function, preserves local state;returnterminates functionyield fromdelegates to another iterable/generator
Patterns¶
Iterator Protocol¶
# How for-loop actually works:
my_list = [1, 2, 3]
iterator = iter(my_list) # calls __iter__()
while True:
try:
item = next(iterator) # calls __next__()
except StopIteration:
break
Custom Iterator¶
class CountDown:
def __init__(self, start):
self.current = start
def __iter__(self):
return self
def __next__(self):
if self.current <= 0:
raise StopIteration
self.current -= 1
return self.current + 1
Generator Functions¶
def count_up(n):
i = 1
while i <= n:
yield i # pauses here, returns i
i += 1 # resumes here on next()
gen = count_up(3)
next(gen) # 1
next(gen) # 2
next(gen) # 3
next(gen) # StopIteration
yield from¶
def flatten(nested):
for item in nested:
if isinstance(item, list):
yield from flatten(item)
else:
yield item
list(flatten([1, [2, [3, 4], 5], 6])) # [1, 2, 3, 4, 5, 6]
Generator Expressions¶
squares = (x**2 for x in range(1_000_000)) # lazy, ~constant memory
sum(x**2 for x in range(100)) # no extra parentheses in function call
send() - Two-Way Generators¶
def accumulator():
total = 0
while True:
value = yield total
if value is None:
break
total += value
gen = accumulator()
next(gen) # 0 (must initialize with next() first)
gen.send(10) # 10
gen.send(20) # 30
Pipeline Pattern (Unix pipes)¶
def read_lines(filename):
with open(filename) as f:
for line in f:
yield line.strip()
def filter_comments(lines):
for line in lines:
if not line.startswith('#'):
yield line
def parse_csv(lines):
for line in lines:
yield line.split(',')
# Chained - processes file line by line, minimal memory
pipeline = parse_csv(filter_comments(read_lines("data.csv")))
for row in pipeline:
process(row)
Infinite Sequences¶
def fibonacci():
a, b = 0, 1
while True:
yield a
a, b = b, a + b
from itertools import islice
list(islice(fibonacci(), 10)) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
itertools Module¶
Infinite Iterators¶
from itertools import count, cycle, repeat
count(10, 2) # 10, 12, 14, 16, ...
cycle([1, 2, 3]) # 1, 2, 3, 1, 2, 3, ...
repeat('x', 3) # 'x', 'x', 'x'
Terminating Iterators¶
from itertools import chain, islice, zip_longest, takewhile, dropwhile
from itertools import filterfalse, accumulate, groupby, tee, pairwise
chain([1,2], [3,4]) # [1, 2, 3, 4]
islice(count(), 5) # [0, 1, 2, 3, 4]
zip_longest([1,2], [3,4,5], fillvalue=0) # [(1,3), (2,4), (0,5)]
takewhile(lambda x: x < 5, [1,3,5,2]) # [1, 3]
dropwhile(lambda x: x < 5, [1,3,5,2]) # [5, 2]
accumulate([1,2,3,4,5]) # [1, 3, 6, 10, 15]
pairwise([1,2,3,4]) # [(1,2), (2,3), (3,4)] (Python 3.10+)
Combinatoric Iterators¶
from itertools import product, permutations, combinations
product([1,2], ['a','b']) # [(1,'a'), (1,'b'), (2,'a'), (2,'b')]
permutations([1,2,3], 2) # [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]
combinations([1,2,3,4], 2) # [(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)]
groupby (requires sorted input)¶
from itertools import groupby
data = sorted(students, key=lambda s: s.grade)
for grade, group in groupby(data, key=lambda s: s.grade):
print(grade, list(group))
Generator .throw() and .close()¶
def resilient_reader(path):
with open(path) as f:
for line in f:
try:
yield line.strip()
except GeneratorExit:
print("Cleanup on close")
return
except Exception as e:
print(f"Handling injected: {e}")
gen = resilient_reader("data.txt")
next(gen)
gen.throw(ValueError, "bad data") # inject exception at yield point
gen.close() # raises GeneratorExit inside generator
Subgenerator Delegation with yield from¶
yield from is more than syntactic sugar - it properly delegates .send(), .throw(), and .close() to the subgenerator:
def accumulate():
total = 0
while True:
value = yield total
if value is None:
return total # return value becomes StopIteration.value
total += value
def main():
result = yield from accumulate() # delegates send/throw/close
print(f"Final: {result}")
gen = main()
next(gen) # prime
gen.send(10) # 10
gen.send(20) # 30
gen.send(None) # triggers return, prints "Final: 30"
Iterator vs Iterable - The Reusability Pattern¶
# Iterable (reusable) - returns new iterator each time
class Sentence:
def __init__(self, text):
self.words = text.split()
def __iter__(self):
return iter(self.words) # new iterator each call
s = Sentence("one two three")
list(s) # ['one', 'two', 'three']
list(s) # ['one', 'two', 'three'] - works again!
# Iterator (single-use) - IS its own iterator
class WordIterator:
def __init__(self, words):
self.words = words
self.index = 0
def __iter__(self):
return self # returns self = single-use
def __next__(self):
if self.index >= len(self.words):
raise StopIteration
word = self.words[self.index]
self.index += 1
return word
batched() (Python 3.12+)¶
from itertools import batched
list(batched("ABCDEFG", 3)) # [('A','B','C'), ('D','E','F'), ('G',)]
# Replaces common "chunks" recipe
Gotchas¶
- Generators are one-shot - exhausted after single iteration; create new one to iterate again
groupbygroups consecutive equal keys - requires pre-sorted input for complete grouping- Can't get
len(gen)- usesum(1 for _ in gen)but this exhausts it tee(iterator, n)creates n copies but advancing one doesn't advance others - memory grows- Second
list(map_result)returns empty list - iterators are consumed - Generator
.send()requiresnext(gen)first to prime it - calling.send(value)on a fresh generator raises TypeError yield fromdelegates.close()to subgenerator - if subgenerator ignores GeneratorExit, RuntimeError is raised- List comprehension
[x for x in gen]exhausts the generator - can't reuse it after itertools.chainaccepts iterables, not iterators specifically - but if you pass an exhausted iterator, it contributes nothing
See Also¶
- functions - generator functions, yield
- async programming - async generators
- data structures - comprehensions as alternative