The Iterator Protocol
What Is An Iterable?
According to Python’s documentation, the most common way to implement an iterable is to define the dunder method __iter__; the alternative is to define __getitem__ with sequence semantics.
A simple litmus test for any object is to call the built-in iter function on it and see whether a TypeError is raised.
try:
    iter(1)
except TypeError:
    print("1 is not iterable")
_ = iter("ChatGPT")  # string is an iterable
_ = iter([1, 2, 3])  # list is an iterable
1 is not iterable
print(hasattr(1, "__iter__"))
print(hasattr("ChatGPT", "__iter__"))
print(hasattr([1, 2, 3], "__iter__"))
False
True
True
from typing import List, Iterable, Iterator

class NumberSequence:
    def __init__(self, numbers: List[int]):
        self.numbers = numbers

    def __iter__(self) -> Iterator[int]:
        # Delegate to the list's own iterator.
        return iter(self.numbers)

seq = NumberSequence([1, 2, 3, 4, 5])
print(isinstance(seq, Iterable))
print(isinstance(seq, Iterator))
for num in seq:
    print(num)
True
False
1
2
3
4
5
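The __getitem__ route mentioned at the start of this section deserves a quick check of its own, because the hasattr(obj, "__iter__") test and isinstance(obj, Iterable) both look only for __iter__. The sketch below uses a made-up class, Letters (not from any library), that defines only __getitem__ with sequence semantics. Calling iter() on it still works, which is why calling iter() is the more reliable litmus test.

```python
from collections.abc import Iterable


class Letters:
    """Hypothetical container that defines only __getitem__, not __iter__."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __getitem__(self, index: int) -> str:
        # Sequence semantics: integer indexes from 0, IndexError past the end.
        return self._text[index]


letters = Letters("abc")

has_dunder_iter = hasattr(letters, "__iter__")   # no __iter__ is defined
is_abc_iterable = isinstance(letters, Iterable)  # the ABC only looks for __iter__
collected = list(iter(letters))                  # iter() falls back to __getitem__

print(has_dunder_iter, is_abc_iterable, collected)
# False False ['a', 'b', 'c']
```

So an object can fail both attribute-based checks and still be iterable in practice.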
What Is An Iterator?
According to Python’s documentation, an iterator is an object that implements both the __iter__ and __next__ dunder methods.
If you pass any iterable to the built-in iter function, it returns an iterator over that iterable. You can see this in the overloaded type signatures of iter below (these come from the typeshed type stubs for the built-ins rather than from the interpreter’s implementation):
@overload
def iter(object: SupportsIter[_SupportsNextT], /) -> _SupportsNextT: ...
@overload
def iter(object: _GetItemIterable[_T], /) -> Iterator[_T]: ...
@overload
def iter(object: Callable[[], _T | None], sentinel: None, /) -> Iterator[_T]: ...
@overload
def iter(object: Callable[[], _T], sentinel: object, /) -> Iterator[_T]: ...
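The last two overloads above describe the less common two-argument form, iter(callable, sentinel). Here is a minimal sketch of how it behaves. The list pending and the helper read_next are made up for illustration, with 0 chosen arbitrarily as the sentinel.

```python
from typing import Iterator, List

pending: List[int] = [3, 2, 1, 0, 5]  # made-up data; 0 plays the sentinel


def read_next() -> int:
    # Zero-argument callable: iter() calls it once per step.
    return pending.pop(0)


# Iteration stops as soon as read_next() returns a value equal to the sentinel.
values: Iterator[int] = iter(read_next, 0)
collected = list(values)

print(collected)  # [3, 2, 1]
print(pending)    # [5]  (the sentinel was consumed, the 5 was never read)
```

In this form the first argument is not an iterable at all, only something that can be called repeatedly with no arguments.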
a = [1, 2, 3]
print(isinstance(a, Iterable))
print(isinstance(a, Iterator))
print(type(a))
b = iter(a)
print(isinstance(b, Iterable))
print(isinstance(b, Iterator))
print(type(b))
True
False
<class 'list'>
True
True
<class 'list_iterator'>
So what is the difference? a is a list and an iterable: we can loop over it because it implements __iter__. But a is not an iterator, because it does not implement __next__. That is the first key difference. Calling iter(a) creates an iterator from a (denoted as b here), and the two behave differently in a few ways.
First, b is not subscriptable, while a is.
# accessing elements
print(a[0])
try:
    print(b[0])
except TypeError:
    print("b is not subscriptable")
1
b is not subscriptable
Second, iterating over a multiple times yields the same result every time. We do it twice below:
for i in a:
    print(i)
print("-" * 10)
for i in a:
    print(i)
1
2
3
----------
1
2
3
But iterating over b twice yields nothing the second time!
for i in b:
    print(i)
print("-" * 10)
for i in b:
    print(i)
1
2
3
----------
This is because an iterator is essentially a one-pass stream of data: once it is exhausted, it stays empty. Under the hood, for i in b is roughly equivalent to the code below.
b = iter(a)  # b is an iterator now
# step 1. the `for` loop calls `iter()` on the iterable, which returns an iterator
# (calling `iter()` on an iterator returns that same iterator)
b = iter(b)
# step 2. the `for` loop repeatedly calls `next()` on the iterator, which returns the next item in the stream
try:
    while True:
        print(next(b))
except StopIteration:
    # step 3. When the iterator is exhausted, a `StopIteration` exception is raised,
    # which the `for` loop will catch under the hood and terminate the loop.
    print("iterator is exhausted")
1
2
3
iterator is exhausted
We can package this more modularly, as below. This simple re-enactment of the for loop does not run into the exhaustion issue above as long as it is given a re-iterable object such as a list, because the function always calls iter() on its argument first, so each call works with a fresh iterator. (If you pass in an iterator that is already exhausted, iter() returns that same exhausted iterator.)
from typing import List, Iterator, Iterable, TypeVar

T = TypeVar("T")

def print_iterable_using_for(iterable: Iterable[T]) -> None:
    for item in iterable:
        print(item)

def print_iterable_using_next(iterable: Iterable[T]) -> None:
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:  # noqa: PERF203
            break
        else:
            print(item)
print_iterable_using_next([1, 2, 3])
1
2
3
class VerboseIterator:
    def __init__(self, data: List[T]) -> None:
        self.data = iter(data)
        print("Iterator created")

    def __iter__(self) -> Iterator[T]:
        return self

    def __next__(self) -> T:
        try:
            return next(self.data)
        except StopIteration:
            print("Iterator exhausted")
            raise

verbose_iter = VerboseIterator([1, 2, 3])
for i in verbose_iter:
    print(i)
print("Trying to iterate again:")
for i in verbose_iter:
    print(i)
Iterator created
1
2
3
Iterator exhausted
Trying to iterate again:
Iterator exhausted
So the sequence for the for loop is as follows:
iterable -> iter() -> iterator -> next() -> items
All Iterators Are Iterable, But Not All Iterables Are Iterators
This follows from the definitions: an iterator always implements __iter__, so it is an iterable, but an iterable need not implement __next__, so it is not necessarily an iterator.
issubclass(Iterator, Iterable)
True
issubclass(Iterable, Iterator)
False
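The same relationship shows up at runtime in what __iter__ returns. As a small sketch (the variable names are only illustrative): calling iter() on an iterator returns the iterator itself, while calling iter() on a list returns a new, independent iterator each time.

```python
numbers = [1, 2, 3]
iterator = iter(numbers)

# An iterator's __iter__ returns the iterator itself.
iterator_returns_self = iter(iterator) is iterator

# A list's __iter__ returns a new, separate iterator object on every call.
list_returns_self = iter(numbers) is numbers
first, second = iter(numbers), iter(numbers)
next(first)                      # advance only the first iterator
second_starts_at = next(second)  # the second one is unaffected

print(iterator_returns_self, list_returns_self, second_starts_at)
# True False 1
```

This is also why a list can be looped over repeatedly while an iterator cannot.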
Iterators Are Lazy But Not All Iterables Are Lazy
Again, a list is an iterable but not an iterator, because it does not implement __next__. It is also not lazy: all of its items are computed and stored in memory up front. An iterator, by contrast, hands out one item per call to next(), so a lazy iterable such as a generator can compute each item on the fly, only when it is requested, without holding the whole sequence in memory.
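One way to observe this difference is to record when the work actually happens. The sketch below uses a hypothetical helper, square, that logs each input it processes. A list comprehension does all the work immediately, while map (which returns an iterator) does nothing until next() is called.

```python
from itertools import count, islice
from typing import List

calls: List[int] = []  # records every input that actually gets squared


def square(n: int) -> int:
    calls.append(n)
    return n * n


eager = [square(n) for n in range(5)]  # list: all five squares computed now
calls_after_list = list(calls)

calls.clear()
lazy = map(square, range(5))           # iterator: nothing computed yet
calls_after_map = list(calls)
first = next(lazy)                     # computes exactly one item
calls_after_next = list(calls)

# Laziness also allows unbounded streams; islice takes just the first three.
first_three = list(islice((n * n for n in count(1)), 3))

print(calls_after_list, calls_after_map, calls_after_next, first_three)
# [0, 1, 2, 3, 4] [] [0] [1, 4, 9]
```

The unbounded count(1) stream would never fit in a list, but it is harmless as an iterator because only the requested items are ever produced.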
All Generators Are Iterators
From Python’s documentation, I quote:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and __next__() methods. More information about generators can be found in the documentation for the yield expression.
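To make the quoted passage concrete, here is a minimal sketch of a container whose __iter__ is written as a generator. NumberBag is a made-up name for illustration. Because every call to __iter__ creates a new generator object, the container can be iterated more than once even though each generator is single-use.

```python
import types
from typing import Iterator, List


class NumberBag:
    """Hypothetical container whose __iter__ is a generator function."""

    def __init__(self, numbers: List[int]) -> None:
        self.numbers = numbers

    def __iter__(self) -> Iterator[int]:
        # The yield makes this method a generator function, so calling it
        # returns a generator object that supplies __iter__ and __next__.
        for number in self.numbers:
            yield number


bag = NumberBag([1, 2, 3])
returns_generator = isinstance(iter(bag), types.GeneratorType)
first_pass = list(bag)
second_pass = list(bag)  # a fresh generator is created for this pass

print(returns_generator, first_pass, second_pass)
# True [1, 2, 3] [1, 2, 3]
```

No hand-written __next__ is needed; the generator object provides it.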
import types
from typing import Generator
issubclass(types.GeneratorType, Iterator), issubclass(Generator, Iterator)
(True, True)
def squared(start: int, end: int) -> Generator[int, None, None]:
    for i in range(start, end):
        yield i ** 2

generator = squared(1, 4)
# Check if the generator is an iterator
print(isinstance(generator, Iterator))  # True
print(hasattr(generator, '__iter__'))  # True
print(hasattr(generator, '__next__'))  # True
# Using the generator as an iterator
print(next(generator))  # 1
print(next(generator))  # 4
print(next(generator))  # 9
try:
    print(next(generator))
except StopIteration:
    print("generator is exhausted")
True
True
True
1
4
9
generator is exhausted
We can also map the above to a generator expression.
generator_expression = (i ** 2 for i in range(1, 4)) # same as generator
Since a generator is an iterator, we can translate between the two forms: any generator can be rewritten as an iterator class that behaves the same way. The class below is a hand-written equivalent of the squared generator.
from __future__ import annotations

class Squared:
    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self) -> Squared:
        return self

    def __next__(self) -> int:
        if self.start >= self.end:
            raise StopIteration
        result = self.start ** 2
        self.start += 1
        return result

squared = Squared(1, 4)
for i in squared:
    print(i)
1
4
9