The Iterator Protocol

What Is An Iterable?

According to Python’s documentation, an iterable is most commonly implemented by defining the __iter__ dunder method, or alternatively a __getitem__ method with sequence semantics (integer indices starting at 0, with IndexError raised past the end).

A simple litmus test for whether an object is iterable is to call the built-in iter function on it and check whether a TypeError is raised.

try:
    iter(1)
except TypeError:
    print("1 is not iterable")


_ = iter("ChatGPT") # string is an iterable
_ = iter([1, 2, 3]) # list is an iterable
1 is not iterable
print(hasattr(1, "__iter__"))
print(hasattr("ChatGPT", "__iter__"))
print(hasattr([1, 2, 3], "__iter__"))
False
True
True
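One caveat: the hasattr check only looks for __iter__, and isinstance(obj, Iterable) has the same blind spot, while iter() also accepts objects that only implement __getitem__ with sequence semantics. This is why Python’s documentation describes calling iter(obj) as the only reliable way to test for iterability. SquareTable below is a hypothetical class written just for this illustration:

```python
from collections.abc import Iterable


class SquareTable:
    """Hypothetical class that is iterable only through __getitem__."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __getitem__(self, index: int) -> int:
        # iter() calls this with 0, 1, 2, ... and stops at the first IndexError
        if index >= self.size:
            raise IndexError(index)
        return index ** 2


table = SquareTable(4)
print(hasattr(table, "__iter__"))   # False: no __iter__ is defined
print(isinstance(table, Iterable))  # False: the ABC only checks for __iter__
print(list(iter(table)))            # [0, 1, 4, 9]: iter() falls back to __getitem__
```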
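A custom class becomes an iterable simply by implementing __iter__. Here NumberSequence delegates to the iterator of the list it wraps:

from typing import List, Iterable, Iterator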

class NumberSequence:
    def __init__(self, numbers: List[int]):
        self.numbers = numbers

    def __iter__(self) -> Iterator[int]:
        return iter(self.numbers)


seq = NumberSequence([1, 2, 3, 4, 5])
print(isinstance(seq, Iterable))
print(isinstance(seq, Iterator))
for num in seq:
    print(num)
True
False
1
2
3
4
5

What Is An Iterator?

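According to Python’s documentation, an iterator is an object that implements both the __iter__ and __next__ dunder methods: __iter__ returns the iterator itself, and __next__ returns the next item in the stream or raises StopIteration when there are none left.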

Passing an iterable to the built-in iter function returns an iterator. You can see this in the overloaded signatures of iter below, taken from typeshed (the type stubs describing Python’s builtins) rather than from the interpreter’s source code:

@overload
def iter(object: SupportsIter[_SupportsNextT], /) -> _SupportsNextT: ...
@overload
def iter(object: _GetItemIterable[_T], /) -> Iterator[_T]: ...
@overload
def iter(object: Callable[[], _T | None], sentinel: None, /) -> Iterator[_T]: ...
@overload
def iter(object: Callable[[], _T], sentinel: object, /) -> Iterator[_T]: ...
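The last two overloads cover the less common two-argument form, iter(callable, sentinel), which builds an iterator that calls the callable with no arguments until it returns a value equal to the sentinel. A small sketch, using an in-memory io.StringIO purely as a convenient callable source:

```python
import io

# Two-argument form: iter(callable, sentinel) keeps calling the callable
# until it returns a value equal to the sentinel
stream = io.StringIO("alpha\nbeta\n")
lines = list(iter(stream.readline, ""))  # readline() returns "" at end of input
print(lines)  # ['alpha\n', 'beta\n']
```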
Now let’s compare a list with the iterator that iter returns for it:

a = [1, 2, 3]
print(isinstance(a, Iterable))
print(isinstance(a, Iterator))
print(type(a))

b = iter(a)
print(isinstance(b, Iterable))
print(isinstance(b, Iterator))
print(type(b))
True
False
<class 'list'>
True
True
<class 'list_iterator'>

So what is the difference? a is a list, which is an iterable: we can loop over it because it implements __iter__. However, a is not an iterator, because it does not implement __next__. That is the first key difference. Calling iter(a) creates an iterator from a, which we call b here. Let’s look at a few more differences.

We first see that b is not subscriptable, while a is.

# accessing elements

print(a[0])

try:
    print(b[0])
except TypeError:
    print("b is not subscriptable")
1
b is not subscriptable
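Although an iterator cannot be indexed, you can still reach a later item by consuming the ones before it. A small sketch using itertools.islice, a standard-library helper rather than something the iterator itself provides:

```python
from itertools import islice

a = [1, 2, 3]
b = iter(a)

# There is no b[1], but islice can skip ahead by consuming items from b
print(next(islice(b, 1, None)))  # 2
# The skipped item (1) and the returned item (2) are gone from b for good
print(list(b))  # [3]
```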

Next, iterating over a multiple times yields the same items every time. Here we loop over it twice:

for i in a:
    print(i)

print("-" * 10)

for i in a:
    print(i)
1
2
3
----------
1
2
3

But iterating over b twice yields nothing the second time!

for i in b:
    print(i)

print("-" * 10)

for i in b:
    print(i)
1
2
3
----------

This is because an iterator is a stream of data: once it is exhausted, it stays empty. Under the hood, the loop for i in b is roughly equivalent to the following:

b = iter(a) # b is an iterator now

# step 1. the `for` loop calls `iter()` on the object being looped over; since b is already an iterator, this returns b itself
b = iter(b)
# step 2. the `for` loop repeatedly calls `next()` on the iterator, and each call returns the next item in the stream
try:
    while True:
        print(next(b))
except StopIteration:
    # step 3. When the iterator is exhausted, a `StopIteration` exception is raised,
    # which the `for` loop will catch under the hood and terminate the loop.
    print("iterator is exhausted")
1
2
3
iterator is exhausted
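A quick way to see why the second loop over b printed nothing: calling iter() on an iterator returns the iterator itself, while calling it on a list creates a new, independent iterator each time. A minimal check:

```python
a = [1, 2, 3]
b = iter(a)

# Calling iter() on an iterator hands back the very same object...
print(iter(b) is b)  # True

# ...whereas every iter() call on a list builds a new, independent iterator
first, second = iter(a), iter(a)
print(first is second)  # False

# Consuming b once leaves nothing for a second pass
print(list(b))  # [1, 2, 3]
print(list(b))  # []
```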

To make this more modular, we can wrap the re-enactment in a function. Unlike the snippet above, print_iterable_using_next does not run into the exhaustion issue when given an iterable such as a list, because it calls iter() on its argument every time, so each call works with a “fresh” iterator. (If you pass it an iterator instead, iter() returns that same iterator, so there is nothing to refresh.)

from typing import List, Iterator, Iterable, TypeVar


T = TypeVar("T")

def print_iterable_using_for(iterable: Iterable[T]) -> None:
    for item in iterable:
        print(item)

def print_iterable_using_next(iterable: Iterable[T]) -> None:
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:  # noqa: PERF203
            break
        else:
            print(item)

print_iterable_using_next([1, 2, 3])
1
2
3
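To watch the exhaustion happen, here is an iterator that announces when it is created and when it runs out:

from typing import Generic

class VerboseIterator(Generic[T]):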
    def __init__(self, data: List[T]) -> None:
        self.data = iter(data)
        print("Iterator created")

    def __iter__(self) -> Iterator[T]:
        return self

    def __next__(self) -> T:
        try:
            return next(self.data)
        except StopIteration:
            print("Iterator exhausted")
            raise

verbose_iter = VerboseIterator([1, 2, 3])
for i in verbose_iter:
    print(i)

print("Trying to iterate again:")
for i in verbose_iter:
    print(i)
Iterator created
1
2
3
Iterator exhausted
Trying to iterate again:
Iterator exhausted

So the sequence for the for loop is as follows:

iterable -> iter() -> iterator -> next() -> items

All Iterators Are Iterable, But Not All Iterables Are Iterators

This follows directly from the definitions: an iterator must implement __iter__, so every iterator is an iterable, but an iterable such as a list need not implement __next__, so the reverse does not hold.

issubclass(Iterator, Iterable)
True
issubclass(Iterable, Iterator)
False
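These checks work structurally: collections.abc.Iterable recognizes any class that defines __iter__, and collections.abc.Iterator any class that defines both __iter__ and __next__, without explicit inheritance. A sketch with two hypothetical throwaway classes:

```python
from collections.abc import Iterable, Iterator


# Hypothetical classes: neither inherits from the ABCs; the checks are structural
class OnlyIter:
    def __iter__(self):
        return iter([])


class IterAndNext:
    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration


for cls in (OnlyIter, IterAndNext):
    print(cls.__name__, issubclass(cls, Iterable), issubclass(cls, Iterator))
# OnlyIter True False
# IterAndNext True True
```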

Iterators Are Lazy, But Not All Iterables Are Lazy

Again, a list is an iterable but not an iterator, since it does not implement __next__. A list is also not lazy: all of its items are computed and stored in memory up front. An iterator, on the other hand, hands out items one at a time, only when next() is called, so an iterator such as a generator can compute each item on the fly instead of keeping every item in memory.
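To make the difference concrete, the sketch below records when items are actually computed. The expensive_square function is a hypothetical stand-in for real work; map() is used because in Python 3 it returns an iterator:

```python
from itertools import count, islice

calls = []


def expensive_square(x: int) -> int:
    calls.append(x)  # record each time an item is actually computed
    return x * x


# A list computes and stores every item immediately
eager = [expensive_square(x) for x in range(5)]
print(calls)  # [0, 1, 2, 3, 4]

calls.clear()
# map() returns an iterator: nothing is computed until next() asks for it
lazy = map(expensive_square, range(5))
print(calls)  # []
print(next(lazy), next(lazy))  # 0 1
print(calls)  # [0, 1]

# Laziness also allows unbounded streams, which no list can hold
first_three = list(islice(count(10), 3))
print(first_three)  # [10, 11, 12]
```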

All Generators Are Iterators

From Python’s documentation, I quote:

Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and __next__() methods. More information about generators can be found in the documentation for the yield expression.

Python Documentation

import types
from typing import Generator
issubclass(types.GeneratorType, Iterator), issubclass(Generator, Iterator)
(True, True)
def squared(start: int, end: int) -> Generator[int, None, None]:
    for i in range(start, end):
        yield i ** 2

generator = squared(1, 4)

# Check if the generator is an iterator
print(isinstance(generator, Iterator))  # True
print(hasattr(generator, '__iter__'))   # True
print(hasattr(generator, '__next__'))   # True

# Using the generator as an iterator
print(next(generator))  # 1
print(next(generator))  # 4
print(next(generator))  # 9
try:
    print(next(generator))
except StopIteration:
    print("generator is exhausted")
True
True
True
1
4
9
generator is exhausted

The same generator can also be written more compactly as a generator expression.

generator_expression = (i ** 2 for i in range(1, 4))  # yields the same values as squared(1, 4)
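Like the generator returned by squared, a generator expression is an iterator and can be consumed only once. A quick check:

```python
from collections.abc import Iterator

generator_expression = (i ** 2 for i in range(1, 4))
print(isinstance(generator_expression, Iterator))  # True

first_pass = list(generator_expression)
second_pass = list(generator_expression)
print(first_pass)   # [1, 4, 9]
print(second_pass)  # []: already exhausted, like any iterator
```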

Since a generator is an iterator, we can also go in the other direction: write an iterator class by hand, implementing __iter__ and __next__, that behaves like the squared generator above.

from __future__ import annotations

class Squared:
    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self) -> Squared:
        return self

    def __next__(self) -> int:
        if self.start >= self.end:
            raise StopIteration
        result = self.start ** 2
        self.start += 1
        return result

squared_iterator = Squared(1, 4)  # a new name, so the squared generator function is not shadowed
for i in squared_iterator:
    print(i)
1
4
9
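The documentation quoted earlier also mentions another common pattern: implementing a container’s __iter__ as a generator function. Every call to __iter__ then returns a new generator object, so unlike a Squared instance, which is used up after one loop because __next__ advances self.start, the container can be looped over repeatedly. SquaredRange is a hypothetical name used only for this sketch:

```python
from typing import Iterator


class SquaredRange:
    """Hypothetical reusable container whose __iter__ is a generator function."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self) -> Iterator[int]:
        # Each call runs this generator function afresh and returns a new
        # generator object, so self.start and self.end are never modified
        for i in range(self.start, self.end):
            yield i ** 2


squares = SquaredRange(1, 4)
print(list(squares))  # [1, 4, 9]
print(list(squares))  # [1, 4, 9]: a fresh generator on every pass
print(iter(squares) is squares)  # False: the container is not its own iterator
```

The trade-off is that SquaredRange is an iterable but not an iterator, whereas Squared is an iterator that can only be consumed once.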