Generators and Coroutines

In purely functional programming, when we call a function it returns a value:

user=> (defn incr [n] (+ n 1))
#'user/incr
user=> (incr 5)
6
user=> (incr 2)
3

You can call a function as many times as you like, and each time the same thing happens: the function is called, it computes a value, it returns this value, and then it ends. In purely functional programming, functions have no memory of any previous function calls.

We can generalize ordinary function calling as follows. Instead of a function returning a value, we could instead allow it to report a value without returning. A function could report the result of some calculation, and then just keep going doing more calculations.

This turns out to be a very practical and useful idea. A function that behaves like this is no longer really a function, but instead is generally called a coroutine.

Generators in Python

Python is a language that supports coroutines, and here we’ll look at a kind of coroutine called a generator. Generators are a convenient way to create iterators. An iterator is an object with, at least, a function called next() that returns a value every time you call it. A well-designed iterator only does work when next() is called, so that, for instance, if you want only, say, the first 3 values from an iterator, then you just call next() three times. Later values don’t need to be calculated.

For example, this generator creates an iterator that returns three different greetings:

def gen_hello():
    yield 'Hi!'
    yield 'Hello!'
    yield "How's it going?"

You could use it like this:

>>> greet = gen_hello()
>>> greet.next()
'Hi!'
>>> greet.next()
'Hello!'
>>> greet.next()
"How's it going?"
>>> greet.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Or with a for-loop like this:

>>> greet = gen_hello()
>>> for g in greet: print g
...
Hi!
Hello!
How's it going?

Or:

>>> for g in gen_hello(): print g
...
Hi!
Hello!
How's it going?

Infinite Loops in Generators

Infinite loops are often useful in generators, e.g:

# generates all even ints, starting at 0
def gen_evens():
    e = 0
    while True:  # infinite loop!
        yield e
        e += 2

>>> evens = gen_evens()
>>> evens.next()
0
>>> evens.next()
2
>>> evens.next()
4
>>> evens.next()
6

The iterator created by the generator only runs when we call next(), so there is no infinite loop here — unless we call next() an infinite number of times, e.g.:

for e in evens:   # infinite loop!
    print e

A nice feature of this style of programming is that it separates a loop from its stopping condition. gen_evens knows how to generate even numbers, but it doesn’t know when to stop. The code that uses gen_evens is responsible for deciding when to stop calling it.

If you want, say, the first 10 even numbers, then you should call evens.next() 10 times. We could do this in general like this:

# generates, at most, the first n values that iter yields
def gen_first(iter, n):
    i = 0
    while i < n:
        yield iter.next()
        i += 1

Example: A Lexical Scanner

Here’s a generator that creates a tokenizer (i.e. a lexical scanner) for arithmetic expressions:

def gen_token(s):
    i = 0
    while i < len(s):
        if s[i] in ' \n\r\t':
            pass   # do nothing: ignore whitespace
        elif s[i] in '+-*/()':
            yield s[i]
        elif s[i] in '0123456789':
            num = s[i]
            i += 1
            while i < len(s) and s[i] in '0123456789':
                num += s[i]
                i += 1
            i -= 1
            yield num

        i += 1

def test():
    input = ' ( 1 - 34389) + 34'
    print 'input = "%s"' % input
    gen = gen_token(input)
    for token in gen:
        print '"%s"' % token

>>> test()
input = " ( 1 - 34389) + 34"
"("
"1"
"-"
"34389"
")"
"+"
"34"

Every time next() is called on the iterator, it runs until it hits a yield, and then returns the yield value. Then the iterator stops and waits: it does nothing until next() is called again. So if you never call next() again, it does no more work.

Example: Python’s itertools Library

itertools is a standard collection of useful iterator-related functions. For instance, itertools can generate all permutations of a sequence like this:

>>> from itertools import *

>>> for x in permutations('abc'): print x
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')

Or, if you would like the Cartesian product a set with itself:

>>> for x in product('abc', repeat=2): print x
...
('a', 'a')
('a', 'b')
('a', 'c')
('b', 'a')
('b', 'b')
('b', 'c')
('c', 'a')
('c', 'b')
('c', 'c')

If you want all bit strings of length 3, you could generate them like this:

>>> for x in product('01', repeat=3): print x
...
('0', '0', '0')
('0', '0', '1')
('0', '1', '0')
('0', '1', '1')
('1', '0', '0')
('1', '0', '1')
('1', '1', '0')
('1', '1', '1')

Change the 3 to a 4, and you’ll get all bit strings of length 4:

>>> for x in product('01', repeat=4): print x
...
('0', '0', '0', '0')
('0', '0', '0', '1')
('0', '0', '1', '0')
('0', '0', '1', '1')
('0', '1', '0', '0')
('0', '1', '0', '1')
('0', '1', '1', '0')
('0', '1', '1', '1')
('1', '0', '0', '0')
('1', '0', '0', '1')
('1', '0', '1', '0')
('1', '0', '1', '1')
('1', '1', '0', '0')
('1', '1', '0', '1')
('1', '1', '1', '0')
('1', '1', '1', '1')

You can even pass in other generators to these functions. For example:

>>> for x in product(gen_hello(), repeat=2): print x
...
('Hi!', 'Hi!')
('Hi!', 'Hello!')
('Hi!', "How's it going?")
('Hello!', 'Hi!')
('Hello!', 'Hello!')
('Hello!', "How's it going?")
("How's it going?", 'Hi!')
("How's it going?", 'Hello!')
("How's it going?", "How's it going?")

Or even:

>>> for x in permutations(gen_hello()): print x
...
('Hi!', 'Hello!', "How's it going?")
('Hi!', "How's it going?", 'Hello!')
('Hello!', 'Hi!', "How's it going?")
('Hello!', "How's it going?", 'Hi!')
("How's it going?", 'Hi!', 'Hello!')
("How's it going?", 'Hello!', 'Hi!')