Folding in Haskell

(These notes are based in part on chapter 10 of Haskell Programming from First Principles, by Christopher Allen and Julie Mornouki.)

Folding is the general name for a family of related recursive patterns. The essential idea of folding is to take a list and reduce it to, for instance, a single number.

Concrete Examples of Folding

To understand folding in general, lets first look at a few concrete examples of folds.

mysum lst sums the numbers on a list:

mysum :: [Int] -> Int
mysum []     = 0
mysum (x:xs) = x + mysum xs

For example:

mysum [2,1,5,2] 10

myprod calculates the products of numbers on the list, and is very similar:

myprod :: [Int] -> Int
myprod []     = 1              -- important: base case is 1 for multiplication
myprod (x:xs) = x * myprod xs

For example:

> myprod [2,1,5,2]
20

Notice how similar the implementation of myprod is to mysum. The only differences are the name, the value returned for the base case, and the use of * in the recursive case.

Calculating the length a list is also an example of a fold:

len :: [a] -> Int
len []     = 0
len (_:xs) = 1 + len xs   -- _ is an anonymous variable, i.e. one we don't use

For example:

> len [2,1,5,2]
4

The standard Haskell function concat lol concatenates all the lists in lol, e.g.:

> concat ["one","two","three"]
"onetwothree"

We can write our own version like this:

myconcat :: [[a]] -> [a]
myconcat []     = []
myconcat (x:xs) = x ++ (myconcat xs)

For example:

> myconcat ["one","two","three"]
"onetwothree"

Right Folds

All the functions in the previous section have the general structure of a right-associative fold with this structure:

  • A base case that returns the value for the empty list. We call this the initial value, and it is often something like 0, 1, or [].
  • A recursive case that takes the first element of the list and combines it with the rest of the folded list. The combining function always takes two inputs: the first input is the next value from the list that is being processed, and the second input is the accumulation of the previously processed list values.

Here is the general right fold:

myfoldr :: (a -> b -> b) -> b -> [a] -> b
myfoldr _ init_val []     = init_val
myfoldr f init_val (x:xs) = f x (myfoldr f init_val xs)

Take some time understand the type signature — it provides a lot of useful information!

Also note that Haskell provides a standard version of this function called foldr.

myfoldr lets us write single-line versions of all the functions in the previous section:

> myfoldr (+) 0 [2,1,5,2]   -- sum of a list
10

> myfoldr (*) 1 [2,1,5,2]   -- product of a list
20

> myfoldr (\x y -> 1 + y) 0 [2,1,5,2]  -- length of a list
4

> myfoldr (++) [] [[1,2], [3,4], [5]]  -- concatenation of a list
[1,2,3,4,5]

myfoldr is right associative, and so myfoldr (+) 0 [2,1,5,2] is evaluated as if it was bracketed from the right like this:

2 + (1 + (5 + (2 + 0)))

The right-most + is evaluated first, then the second to right-most +, and so on to the left-most + which is evaluated last.

A left associative fold is like a right fold, but the expression is instead bracketed from left to right. Using the myfoldl function defined below, the expression myfoldl (+) 0 [2,1,5,2] is evaluated like this:

(((0 + 2) + 1) + 5) + 2

Here’s a definition of myfoldl:

myfoldl :: (b -> a -> b) -> b -> [a] -> b
myfoldl f acc []     = acc
myfoldl f acc (x:xs) = myfoldl f (f acc x) xs

For some cases, there is no difference between using a left fold or a right fold. For example, 2 + (1 + (5 + (2 + 0))) and (((0 + 2) + 1) + 5) + 2 evaluate to the same thing because + is associative, i.e. \((a + b) + c = a + (b + c)\) is true for any numbers \(a\), \(b\), and \(c\).

* and ++ are also associative, and so in the examples above you can use either myfoldr with myfoldl with them. However, (\x y -> 1 + y), the combiner function used to get the length of a list, is not associative, and so myfoldr and myfoldl give different results:

myfoldr (\x y -> 1 + y) 0 [2,1,5,2]  -- length of the list
4

myfoldl (\x y -> 1 + y) 0 [2,1,5,2]  -- oops: not the length of the list
3

The folds are different because (\x y -> 1 + y) is not an associative function. The easiest way to see this is to let f equal (\x y -> 1 + y), and then do these two calculations:

  (f (f 1 2) 3)  -- left associative = (f 3 3) = 4

  (f 1 (f 2 3))  -- right associative
= (f 1 4)
= 5

Another difference between left and right folds is how they interact with Haskells lazy evaluation. Left folds can work with infinite lists in some cases because they fold starting at the beginning of the list and move to the right. But right folds cannot work with infinite lists because they start at the right end of the list. However, infinite lists don’t have a right end, and foldr loops through the list forever looking for a final element.

There is also a version of foldl called foldl' that is a more efficient version of foldl. In practice, the choice is usually between foldr and foldl'.

Note

There is much more to folding that we have discussed here! We have just been folding lists, but it is possible to fold other structures, such as trees. In mathematics, folds are known as catamorphisms, and there is a body of theoretical work that defines exactly what properties are needed to do folding, and what you can calculate with them.

foldr and the Structure of Lists

A useful way of thinking about folding comes from examining the structure of a list. In Haskell, the list [1,2,3] can be written like this:

(1 : (2 : (3 : [])))

: is the cons operator, i.e. if xs is a list, then x:xs creates a new list that is the same as xs except x has been added to the front.

myfoldr f init lst can be thought of as replacing each : in lst with f, and then replacing the [] in lst with init:

myfoldr f init lst
(1 `f` (2 `f` (3 `f` init)))  -- infix style
(f 1 (f 2 (f 3 init)))        -- prefix style

The notation `f` lets you use a pre-fix function in an infix way, i.e. a `f` b is the same as f a b.

The Type Signature for Folding

The type signatures for foldl and foldr are similar, but not quite the same:

> :type foldr
foldr :: (a -> b -> b) -> b -> [a] -> b

> :type foldl
foldl :: (a -> b -> a) -> a -> [b] -> a

Look carefully at the three inputs foldr takes:

:type foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
  • First input: a function of type a -> b -> b. This is the combiner function that takes an input of type a, an input of type b, and then returns an output of type b.
  • Second input: An item of type b. This is the initial value of the fold, i.e. for summing this would be 0. Note that this is the same as the type of the second parameter of the first function.
  • Third input: A list of items of type a. This is the list being folded. Note that a is the same type as the first input to the function passed to foldr.
  • Output: An item of type b. The list is reduced to a value of type b. Note that this is the same type as the initial value (the second input) passed to foldr.

A similarly careful analysis of the type signature for foldl shows that it is different. For instance, the initial value passed to foldls combiner function is the first input, while for foldr it was the second input.

Exercises

  1. What does foldr (:) [] "apple" evaluate to?

  2. Suppose snoc is defined like this:

    -- snoc 5 [1,2,3] is [1,2,3,5]
    snoc :: a -> [a] -> [a]
    snoc x []     = [x]
    snoc x (y:ys) = y:(snoc x ys)
    

    What does foldr snoc [] "apple" evaluate to?

  3. The standard Prelude function const is defined like this:

    const :: a -> b -> a
    const x y = x
    
    1. What is the type of foldr const? Remember, don’t worry about Foldable: you should assume here that we are only folding lists.
    2. What does foldr const 'a' "mouse" evaluate to?
  4. Suppose f = foldr (-) 0 and g = foldl (-) 0.

    1. Do f and g have the same type?
    2. Find a value x such that f x /= g x.
    3. Find a value y, with length greater than 1, such that f y == g y.
    4. Find all values of a, b, and c such that f [a,b,c] == g [a,b,c].

Answers

  1. "apple"

  2. "elppa"

  3. Answers:

    1. b -> [b] -> b
    2. 'm'
  4. Answers:

    1. Yes:

      > :type foldr (-) 0
      foldr (-) 0 :: (Num b, Foldable t) => t b -> b
      
      > :type foldl (-) 0
      foldl (-) 0 :: (Num b, Foldable t) => t b -> b
      
    2. There are many such values of x that will work, for example:

      > foldr (-) 0 [1,2]
      -1
      > foldl (-) 0 [1,2]
      -3
      
    3. There are many such values of y that will work, for example:

      > foldl (-) 0 [1,2,-1]
      -2
      > foldr (-) 0 [1,2,-1]
      -2
      
    4. To figure this out we will evaluate both expressions on their own. First, f [a,b,c] is a - (b - (c - 0)), which simplifies to a - (b - c) = a - b + c.

      Second, g [a,b,c] is ((0 - a) - b) - c, which simplifies to -a - b - c.

      Since f [a,b,c] == g [a,b,c], a - b + c == -a - b -c. The -b terms cancel out, so this simplifies to a + c == -a + -c, or a + c == -(a + c). The only way this equation can be true is if a + c == 0, i.e. if a == -c.

      So f [a,b,c] == g [a,b,c] just when a == -c, e.g. when the list has the form [-c,b,c]. For example:

      > foldr (-) 0 [0,6,0]
      -6
      > foldl (-) 0 [0,6,0]
      -6
      
      > foldr (-) 0 [14,2,-14]
      -2
      > foldl (-) 0 [14,2,-14]
      -2