Basic Haskell

The following is based in part on https://www.haskell.org/tutorial/, other Haskell web resources, and the 2014 book Thinking Functionally in Haskell by Richard Bird.

Introduction

Haskell is a purely functional programming language, where all expressions yield values, all values have types, and side-effects are carefully controlled. Like Lisp, Haskell takes inspiration from the lambda calculus. Compared to Lisp, there are some major differences:

  • Haskell syntax is generally more human reader-friendly than Lisp — the designers of the language made readability a priority. For instance, arithmetic expressions are written in regular infix notation, e.g. 2 * 3 + 4 instead of (+ (* 2 3) 4).

  • Haskell is statically type-checked, meaning that type errors can be found before the program runs (i.e. at compile-time) by examining the source code. In contrast, Lisp is usually dynamically typed, meaning type errors may not be discovered until run-time.

  • Haskell uses type inference, and so in many cases it is unnecessary to explicitly state the types of functions or variables. Instead, Haskell can often infer the correct types.

  • Haskell uses an evaluation strategy known as lazy evaluation. The basic idea is that an expression is not evaluated until it is needed. This often ends up returning the same result as the regular non-lazy evaluation used by most other languages, but it does have a number of consequences, such as:

    • Potentially improved performance — expressions that are never evaluated run fast.

      On the downside, it can be hard to predict the performance of lazy evaluation ahead of time. This could be a problem in real-time systems where it is required that a calculation finish in a specific amount of time.

    • The possibility of infinite data structures. Interestingly, these have practical uses and can simplify some kinds of code.

    • Some control-flow abstractions can be written using ordinary functions. In Lisp, you can do this with macros, but macros don’t work the same way as ordinary functions.

  • Haskell is purely functional, and does not have loops or assignment statements as found in most other programming languages. Haskell encourages the use of functions, and higher-order functions, like map, filter, and fold.

  • While Haskell is not object-oriented, it has good support for abstraction, e.g. type classes and algebraic data types.

  • Side-effects are strictly controlled in Haskell, and can only appear in restricted parts of your program known as do-environments.

For programmers coming from traditional languages, Haskell can seem overwhelming at first because it has so many novel features. Plus it has mainly been developed by academics instead of practitioners, and so a lot of the discussion around Haskell is based on mathematics and theory.

Pure Functions

An important idea in Haskell is the notion of a pure function. A Haskell function is said to be pure if it satisfies these two properties:

  1. It always returns the same result for the same arguments.

    For example, suppose f 5 returns the value 2 (the notation f 5 is a Haskell function call meaning function f is applied to the argument 5 — no brackets are needed). If f is pure, you can be sure that f 5 will always return 2 no matter when it is called, or how many times you call it.

    What a pure function f returns cannot depend upon anything except the input to f. A pure function cannot depend upon any global variables or other state that might change from call to call.

  2. It has no side-effects.

    Side-effects are things like printing to the screen, reading/writing files, or communicating to other computers on the Internet. Most interactions with the outside world — nearly everything that makes computers useful! — involve side-effects.

Haskell encourages the use of pure functions, since they are usually easier to reason about than non-pure functions. Pure functions are very similar to the functions used in mathematics.

But the big problem with pure function is that purity makes it impossible to write many vital operations. For instance, a read_char() function that returns the next character from a file on disk can’t be pure because it doesn’t return the same character every time it’s called. Other impure functions include random number generators (e.g. rand() in C++ is impure because it returns different numbers each time it’s called), or input functions that get information from the user (e.g. Python’s input(prompt) function is impure because it returns whatever string the user types).

So pure functions can’t be the only kind of function in a useful programming language. Lisp, like almost all programming languages, deals with this problem by simply allowing pure and non-pure functions to intermingle, leaving it up to the programmer to deal with the consequences.

Haskell, however, takes a very different approach to purity. Impure functions are only permitted inside do-environments. The use and design of these environments is one of the most significant features of Haskell, and we’ll have more to say about them later.

For now, though, we will learn about Haskell by exploring pure functions.

Basic Features of GHCI

Like Lisp, we often interact with Haskell through its interpreter. We’ll uses “ghci” in these notes, e.g.:

GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Prelude> 1+2*3
7
Prelude> (1+2)*3
9

Right away you can see that Haskell uses regular infix notation for arithmetic expressions.

Here’s an example of a Haskell function declaration:

> inc :: Integer -> Integer   -- type signature
> inc n = n + 1

You should type the above lines into a text file named, say, test.hs, and then load it into the interpreter like this:

Prelude> :load "test.hs"
[1 of 1] Compiling Main             ( test.hs, interpreted )
Ok, modules loaded: Main.

The name of this function is inc. The first line is it’s type signature, which specifies that it takes one input, n, of type Integer, and returns one value, also of type Integer. The token :: is read “has type”.

Source code comments in Haskell start with --, and continue to the end of the line.

To call inc, write its function name followed by its input. For example:

> inc 5
6

> inc 8
9

We’ll often want to know an expression’s type, and for that we’ll use the :type operator (:t for short). For example:

> :type inc
inc :: Integer -> Integer

> :type inc 4
inc 4 :: Integer

Some simple expressions can have types that are more complex than you might expect. For example:

> :type 5
5 :: Num a => a

> :type 5.3
5.3 :: Fractional a => a

For the moment we will ignore these complexities. But the basic idea is that, for the number 5, a is any type of value that implements all the requirements of Num.

Here are a couple more examples of functions:

-- calculates the length of the hypotenuse of a right-triangle
-- with leg lengths a and b
hyp a b = sqrt (a*a + b*b)

-- calculates the distance between points (x, y) and (a, b)
dist x y a b = hyp (x - a) (y - b)

Notice that these two functions don’t include a type signature. That’s okay in this case: Haskell is able to automatically infer them:

> :type hyp
hyp :: Floating a => a -> a -> a

> :type dist
dist :: Floating a => a -> a -> a -> a -> a

In practice, automatic type inference can help make your programs shorter and easier to read, and yet still benefit from static type checking.

That said, throughout these notes we will almost always write type signatures for functions. Not only is that a good way to learn about the syntax and semantics of type signatures, but they are often useful documentation for functions that help when designing and implementing programs.

Negative Numbers

Haskell has a small problem with negative numbers:

> inc 1
2

> inc -1

<interactive>:12:5:
    No instance for (Num (Integer -> Integer))
      arising from a use of `-'
    Possible fix:
      add an instance declaration for (Num (Integer -> Integer))
    In the expression: inc - 1
    In an equation for `it': it = inc - 1

To deal with this, you often need to write negative numbers inside brackets:

> inc (-1)
0

Sometimes you need the brackets, sometimes you don’t:

> -2      -- ok
-2

> -2 + 1  -- ok
-1

> 1 + -2  -- oops: -2 doesn't work here!

<interactive>:16:1:
    Precedence parsing error
        cannot mix `+' [infixl 6] and prefix `-' [infixl 6] in the same infix expression

> 1 + (-2)   -- need () around -2
-1

The problem is due to a parsing ambiguity with -. In Haskell, the expression x - 1 could mean “subtract 1 from variable x”, or it could mean “apply function x to -1”. The brackets are needed to avoid this ambiguity in some cases.

Languages like C don’t have this problem because the expression x - 1 only has one meaning: subtract 1 from x. In C, function calls always use (), and so if the expression were a function call it would have been written x(-1).

Calling Functions

Consider these two expressions:

> inc 3       -- ok
4
> inc inc 3   -- oops: error!

<interactive>:21:1:
    Couldn't match expected type `a0 -> t0' with actual type `Integer'
    The function `inc' is applied to two arguments,
    but its type `Integer -> Integer' has only one
    In the expression: inc inc 3
    In an equation for `it': it = inc inc 3

<interactive>:21:5:
    Couldn't match expected type `Integer'
                with actual type `Integer -> Integer'
    In the first argument of `inc', namely `inc'
    In the expression: inc inc 3
    In an equation for `it': it = inc inc 3

The expression inc inc 3 is an error because Haskell first evaluates inc inc. This is a type error: inc requires a parameter of type Integer, but we’ve passed it inc which has type Integer -> Integer.

One solution is to use brackets:

> inc (inc 3)
5

Another solution is to use the $ operator:

> inc $ inc 3
3

$ is the low-priority application operator, and is often used to reduce the number of brackets in expressions. Once you get used to it, it can be quite handy. However, many newcomers to Haskell find it confusing, so we will use it infrequently.

Lists and Polymorphic Types

The Haskell type Integer is an example of a concrete type that mirrors the mathematical notion of an integer. However, Haskell functions are often written using polymorphic types, where the exact types are not specified.

We’ll introduce Haskell’s polymorphic types and lists types at the same time, as they go naturally go together. In Haskell, lists are written using square-brackets, and elements are separated by commas. For example, the list ['a', 'b', 'c'] has the type [Char], i.e. it’s a list of Char values.

The elements on a list must all be of the same type. For example, you could have a list of functions:

> :type [inc, inc, inc]
[inc, inc, inc] :: [Integer -> Integer]

The following list is not allowed because it has two different types of values:

> [1, 2, 'a']

<interactive>:54:2:
    No instance for (Num Char) arising from the literal `1'
    Possible fix: add an instance declaration for (Num Char)
    In the expression: 1
    In the expression: [1, 2, 'a']
    In an equation for `it': it = [1, 2, 'a']

Thus, in Haskell, if L is a list, then you know for certain that all the elements of L have the same type.

Now lets look at a list function. len uses recursion to calculate the length of a list containing elements of any type a:

> len :: [a] -> Integer      -- type signature: [a] is a list of any type
> len []     =  0            -- base case
> len (x:xs) =  1 + len xs   -- recursive case

len consists of two equations. The first equation defines the base case: the empty list [] has length 0. The second equation applies the recursive case to a non-empty list.

When len [1, 2, 3] is evaluated, Haskell checks the equations of len in the order they are written. So first it tries to match len [] = 0. This fails, because [] doesn’t match [1, 2, 3]. Then it moves on to the second equation, which it does match.

In its second equation, len uses pattern matching to extract the head and rest of the list. When you evaluate len [1, 2, 3], [1, 2, 3] gets matched by x:xs, and x is bound to 1 and xs is bound to [2, 3]. The expression x:xs is a nice syntax for getting the first element, and the rest of the elements, of a list.

The variable names in x:xs are just a convention. The x refers to the first element of the list, while xs is the plural of x because it refers to the rest of the list elements. You could, if you prefer, write first:rest, a:b, or even car:cdr.

Importantly, len does not depend on any specific details of the type a, and so it works with any list whose elements are all of type a, i.e. it’s input is a value of type [a]. For example:

> len [1, 2, 3]          -- [1, 2, 3] :: [Integer]
3
> len ['a', 'b', 'c']    -- ['a', 'b', 'c'] :: [Char]
3
> len [[1], [2], [3]]    -- [[1], [2], [3]] :: [[Integer]]
3
> len [len, len, len]    -- [len, len, len] :: [[a] -> Integer]
3

Haskell also provides standard functions head and tail that work like this:

> head [1, 2, 3]
1

> tail [1, 2, 3]
[2,3]

Note that it’s an error to call head or tail on an empty list:

> head []
*** Exception: Prelude.head: empty list

> tail []
*** Exception: Prelude.tail: empty list

What are the type signatures for head and tail? For head, the input is any list, and the result is the first element of the list. A general list has the type [a], where a is any type. So, the type signature of head must be [a] -> a, i.e. it takes a list of values of type a, and returns a value of type a.

Similarly, tail takes a list of values of type a as input, and returns a list of values of type a as output. So its type signature is [a] -> [a].

You can verify these are the correct signatures using :type:

> :type head
head :: [a] -> a

> :type tail
tail :: [a] -> [a]

You might think that the function tail has the type a -> a, i.e. it takes a value of type a as input and gives an output of type a. But that’s too general. It’s an error, for example, to evaluate tail 5 because tail requires a list type as input.

Similarly, you might think that tail has the type [Integer] -> [Integer]. But that type is too specific: it says that tail would only work with Integer lists.

Haskell uses type inference at compile time to do this type checking, and is careful to get just the right type, i.e. not too general and too specific. This turns out to be a non-trivial task, and Haskell uses the Hindley-Milner type system to figure out the correct types for expressions.

A Few List Functions

Haskell provides a number of helper functions for lists. For example, you can create a range of numbers like this:

> [1..5]
[1,2,3,4,5]

> [4..9]
[4,5,6,7,8,9]

You can also have an increment, e.g.:

> [2,4..10] [2,4,6,8,10]

> [0,5..20] [0,5,10,15,20]

> [2,1..(-5)] [2,1,0,-1,-2,-3,-4,-5]

The notation is a bit unusual: the first two numbers are the first two numbers of the list, and the number after .. is the last number in the list. The numbers in between increase (or decrease) by the difference between the first two numbers.

++ concatenates lists:

> [1,2,3] ++ [4..8]
[1,2,3,4,5,6,7,8]

> [1,2,3] ++ [-1,-2,-3]
[1,2,3,-1,-2,-3]

> [1] ++ [2,3] ++ [4,5,6]
[1,2,3,4,5,6]

> [1..3] ++ [2..6]
[1,2,3,2,3,4,5,6]

++ is an example of an infix operator, and when you use an infix operator on its own (e.g. with :type) you must put it in round brackets:

> :type ++    -- oops!
<interactive>:1:1: parse error on input `++'

> :type (++)  -- brackets required
(++) :: [a] -> [a] -> [a]

Strings

Haskell strings are lists of characters, i.e. a string has the type [Char]:

> :type "apple"
"apple" :: [Char]

Haskell pre-defines the type String like this:

type String = [Char]   --- String is a synonym for [Char]

Since strings are lists, general list functions work with them. For example:

> head "apple"
'a'

> tail "apple"
"pple"

> len "apple"
5

> "apple" ++ "s"
"apples"

> ['r', 'a', 'c', 'e'] ++ "car"
"racecar"

> reverse "orchestra"
"artsehcro"

Curried Functions

Consider the following function:

> add :: Integer -> Integer -> Integer  -- type signature
> add x y =  x + y

add sums two integers, x and y. It’s type signature is a bit strange: it says that add takes two Integer values as input, and returns an Integer as output.

The type Integer -> Integer -> Integer is the same as Integer -> (Integer -> Integer). The bracketed version highlights an important Haskell feature: if you call add with one Integer (instead of two) then it returns a value of type Integer -> Integer, i.e. a function that takes one Integer and returns an Integer.

For example, add 3 evaluates to a function. You could use it like this:

> inc3 = add 3

Now inc3 is a function that takes one input and returns that input incremented by 3:

> inc3 4
7

inc3 takes an Integer and returns an Integer:

> :type inc3
inc3 :: Integer -> Integer

Notice that we did not need to write a type signature for inc3Haskell was able to infer it automatically.

Curried functions can be useful in practice. For example, Haskell has a map function: map f lst applies function f to every element of lst. If we wanted to add 1 to every element of a list of integers, we could do it like this:

> map (add 1) [1,2,3]
[2,3,4]

This also works with the built-in +:

> map (+ 1) [1,2,3]
[2,3,4]

Here’s another example. Recall that ++ concatenates two lists:

> [1, 2] ++ [6, 8, 6]
[1,2,6,8,6]

Here’s its type:

> :type (++)
(++) :: [a] -> [a] -> [a]

The type [a] -> [a] -> [a] is equivalent to [a] -> ([a] -> [a]), and so if we were to pass ++ a single input, it would return a value of type [a] -> [a], i.e. a function that takes a list of values of type a as input, and returns a list of values of type a as ouput.

++ is an infix operator, which means that it is normally put between its inputs. Alternatively, you can use ++ (and many other infix operators) in prefix style by putting brackets around it:

> (++) "ab" "cd"
"abcd"

So we could write this function, which adds “- ” to the start of a string:

> dash = (++) "- "

For instance:

> dash "apples"
"- apples"

> map dash ["apples", "oranges", "grapes"]
["- apples","- oranges","- grapes"]

Or you could use currying to defined this function:

> dashed = map dash

Then:

> dashed ["apples", "oranges", "grapes"]
["- apples","- oranges","- grapes"]

The definition of dashed is interesting: it is only 4 tokens long. It doesn’t use any if-statements, loops, recursion, variables, or lambdas. It is short, easy to understand (at least once you get used to Haskell!), consists of previously made functions, and is statically type-checked.

Example: Curried Decrement

A problem with currying is that you can only curry arguments in the order they occur. So, for example, we can implement inc like this:

> inc = (+) 1

> inc 3
4

But decrement is not so straightforward:

> dec = (-) 1

> dec 3
-2

The problem is that dec 3 is equivalent to (-) 1 3, which is indeed -2. But the order of the inputs to (-) are in the wrong order, and so what we want is something liked this:

minus_flipped x y = y - x

Then we can do this:

> minus_flipped x y = y - x

> dec = minus_flipped 1

> dec 3
2

This is a common enough problem with currying that Haskell provides a helper function called flip to make this easier to implement. flip takes a 2-input function f as input, an returns a new 2-input function that is the same as f except the order of the inputs is switched.

For instance:

> (-) 5 2         -- 5 - 2
3

> (flip (-)) 5 2  -- 2 - 5
-3

> flip (-) 5 2    -- 2 - 5
-3

> (/) 10 2        -- 10 / 2
5.0

> flip (/) 10 2   -- 2 / 10
0.2

Now we can write dec in one line like this:

> dec = flip (-) 1

> dec 5
4

flip is pre-defined in Haskell, but it turns out to have a short implementation:

flip :: (a -> b -> c) -> b -> a -> c
flip f x y = f y x

The type signature tells us that flip has 3 inputs:

  • the first input is a function that takes a value of type a and then a value of type b, and returns a value of type c
  • the second input is a value of type b
  • the third input is a value of type a
  • the return value of the function is a value of type c

If you pass flip just a function, e.g. flip (-), then it returns a function of type b -> a -> c. Note that the types of the parameters have been switched: the b value comes first and the a value comes second in the flipped function.