More About Clojure¶

Clojure is a powerful language packed with many interesting ideas. Even if you never end up using Clojure in practice, it’s instructive to examine some of its many interesting feature.

apply and eval¶

The apply and eval functions are used to evaluate lists. Here is the usual way of evaluating expressions:

=> (* 1 2 3)
6
=> (+ 1 2 3)
6
=> (- 1 2 3)
-4

An equivalent way is to use apply like this:

=> (apply * '(1 2 3))
6
=> (apply + '(1 2 3))
6
=> (apply - '(1 2 3))
-4

As you can see, (apply f seq) takes a function f and evaluates it using the items in seq as its input. It’s as if f is “cons-ed” onto seq, and then that expression is evaluated.

The eval function takes an entire list an evaluates it:

=> (eval '(* 1 2 3))
6
=> (eval '(+ 1 2 3))
6
=> (eval '(- 1 2 3))
-4

eval is a very powerful function: it is essentially Clojure implemented in Clojure! For example, you can easily implement your own version of apply using eval:

(defn my-apply [f seq]
        (eval (cons f seq))
)

Rest Args¶

Sometimes it’s convenient to pass an unspecified number of inputs to a function. For example, it’s nice to have a sum function that works like this:

=> (sum 1 2 3)
6
=> (sum 6 7 8 9)
30
=> (sum 6)
6
=> (sum)
0

This cuts down on the number parentheses and ' marks, which makes the code easier to read.

In Clojure, you can implement sum using rest arguments:

(defn sum [& nums]
        (apply + nums)
)

Here, nums is the sequence of all the arguments passed to sum — the & tells Clojure that 0 or more inputs are expected.

Argument Destructuring¶

Consider the following function that adds two 2-dimensional points:

    (defn add-points1 [p1 p2]
            (let [x1 (first p1)
                      y1 (second p1)
                      x2 (first p2)
                      y2 (second p2)
                    ]
                    (list (+ x1 x2) (+ y1 y2))
            )
    )

    => (add-points1 '(1 2) '(3 4))
(4 6)

Most of the function is devoted to extracting that data out of p1 and p2. The code isn’t difficult, but it is tedious to write, and thus error- prone (are you sure you got all the 1s and 2s right?).

So Clojure provides a nice feature called destructuring that can help you skip a lot of this sort of data extraction:

(defn add-points2 [[x1 y1] [x2 y2]]    ;; with destructuring
        (list (+ x1 x2) (+ y1 y2))
)

This is shorter and clearer. Clojure automatically assigns the correspond values to x1, y1, x2, and y2.

However, it’s still up to you, the programmer, to pass in correctly formatted data:

=> (add-points2 '(1 2 3) '(3 4 5))
(4 6)
=> (add-points2 '(1) '(3 4 5))
java.lang.NullPointerException (NO_SOURCE_FILE:0)

Multiple Function Signatures¶

Sometimes you want a function to take different numbers, or kinds, of inputs. For example:

=> (make-point)
(0 0)
=> (make-point 8)
(8 8)
=> (make-point 4 5)
(4 5)

You can implement make-point like this:

(defn make-point
        ([] '(0 0))
        ([x] (list x x))
        ([x y] (list x y))
)

Each line of make-point is a different function signature with a different function body.

Here’s another example from http://clojure.org/reference/special_forms:

(defn mymax
  ([x] x)
  ([x y] (if (> x y) x y))
  ([x y & more]
   (reduce mymax (mymax x y) more)))

Functions that Return Functions¶

In Lisp, functions are values that be passed to functions, returned from functions, and assigned to variables. For example:

(defn make-adder [n]
    (fn [x] (+ n x))
)

make-adder takes a number, n, as input and returns a function that adds n to the value given to it. For example:

=> (def add4 (make-adder 4))   ;; note the use of def instead of defn

=> (add4 2)
6
=> (add4 -3)
1

You can even do things like this:

=> ((make-adder 4) 5)
9

When ((make-adder 4) 5) is evaluated, first (make-adder 4) is evaluated, which returns a function, and then this function is applied to 5.

Here’s another handy function that returns a function:

(defn negate [pred]
    (fn [x] (not (pred x)))
)

This takes a predicate, i.e. a function that takes one input and returns a boolean value, as input, and returns new predicate that is its negation. For example:

=> (def my-odd (negate even?))
#'user/my-odd
user=> (my-odd 5)
true
user=> (my-odd 6)
false

Or:

=> (keep-if (negate even?) '(1 2 3 4 5 6 7))
(1 3 5 7)

This removes all the even numbers from a list. If you instead used (negate odd?), it would remove all the odd numbers.

Memoization¶

Consider the following classic recursive function:

(defn fib [n]
    (cond
        (= n 0) 1
        (= n 1) 1
        :else (+ (fib (- n 2)) (fib (- n 1)))
    )
)

(fib n) recursively calculates the nth Fibonacci number using the traditional recurrence, i.e. \(F_0 = 1\), \(F_1 = 1\), and \(F_n = F_{n-1} + F_{n-2}\).

On the plus side, this function is very close to the mathematical definition, and so it is quick to write, short, and easy to check for correctness.

On the negative side, it is extremely inefficient because almost every call to fib makes two more calls to fib, and this results in a huge amount of redundant calculation. Try calculating, say, (fib 35), to see how slow it is.

One way to improve this function is to rewrite it using a loop. Clojure provides a number of looping constructs that you could use to do this.

But there is another interesting and general-purpose approach to speeding-up this function that is much easier to implement (once you know about it!).

First, lets rewrite the definition of fib to use plain def instead of defn:

(def fib
    (fn [n]
        (cond
            (= n 0) 1
            (= n 1) 1
            :else (+ (fib (- n 2)) (fib (- n 1)))
        )
    )
)

Now, we can speed up this version of the function by using the built-in memoize function:

(def fib
    (memoize
        (fn [n]
            (cond
                (= n 0) 1
                (= n 1) 1
                :else (+ (fib (- n 2)) (fib (- n 1)))
            )
        )
    )
)

(memoize f) returns a memoized version of f. In this case, the result is a dramatic speed-up of fib: I can calculate (fib 91) apparently instantaneously (values greater than 91 cause overflow).

Memoization does two basic things. It adds:

A hash table to fib where it stores any values that it calculates.
An if-statement to the beginning of the function so that when (fib n) is called, it first checks to see if that value of n has already been calculated. If so, it returns that value and does not call the body of the function. If n is not in the table, then it is calculated using the body of the function, and this result is stored in the table and then returned as the value of the function.

Essentially, memoization caches values. This can speed up almost any function that makes lots of function calls (such as fib). It’s the same sort of trick that high-traffic websites use to increase performance, but on the level of a function instead of an entire website.

The downside of memoization is that the storage table takes up extra space. For the fib function, the memoization table is never going to be very big because it can’t calculate values greater than (fib 91).

This is a traditional trade-off: by using more memory, we get more speed.

Aside: Memoization in Python¶

Let’s look at how memoization can be implemented in Python. Python’s syntax makes the details a little clearer, plus it provides a convenient usage syntax called a decorator.

For concreteness, lets memoize this function by hand:

def fib(n):
    if n == 0:
        return 1
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)

Memoization works by adding a table that stores (n, f(n)) pairs. The easiest way to implement such a table in Python is to use a dictionary:

cache = {}  # an initially empty dictionary

Next, we modify the code in fib to check to see if the desired value has already been calculated:

cache = {}
def fib2(n):
    if n == 0:
        return 1
    elif n == 1:
        return 1
    else:
        if not n in cache:
            cache[n] = fib2(n-2) + fib2(n-1)
        return cache[n]

This is much faster than fib!

What’s interesting about this is that adding the memoizing does not require modifying the body of fib. This the key to writing a general-purpose memoizer that works with any function.

In Python, the best way to write a memoizer is to use a class like this:

class memoize(object):
    def __init__(self, fn):
        self.fn = fn
        self.cache = {}
    def __call__(self, *args):
        if not args in self.cache:
            self.cache[args] = self.fn(*args)
        return self.cache[args]

Don’t worry about the exact syntactic details of Python. Just notice that the code that was used to implement memoization was generalized and put into a class.

Now we can easily memoize a function by using decorator syntax, e.g.:

@memoize    # decorator
def fib(n):
    if n == 0:
        return 1
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)

The decorator acts as if we had written fib = memoize(fib).

Closures¶

Why is Clojure called Clojure? It’s a play on the term closure, and the “j” is for Java.

So what is a closure? It’s a combination of two things: a function, and an environment of (variable, value) pairs. The function is allowed to use variables from this environment.

For example, consider this code:

(defn f [x] (+ x 1))

f is the name of a function, not a closure. But things change when we define it like this:

(defn make-adder [n]
    (fn [x] (+ n x))   ;; n's value comes from outside the lambda function
)

(def g (make-adder 1))  ;; note the use of def

g names a closure that includes both a function and a binding for the variable n. If you have just the function (fn [x] (+ n x)), then you can’t actually evaluate it because n is not specified. But, of course, in this example, n is bound to the value 1. The function plus this binding of n is a closure.

More concretely, compare this:

=> (def g1 (let [n 1] (fn [x] (+ n x))))

=> (g1 5)
6

=> (g1 9)
10

To this:

=> (def g2 (fn [x] (+ n x)))
user=> CompilerException java.lang.RuntimeException: Unable to resolve symbol: n in this context, compiling:(/tmp/form-init597735295358479020.clj:1:17)

Clojure doesn’t let us define g2 because the variable n is free, i.e. it is not bound to any value. And so how could Clojure possibly evaluate (g2 5) or (g2 9)? It can’t!

g1 is a closure. You can see that it’s not just a function, but a function along with some variable bindings that it needs.

Any programming language that let you to define a function inside another function must deal with the issue of what do about variables the inner function uses that are not defined inside it. Closures are a natural and useful solution to this problem. Languages that have lambda functions — such as Go, Java, C++, Python, JavaScript, etc. — also support closures.

In Go, for example, you can do this:

package main

import "fmt"

type intFunc func(int) int

func makeAdder(n int) intFunc {
        result := func(x int) int { return x + n }
        return result
}

func main() {
        add2 := makeAdder(2)   // add2 is a closure

        fmt.Println(add2(5))
}

add2 is a closure. The n inside the result function is bound to the value of n passed to makeAdder, and it stays bound for the life of the result function. So even after makeAdder finishes executing, the n in the result is still there and can be used as long as add2 exists.

As an aside, notice how the typing information that Go requires makes the code longer, and more cluttered, than the same code in Clojure. Clojure is dynamically typed, and so doesn’t check for type information until run time.

Composing Functions¶

Another interesting feature of Lisp is the ability to easily compose functions. Recall how function composition works in mathematics. Given \(f(x) = x^2\) and \(g(x) = 2x + 1\), the composition of \(f\) and \(g\), denoted \(f \circ g\), can be calculated as \(f(g(x)) = g(x)^2 = (2x + 1)^2 = 4x^2 + 4x + 1\).

In Clojure, we can write these functions like this:

(defn f [x] (* x x))
(defn g [x] (+ (* 2 x) 1))

For example:

user=> (f 2)
4
user=> (g 2)
5
user=> (f (g 2))
25

One way to compose functions is to write a function called compose like this:

(defn compose [f g] (fn [x] (f (g x))))

compose takes two functions, f and g as input, and returns the function (fn [x] (f (g x))) as output.

Now we can write the composition of f and g like this:

(def fog (compose f g))

(The built-in Clojure function comp does the same thing.)

And it works:

user=> (fog 2)
25

The twice function takes a function, f, as input and returns f composed with itself, i.e. \(f \circ f\):

(defn twice [f] (compose f f))

For instance:

(def garnish (twice (fn [x] (cons 'cheese x))))

=> (garnish '(1 2 3))
(cheese cheese 1 2 3)

We can generalize this as follows:

(defn compose-n [f n]
    (cond
        (= n 1) f
        :else (compose f (compose-n f (- n 1)))
    )
)

In a sense, this returns a new function that is f to the power of n, where function composition is being used instead of multiplication.

Partial Function Application¶

Composition is a way of creating a new function from a pair of other functions. Another interesting way to create a function from a function is to use partial evaluation. For example:

(defn add [x y] (+ x y))
(def add5 (partial add 5))  ;; note the use of def (not defn)

=> (add5 6)
11

The add function takes two inputs, x and y, while add5 takes only one input.

The built-in function partial takes a function f with n inputs as the its first argument, followed by fewer than n arguments. It then returns a new function that is the same as the passed-in function, except the parameters passed to partial are bound to the corresponding formal parameters of f.

Here’s another example:

(def hatter (partial cons 'hat))  ;; note the use of def (not defn)

=> (hatter '(a b c))
(hat a b c)

And one more:

(def cheese-it (partial compose-n (fn [x] (cons 'cheese x))))

=> ((cheese-it 5) '(1 2 3))
(cheese cheese cheese cheese cheese 1 2 3)

Example: Making Selector Functions¶

As an example of functional programming, lets see how we might go about creating a function that makes selector functions. Here are some examples of selector functions (first and second are already defined in Clojure):

(defn third [seq]
    (first (rest (rest seq)))
)

(defn fourth [seq]
    (first (rest (rest (rest seq))))
)

For example:

user=> (third '(1 2 3 4 5))
3
user=> (fourth '(1 2 3 4 5))
4

It’s not hard to see the pattern for how these functions are defined: to get the nth element of a sequence, call rest on it n-1 times (thus removing the first n-1 elements), and then call first.

With some thought, you can come up with a function like this:

(defn make-selector1 [n]
    (compose first (compose-n rest (- n 1)))
)

make-selector1 returns a function that can be used as a selector. So we could use it like this:

(defn third [seq]
    (make-selector1 3)
)

(defn fourth [seq]
    (make-selector1 4)
)

(defn fifteenth [seq]
    (make-selector1 15)
)

Another interesting way to create a selector-making function is to used the built-in compose function in Clojure called comp. It’s like the compose function we defined, but it lets you compose more than two functions at once, e.g.:

user=> (defn add1 [x] (+ 1 x))
user=> (def t (comp add1 add1 add1))
user=> (t 5)
8

This saves us from typing some extra parentheses (always a nice thing in Lisp!), e.g.:

(defn third [seq]
    ((comp first rest rest) seq)
)

(defn fourth [seq]
    ((comp first rest rest rest) seq)
)

The pattern of how these functions work is now even clearer. Using the built- in function replicate, we could do this:

user=> (replicate 5 rest)
(#<core$rest clojure.core$rest@18dec5c> #<core$rest clojure.core$rest@18dec5c>
 #<core$rest clojure.core$rest@18dec5c> #<core$rest clojure.core$rest@18dec5c>
 #<core$rest clojure.core$rest@18dec5c>)

(replicate n x) returns a list containing n copies of x. The example here shows a list containing 5 copies of the rest function.

With this, and the apply function (explained below), we can write this function:

(defn make-selector2 [n]
    (apply comp
           (cons first (replicate (- n 1) rest))
    )
)

(def third (make-selector2 3))
(def fourth (make-selector2 4))
(def fifteenth (make-selector2 15))

Look at the body of make-selector2. The expression (cons first (replicate (- n 1) rest)) creates a list whose first element is the function first, and whose remaining n-1 elements are the function rest.

What we want to do is compose all those functions together using comp. You might first guess that you could do it like this:

(cons comp
      (cons first (replicate (- n 1) rest))
)

But this returns a list whose first element is the function comp, and whose remaining elements are as above (first, followed by many copies of rest). It is just a list, and does not get evaluated.

So instead of cons, Clojure provides the built-in apply function that takes a function and a list of parameters as input, and returns the result of applying the function to those parameters. For example:

=> (defn plus [x y] (+ x y))
=> (apply plus '(2 3))
5

That’s just what we need here because the arguments to comp are constructed as a list:

(apply comp
       (cons first (replicate (- n 1) rest))
)

Macros¶

Macros are one of the most interesting and powerful features in Lisp. Intuitively, you can think of them as programs that make other programs. This is in contrast to functions, which are programs that make values. Macros let you do things like create our own def-like environments, and our own control structures — things that are usually impossible to do languages like Java or C++.

C and C++ also have macros, but they’re less-powerful than Clojure macros. C/C++ macros are just string-based search-and-replace done before compilation, and they are not an intrinsic part of the language. You could even replace the default C/C++ macro processor with a different one, if you like. Clojure macros, however, are a fundamental feature of the language that let you decide when an expression is evaluated. They work on Clojure expressions, not just strings.

To see why you might want to make a macro, consider this example:

=> (def x 3)
#'user/x
=> x
3

def binds the value 3 to x. But here’s an interesting question: is def a function? No, it turns out that it’s not. If def were a function, then you’d get an error saying x is undefined, just like this:

;; assume x is not defined
=> (+ 2 x)
java.lang.Exception: Unable to resolve symbol: x in this context (NO_SOURCE_FILE:1)

That’s because the rule for calling (f x y) in Clojure is to first evaluate x and y, and then pass those results to f. So (def x 3) wouldn’t work if it were a function because Clojure would try to evaluate x. But x doesn’t exist yet.

Clojure calls expressions like (def x 3) special forms to distinguish them from regular functions. Other special forms are defn, and, let, cond, etc. In most languages, special forms are hard-coded into the language, and you can’t add your own. But in Lisp, macros let you do just that (among other things!).

Here’s another example. Suppose you want to create your if-then-else control structure. You can’t do that with a function:

;; BAD!: can't create a control structure with a function
;; a and b are both evaluated before they are passed to if-then-else,
;; but we only ever want to evaluate one of then
(defn if-then-else [condition a b]
        (if condition a b)
)

user=> (if-then-else true (println "yes") (println "no"))
yes
no
nil

user=> (if-then-else (= 1 2) (println "yes") (println "no"))
yes
no
nil

The problem is that the if-part and the else-part are always both evaluated when if-then-else is called. That’s wrong. For an if-else statement, we only ever want to evaluate one of the a or b, and never both.

The problem is that the function parameters are evaluated before being passed to the function. But with expressions like def and if-then-else, we want the parameters to be evaluated after they are passed in.

This is what macros do: they let us control when parameters are evaluated. The essential difference between a macro and a function is that a macro is passed its parameters unevaluated, while functions are passed their parameters evaluated. So we can use macros to write our own special forms, e.g. def-like environments, or control structures like if-then-else.

This macro correctly implements if-then-else:

(defmacro if-then-else [condition a b]
        (list 'if condition a b)
)

The trick with a macro is that its body creates an expression that, when evaluated, does what we want it to do. Macros run in a two-step process: they first create an expression, and then they evaluate the expression. The creation step is called expansion, and it is this step that functions don’t do.

Use the macroexpand function to see what a macro expands to:

=> (macroexpand '(if-then-else false (println "Hi!") (println "Bye!")))
(if false (println "Hi!") (println "Bye!"))

The result here shows exactly what Clojure will evaluate when you call (if- then-else false (println "Hi!") (println "Bye!")).

In if-then-else, a and b are not evaluated when the macro is expanded: they are just put onto a list as-is. When the macro is evaluated, so too are a and b: they are evaluated after they are passed to if-then-else.

Clojure does not have a nand operator, so lets write a macro for it. The expression (nand a b) should be true just when a and b are not both true. In other words, (nand a b) always has the same value as (not (and a b)):

(defmacro nand [a b]
        (list 'not (list 'and a b))
)

r=> (nand true false)
true

=> (nand (< 1 2) (< 3 4))
false

It seems to work. Use macroexpand to check that the expression it evaluates is correct:

=> (macroexpand '(nand true false))
(not (and true false))

This tells us that when we evaluate (nand true false), it will first expand into (not (and true false)), and then this expanded expression will be evaluated. Similarly:

=> (macroexpand '(nand (> 1 2) (> 3 4)))
(not (and (> 1 2) (> 3 4)))

macroexpand is essential for debugging macros. It shows you exactly what expression gets evaluated.

Notice that we used list twice inside nand. This version would not work:

(defmacro nand-bad1 [a b]
        (list 'not '(and a b))
)

=> (nand-bad1 true false)
java.lang.Exception: Unable to resolve symbol: a in this context (NO_SOURCE_FILE:106)

=> (macroexpand '(nand-bad1 true false))
(not (and a b))

macroexpand shows why this is wrong. When (nand-bad1 true false) is evaluated, it’s expanded to (not (and a b)), and then (not (and a b)) is evaluated. The error is that a is not defined here. Indeed, we don’t want a here, we want the expression a refers to, which is true in this case.

A def-like Macro¶

Here’s an example of a def-like macro. Suppose we’re writing a program that involves point objects defined like this:

(defn make-point-obj [x y]
    (list 'point (list 'x x) (list 'y y))
)

=> (make-point-obj 4 9)
(point (x 4) (y 9))

=> (make-point-obj -6 18)
(point (x -6) (y 18))

You can use def to assign a point to a global variable, e.g.:

=> (def origin (make-point-obj 0 0))
#'user/origin

=> origin
(point (x 0) (y 0))

But you could also write a special macro for defining points like this:

(defmacro defpoint [vble a b]
    (list 'def vble (list 'make-point-obj a b))
)

Now you can write code like this:

=> (defpoint origin 100 50)
#'user/origin

=> origin
(point (x 100) (y 50))

Writing (defpoint origin 100 50) is a little more readable than (def origin (make-point-obj 0 0)).

You could also allow for the possibility of defining a point without yet knowing exactly what value you want to give it:

(defmacro defpoint
    ([v]
        (list 'def v (list 'make-point-obj 0 0))
    )
    ([v a b]
        (list 'def v (list 'make-point-obj a b))
    )
)

=> (defpoint q)
#'user/q

=> q
(point (x 0) (y 0))

Macro Pitfalls¶

Macros are trickier to use than functions because of their extra expansion step. They can often suffer from subtle bugs if not implemented carefully. We’ll discuss two basic problems you can run into: variable capture, and double evaluation. Keep in mind that these are not the only problems macros can suffer from — there are other kinds of problems with macros that can arise!

Variable Capture¶

Macros can, if they need to, define variables. For instance, consider this macro:

(defmacro add1-bad [n]
    (list
        'let ['incr 1]
        (list '+ n 'incr)
    )
)

=> (add1-bad 5)
6

=> (let [a 2] (add1-bad a))
3

It’s a contrived example, but it shows a subtle problem:

=> (let [incr 2] (add1-bad incr))
2

For some strange reason, the programmer decided use a variable named incr in this expression, and this breaks the macro: 2 is the wrong result! It should be 3. The problem here is that, unknown to the programmer, add1-bad uses incr as an internal variable. So the macro expands to this:

(let [incr 1] (+ incr incr))

Clearly, the problem is that the incr variable that gets passed into add1-bad has been captured by the local incr in the add1-bad let expression.

The way to fix this problem is to make sure that the local variables in a macro never have the same name as any variable that might be passed into it. However, for any name we change incr to, it’s possible that the programmer could pass that name in!

The standard Lisp solution to the problem is to generate a unique variable name using the gensym function:

=> (gensym)
G__2728
=> (gensym)
G__2731
=> (gensym)
G__2734

Every time (gensym) is evaluated, it returns a new name that it has not returned before.

Now we can replace the variable incr with a name generated by gensym:

(defmacro add1-good1 [n]
    (let [vbl (gensym)]       ;; note that this outer let is not quoted
        (list
            'let [vbl 1]      ;; vbl is not quoted
            (list '+ n vbl)
        )
    )
)

=> (let [incr 1] (add1-good1 incr))
2

=> (let [a 1] (add1-good1 a))
2

This trick is so common in macros, that Clojure provides some syntactic help to simplify it a bit:

(defmacro add1-good2 [n]
    (list
        'let ['incr# 1]
        (list '+ n 'incr#)
    )
)

Putting a # at the end of a symbol is called an auto-gensysm. The name incr# is guaranteed to be unique.

Double Evaluation¶

Consider this function and macro:

(defn dbg [x]
    (println x)
    x
)

=> (dbg 4)
4
4


(defmacro square [n]
    (list '* n n)
)

=> (square 4)
16

The dbg function first prints whatever you pass to it, and then returns it as a value. As its name suggests, it’s meant for debugging.

Using dbg, we can see that the square macro suffers from double evaluation, i.e. whatever input you pass it gets evaluated twice (because n appears two times in the macro):

=> (square (dbg 4))
4
4
16

=> (macroexpand '(square (dbg 4)))
(* (dbg 4) (dbg 4))

The correct answer (16) is returned, but you might expect that 4 should be printed only once. It certainly appears that (dbg 4) is only evaluated once, but if you read the square macro you can see that . anything you pass to it gets evaluated twice. Not only is this inefficient, it can cause serious problems for any expressions with side-effects, such as printing to the screen, writing to a file, modifying a global variable, etc.

To avoid double evaluation, you can use let to make sure the evaluation only happens once:

(defmacro square-fixed [n]
    (list
        'let ['a# n]      ;; n is only evaluated once here
        (list '* 'a# 'a#)
    )
)

=> (square-fixed (dbg 4))
4
16
user=> (macroexpand '(square-fixed (dbg 4)))
(let* [a (dbg 4)] (* 4 4))

When Should You Use Macros?¶

Some programmers believe that macros are a disaster. They say that they’re hard to write, hard to understand, and hard to debug. You shouldn’t use programming language features that are both difficult and error-prone, they say. The lack of macros in a language is a feature!

But some programmers think that the benefits of macros outweigh these costs. Yes, they say, macros are harder to write, understand, and debug as compared to regular functions, but they are also more powerful than regular functions. When use with care, they can make your programs much easier to read and understand, and more fun to use. Plus they can do things you can’t possibly do with functions, like create new control structures.

A general rule of thumb is to use macros sparingly. If a function will do, then use a function. Use macros only when functions won’t do the job, or when they provide a clear and obvious benefit, such as significantly improved syntax.

Finally, we should mention that if you use macros a lot, using so many list and ' tokens soon becomes tiresome, and this leads to errors. So Clojure provides some extra syntax designed specifically for helping with macros. For example, a better way to write the if-then-else macro is like this:

=> (defmacro if-then-else [condition a b] `(if ~condition ~a ~b))
#'user/if-then-else

We’ve used a backquote instead of a regular (forward) quote in this function. A backquote in Clojure is like a regular quote, but it lets us use ~ inside the list to turn off quoting. So if is a symbol that stand for itself, while ~condition, ~a and ~b are replaced by what those variables refer to.

Although we haven’t used it here, the backquote also lets use the ~@ operator for splicing lists, e.g.:

=> (def vars '(x y z))

    => `(a b ~@vars c)
    (user/a user/b x y z user/c)

You can also see here that the backquote has prepended user/ in front of a, b, and c. user/ refers to the namespace that these variables are in, and this helps to prevent unintended variable capture.

Admittedly, the benefit of using a backquote in if-then-else is small. But if you write a lot of macros, especially larger and more complex ones, backquotes can improve readability quite a bit.

Making Your cons, first, and rest¶

One of the many interesting things about Lisp is that it’s possible to write your own versions of the standard cons, first, and rest functions as follows:

(defn mycons [x y] (fn [m] (m x y)))

(defn myfirst [z] (z (fn [p q] p)))

(defn myrest [z] (z (fn [p q] q)))

The is known as the Church encoding (after Alonzo Church). It essentially defines a list data type in Clojure using only functions. For instance:

=> (def lst (mycons 1 (mycons 2 (mycons 3 ()))))
#'user/lst
=> lst
#object[user$mycons$fn__1212 0x5c2ec4 "user$mycons$fn__1212@5c2ec4"]
=> (myfirst lst)
1
=> (myfirst (myrest lst))
2

The trick is that the list (3) can be represented by the expression (mycons 3 ()). This is equivalent to the function (fn [m] (m 3 ())), where 3 and () are stored within the function like a pair of values. When myfirst is called with this function as the input, m gets replaced by a function that takes two inputs, p and q, and returns p. myrest is similar, except its function returns q instead.

This is not an efficient way to represent lists in practice, but it is useful in theory. It shows that functions like cons, first, and rest need not primitives, but instead can be defined in terms of functions.

It’s also possible to represent integers and arithmetic in this purely functional way. Again, this is not efficient, but it is quite important in theory.

Clojure Lists¶

Lists in Clojure are implemented as traditional singly-linked lists. Each item in a list is represented by a cons cell, i.e. a record that contains two pointers, one the first element of the list, and one to the rest of the list.

Because of this implementation, cons, first, and rest are all \(O(1)\) functions. Functions like append or last are \(O(n)\).

Interestingly, and usefully, Clojure lists are normally immutable, meaning that once you create a list you can’t change it. A consequence of this is that lists can easily share parts, e.g.:

(def A '(1 2 3 4))
(def B (rest A))

Here, B is essentially a pointer to the <https://en.wikipedia.org/wiki/Cons>`_ for 2. No copy of the elements in A needs to be made — B and A share a sublist.

Another useful consequence is list immutability is that concurrent programming with them tends to be easier because processes never need to worry about lists changing. Concurrent processes can only read a list, never write it.

Clojure Vectors¶

We’ve been using lists as the main data type in Clojure. But Clojure also supports vectors and maps, and in this section we will take a brief look the unusual way that Clojure implements its vectors.

All programming languages have some kind of array, or vector, for storing a sequence of items. For example, in Java, you could create an array like this:

int[] nums = {4, 5, 6, 7};
System.out.println(nums[3]);  // prints 7

Internally, Java implements nums as a contiguous sequence of ints. Thus, accessing an element by index takes \(O(1)\) time, no matter how long the array is.

In general, this is how most programming languages implement arrays or vectors. One other detail is whether or not the size of the array can change after it’s created. In languages like Java, C/C++, and Go, once you create an array its length never changes. We call this a fixed-size, or static array.

But in other languages, such as sequences in Python, or vectors in C++, are dynamic, i.e. their size can grow or shrink as needed. For instance, this is common in Python:

>>> nums = []                 # Python code
>>> nums.append(5)
>>> nums.append(6)
>>> nums.append(7)
>>> nums
[5, 6, 7]

Here, nums grows in size as needed. Internally, Python implements nums as a contiguous sequence of objects, and append has some rules for deciding how much memory the underlying array should take up. Dynamic arrays like this are responsible for managing their own memory, which can be very convenient for many sorts of applications. But the downside is that nums might use more memory than it needs to store its current numbers in order to make future calls to append faster. This is a common trade-off: use more memory to make a program faster.

With that it mind, let’s look at Clojure vectors, i.e. it’s version of arrays. Clojure vector literals use square brackets, e.g.:

user=> (def nums [4 5 6 7])
#'user/nums
user=> nums
[4 5 6 7]
user=> (nums 3)    ;; access the 3rd element of nums

Notice that no extra syntax is used to access elements. To get the number at location i, you can just write (nums i).

A basic fact about Clojure vectors are that they are persistent, which means that any time you modify or update them, you are creating a new vector. Think about this for a moment, and you’ll realize that if Clojure vectors were implemented as a contiguous sequence as in Java or C/C++, then every time you change a vector you’d need to make a copy of it. That would be prohibitively expensive in both time and memory.

So Clojure implements its vectors in an usual way: they’re a kind of tree. While we don’t have time to go into the details here (see this blog post if you are interested), the upshot is that Clojure vectors have nearly the same good performance characteristics as C/C++-style arrays, e.g. almost \(O(1)\) time for appending an element to a vector, and almost \(O(1)\) time for modifying an element (assuming you know its index position).

We say almost \(O(1)\) because, with a tree, it’s actually \(O(log n)\). But, Clojure knows that vector performance is important, and so uses a number of optimizations and tricks to get the performance to be pretty nearly \(O(1)\) in most practical uses.

Also, Clojure vector use more memory than a C/C++-style array because of the pointers it uses to keep track of all the nodes.

The reason Clojure uses this implementation of vectors is to have vectors that are persistent, i.e. that don’t change. This is very useful for concurrent/parallel programming, where one thread of control might modify a vector being used by a different thread of control. In C/C++, the array must be locked, or somehow specially managed, to avoid race conditions. But in Clojure, an array never changes, and so you don’t need to worry about some other thread over-writing a shared vector.

Clojure isn’t the only language with an usual implementation of arrays. Lua, for instance, implements arrays (and lists!) as a table of key as (key, value) pairs. Tables like this could be implemented as hash tables, or trees.

Extra: Running a Clojure App with Leiningen¶

Now lets look at a very neat tool that Clojure provides in its large set of library functions: unification. Unification is an algorithm that takes two lists as input, either of which may contain variables, and, if possible, returns a list of assignments to the variables that makes the two list the same. This turns out to be a surprisingly powerful and useful tool.

Clojure provides a unification library called core.unify. The easiest way to get access to it is to first install the Leiningen package management tool, and then to create a new Clojure “app” at the command-line like this:

$ lein new app prop-eval

This creates a new folder called prop-eval that contains various configuration files, including the file project.clj. Inside project.clj is where we put the requirement to use core.unify:

(defproject app "1.0.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [org.clojure/core.unify "0.5.5"]   ;; core.unify dependency
                ])

The dependency information [org.clojure/core.unify "0.5.5"] was cut-and- pasted directly from the core.unify page.

Note also that the version of Clojure is listed as a dependency here. This makes it easy to use whatever version of Clojure you want.

To use core.unfiy, we now run Clojure through Leiningen. Run lein repl in the same folder as project.clj:

$ lein repl
... text indicating libraries are being downloaded ...

REPL started; server listening on localhost port 9347
user=>

This works just like ordinary Clojure, but now the dependencies in project.clj are satisfied, meaning we can import core.unify like this:

user=> (use 'clojure.core.unify)
nil

Unification with core.unify¶

Assuming you have set up your Clojure app as in the previous section, we can now write code either in the REPL launched by lein repl, or inside the file prop-eval/src/app/core.clj.

Here’s how unify works:

user=> (use 'clojure.core.unify)
nil

user=> (unify '?x 5)
{?x 5}

The input to unify is always two expressions. In this case, the first expression is '?x, and the second expression is 5. The ?x in is a variable — unify treats all symbols starting with ? as variables.

What unify does is try to find an assignment of values to the variables in the two expressions that make the expressions identical. In this case the answer is easy: setting ?x to 5 unifies the expressions.

If unify is able to unify the expressions, then it returns a Clojure map showing the assignment of values to variables that make the expressions the same.

Here’s another example:

user=> (unify '(?x and ?y) '(f and t))
{?y t, ?x f}

This says that if you replace ?y with t and ?x with f, the two expressions passed to unify will be identical.

In this example, two variables unify with each other:

user=> (unify '(?x 3 ?y) '(2 ?z ?w))
{?y ?w, ?z 3, ?x 2}

The result says that ?y and ?w should have the same value.

Sometimes expressions can’t be unified. For example:

user=> (unify '(?x) '(a b))
nil

No matter what you assign to ?x, it cannot be made to be equal to '(a b). So we say the unification has failed.

Unification turns out to be very useful for pattern matching. For example, consider the is-and? function from above:

;; (expr and expr)
(defn is-and? [e]
   (and (list? e) (= 3 (count e)) (= 'and (second e))))

We could re-write it using unification like this:

(defn is-and? [e]
  (not (nil? (unify e '(?x and ?y))))
)

This can be improved a little bit by writing a helper function called match?:

(defn match? [e1 e2]
    (not (nil? (unify e1 e2)))
)

Now is-and? becomes this:

(defn is-and? [e] (match? e '(?x and ?y)))

This is clearer and simpler than the original is-and?. It’s now easy to write the other is- functions in the same style:

(defn is-and? [e] (match? e '(?x and ?y)))
(defn is-or? [e] (match? e '(?x or ?y)))
(defn is-nand? [e] (match? e '(?x nand ?y)))
(defn is-not? [e] (match? e '(not ?x)))

As this shows, unification can be quite useful in practice when you want to do pattern matching.