More About Clojure¶
Clojure is a powerful language packed with many interesting ideas. Even if you never end up using Clojure in practice, it’s instructive to examine some of its many interesting feature.
apply and eval¶
The apply
and eval
functions are used to evaluate lists. Here is the
usual way of evaluating expressions:
=> (* 1 2 3)
6
=> (+ 1 2 3)
6
=> (- 1 2 3)
-4
An equivalent way is to use apply
like this:
=> (apply * '(1 2 3))
6
=> (apply + '(1 2 3))
6
=> (apply - '(1 2 3))
-4
As you can see, (apply f seq)
takes a function f
and evaluates it
using the items in seq
as its input. It’s as if f
is “cons-ed” onto
seq
, and then that expression is evaluated.
The eval
function takes an entire list an evaluates it:
=> (eval '(* 1 2 3))
6
=> (eval '(+ 1 2 3))
6
=> (eval '(- 1 2 3))
-4
eval
is a very powerful function: it is essentially Clojure implemented
in Clojure! For example, you can easily implement your own version of
apply
using eval
:
(defn my-apply [f seq]
(eval (cons f seq))
)
Rest Args¶
Sometimes it’s convenient to pass an unspecified number of inputs to a
function. For example, it’s nice to have a sum
function that works like
this:
=> (sum 1 2 3)
6
=> (sum 6 7 8 9)
30
=> (sum 6)
6
=> (sum)
0
This cuts down on the number parentheses and '
marks, which makes the code
easier to read.
In Clojure, you can implement sum
using rest arguments:
(defn sum [& nums]
(apply + nums)
)
Here, nums
is the sequence of all the arguments passed to sum
— the
&
tells Clojure that 0 or more inputs are expected.
Argument Destructuring¶
Consider the following function that adds two 2-dimensional points:
(defn add-points1 [p1 p2]
(let [x1 (first p1)
y1 (second p1)
x2 (first p2)
y2 (second p2)
]
(list (+ x1 x2) (+ y1 y2))
)
)
=> (add-points1 '(1 2) '(3 4))
(4 6)
Most of the function is devoted to extracting that data out of p1
and
p2
. The code isn’t difficult, but it is tedious to write, and thus error-
prone (are you sure you got all the 1s and 2s right?).
So Clojure provides a nice feature called destructuring that can help you skip a lot of this sort of data extraction:
(defn add-points2 [[x1 y1] [x2 y2]] ;; with destructuring
(list (+ x1 x2) (+ y1 y2))
)
This is shorter and clearer. Clojure automatically assigns the correspond
values to x1
, y1
, x2
, and y2
.
However, it’s still up to you, the programmer, to pass in correctly formatted data:
=> (add-points2 '(1 2 3) '(3 4 5))
(4 6)
=> (add-points2 '(1) '(3 4 5))
java.lang.NullPointerException (NO_SOURCE_FILE:0)
Multiple Function Signatures¶
Sometimes you want a function to take different numbers, or kinds, of inputs. For example:
=> (make-point)
(0 0)
=> (make-point 8)
(8 8)
=> (make-point 4 5)
(4 5)
You can implement make-point
like this:
(defn make-point
([] '(0 0))
([x] (list x x))
([x y] (list x y))
)
Each line of make-point
is a different function signature with a different
function body.
Here’s another example from http://clojure.org/reference/special_forms:
(defn mymax
([x] x)
([x y] (if (> x y) x y))
([x y & more]
(reduce mymax (mymax x y) more)))
Functions that Return Functions¶
In Lisp, functions are values that be passed to functions, returned from functions, and assigned to variables. For example:
(defn make-adder [n]
(fn [x] (+ n x))
)
make-adder
takes a number, n
, as input and returns a function that
adds n
to the value given to it. For example:
=> (def add4 (make-adder 4)) ;; note the use of def instead of defn
=> (add4 2)
6
=> (add4 -3)
1
You can even do things like this:
=> ((make-adder 4) 5)
9
When ((make-adder 4) 5)
is evaluated, first (make-adder 4)
is
evaluated, which returns a function, and then this function is applied to 5.
Here’s another handy function that returns a function:
(defn negate [pred]
(fn [x] (not (pred x)))
)
This takes a predicate, i.e. a function that takes one input and returns a boolean value, as input, and returns new predicate that is its negation. For example:
=> (def my-odd (negate even?))
#'user/my-odd
user=> (my-odd 5)
true
user=> (my-odd 6)
false
Or:
=> (keep-if (negate even?) '(1 2 3 4 5 6 7))
(1 3 5 7)
This removes all the even numbers from a list. If you instead used (negate
odd?)
, it would remove all the odd numbers.
Memoization¶
Consider the following classic recursive function:
(defn fib [n]
(cond
(= n 0) 1
(= n 1) 1
:else (+ (fib (- n 2)) (fib (- n 1)))
)
)
(fib n)
recursively calculates the nth Fibonacci number using the
traditional recurrence, i.e. \(F_0 = 1\), \(F_1 = 1\), and
\(F_n = F_{n-1} + F_{n-2}\).
On the plus side, this function is very close to the mathematical definition, and so it is quick to write, short, and easy to check for correctness.
On the negative side, it is extremely inefficient because almost every call to
fib
makes two more calls to fib
, and this results in a huge amount of
redundant calculation. Try calculating, say, (fib 35)
, to see how slow it
is.
One way to improve this function is to rewrite it using a loop. Clojure provides a number of looping constructs that you could use to do this.
But there is another interesting and general-purpose approach to speeding-up this function that is much easier to implement (once you know about it!).
First, lets rewrite the definition of fib
to use plain def
instead of
defn
:
(def fib
(fn [n]
(cond
(= n 0) 1
(= n 1) 1
:else (+ (fib (- n 2)) (fib (- n 1)))
)
)
)
Now, we can speed up this version of the function by using the built-in
memoize
function:
(def fib
(memoize
(fn [n]
(cond
(= n 0) 1
(= n 1) 1
:else (+ (fib (- n 2)) (fib (- n 1)))
)
)
)
)
(memoize f)
returns a memoized version of f
. In this case, the
result is a dramatic speed-up of fib
: I can calculate (fib 91)
apparently instantaneously (values greater than 91 cause overflow).
Memoization does two basic things. It adds:
- A hash table to
fib
where it stores any values that it calculates. - An if-statement to the beginning of the function so that when
(fib n)
is called, it first checks to see if that value ofn
has already been calculated. If so, it returns that value and does not call the body of the function. Ifn
is not in the table, then it is calculated using the body of the function, and this result is stored in the table and then returned as the value of the function.
Essentially, memoization caches values. This can speed up almost any function
that makes lots of function calls (such as fib
). It’s the same sort of
trick that high-traffic websites use to increase performance, but on the level
of a function instead of an entire website.
The downside of memoization is that the storage table takes up extra space.
For the fib
function, the memoization table is never going to be very big
because it can’t calculate values greater than (fib 91)
.
This is a traditional trade-off: by using more memory, we get more speed.
Aside: Memoization in Python¶
Let’s look at how memoization can be implemented in Python. Python’s syntax makes the details a little clearer, plus it provides a convenient usage syntax called a decorator.
For concreteness, lets memoize this function by hand:
def fib(n):
if n == 0:
return 1
elif n == 1:
return 1
else:
return fib(n-2) + fib(n-1)
Memoization works by adding a table that stores (n
, f(n)
) pairs. The
easiest way to implement such a table in Python is to use a dictionary:
cache = {} # an initially empty dictionary
Next, we modify the code in fib
to check to see if the desired value has
already been calculated:
cache = {}
def fib2(n):
if n == 0:
return 1
elif n == 1:
return 1
else:
if not n in cache:
cache[n] = fib2(n-2) + fib2(n-1)
return cache[n]
This is much faster than fib
!
What’s interesting about this is that adding the memoizing does not require
modifying the body of fib
. This the key to writing a general-purpose
memoizer that works with any function.
In Python, the best way to write a memoizer is to use a class like this:
class memoize(object):
def __init__(self, fn):
self.fn = fn
self.cache = {}
def __call__(self, *args):
if not args in self.cache:
self.cache[args] = self.fn(*args)
return self.cache[args]
Don’t worry about the exact syntactic details of Python. Just notice that the code that was used to implement memoization was generalized and put into a class.
Now we can easily memoize a function by using decorator syntax, e.g.:
@memoize # decorator
def fib(n):
if n == 0:
return 1
elif n == 1:
return 1
else:
return fib(n-2) + fib(n-1)
The decorator acts as if we had written fib = memoize(fib)
.
Closures¶
Why is Clojure called Clojure? It’s a play on the term closure, and the “j” is for Java.
So what is a closure? It’s a combination of two things: a function, and an environment of (variable, value) pairs. The function is allowed to use variables from this environment.
For example, consider this code:
(defn f [x] (+ x 1))
f
is the name of a function, not a closure. But things change when we
define it like this:
(defn make-adder [n]
(fn [x] (+ n x)) ;; n's value comes from outside the lambda function
)
(def g (make-adder 1)) ;; note the use of def
g
names a closure that includes both a function and a binding for the
variable n
. If you have just the function (fn [x] (+ n x))
, then you
can’t actually evaluate it because n
is not specified. But, of course, in
this example, n
is bound to the value 1. The function plus this binding of
n
is a closure.
More concretely, compare this:
=> (def g1 (let [n 1] (fn [x] (+ n x))))
=> (g1 5)
6
=> (g1 9)
10
To this:
=> (def g2 (fn [x] (+ n x)))
user=> CompilerException java.lang.RuntimeException: Unable to resolve symbol: n in this context, compiling:(/tmp/form-init597735295358479020.clj:1:17)
Clojure doesn’t let us define g2
because the variable n
is free, i.e.
it is not bound to any value. And so how could Clojure possibly evaluate
(g2 5)
or (g2 9)
? It can’t!
g1
is a closure. You can see that it’s not just a function, but a
function along with some variable bindings that it needs.
Any programming language that let you to define a function inside another function must deal with the issue of what do about variables the inner function uses that are not defined inside it. Closures are a natural and useful solution to this problem. Languages that have lambda functions — such as Go, Java, C++, Python, JavaScript, etc. — also support closures.
In Go, for example, you can do this:
package main
import "fmt"
type intFunc func(int) int
func makeAdder(n int) intFunc {
result := func(x int) int { return x + n }
return result
}
func main() {
add2 := makeAdder(2) // add2 is a closure
fmt.Println(add2(5))
}
add2
is a closure. The n
inside the result function is bound to the
value of n
passed to makeAdder
, and it stays bound for the life of the
result
function. So even after makeAdder
finishes executing, the n
in the result
is still there and can be used as long as add2
exists.
As an aside, notice how the typing information that Go requires makes the code longer, and more cluttered, than the same code in Clojure. Clojure is dynamically typed, and so doesn’t check for type information until run time.
Composing Functions¶
Another interesting feature of Lisp is the ability to easily compose functions. Recall how function composition works in mathematics. Given \(f(x) = x^2\) and \(g(x) = 2x + 1\), the composition of \(f\) and \(g\), denoted \(f \circ g\), can be calculated as \(f(g(x)) = g(x)^2 = (2x + 1)^2 = 4x^2 + 4x + 1\).
In Clojure, we can write these functions like this:
(defn f [x] (* x x))
(defn g [x] (+ (* 2 x) 1))
For example:
user=> (f 2)
4
user=> (g 2)
5
user=> (f (g 2))
25
One way to compose functions is to write a function called compose
like
this:
(defn compose [f g] (fn [x] (f (g x))))
compose
takes two functions, f
and g
as input, and returns the
function (fn [x] (f (g x)))
as output.
Now we can write the composition of f
and g
like this:
(def fog (compose f g))
(The built-in Clojure function comp
does the same thing.)
And it works:
user=> (fog 2)
25
The twice
function takes a function, f
, as input and returns f
composed with itself, i.e. \(f \circ f\):
(defn twice [f] (compose f f))
For instance:
(def garnish (twice (fn [x] (cons 'cheese x))))
=> (garnish '(1 2 3))
(cheese cheese 1 2 3)
We can generalize this as follows:
(defn compose-n [f n]
(cond
(= n 1) f
:else (compose f (compose-n f (- n 1)))
)
)
In a sense, this returns a new function that is f
to the power of n
,
where function composition is being used instead of multiplication.
Partial Function Application¶
Composition is a way of creating a new function from a pair of other functions. Another interesting way to create a function from a function is to use partial evaluation. For example:
(defn add [x y] (+ x y))
(def add5 (partial add 5)) ;; note the use of def (not defn)
=> (add5 6)
11
The add
function takes two inputs, x
and y
, while add5
takes
only one input.
The built-in function partial
takes a function f
with n inputs as the
its first argument, followed by fewer than n arguments. It then returns a new
function that is the same as the passed-in function, except the parameters
passed to partial
are bound to the corresponding formal parameters of
f
.
Here’s another example:
(def hatter (partial cons 'hat)) ;; note the use of def (not defn)
=> (hatter '(a b c))
(hat a b c)
And one more:
(def cheese-it (partial compose-n (fn [x] (cons 'cheese x))))
=> ((cheese-it 5) '(1 2 3))
(cheese cheese cheese cheese cheese 1 2 3)
Example: Making Selector Functions¶
As an example of functional programming, lets see how we might go about
creating a function that makes selector functions. Here are some examples of
selector functions (first
and second
are already defined in
Clojure):
(defn third [seq]
(first (rest (rest seq)))
)
(defn fourth [seq]
(first (rest (rest (rest seq))))
)
For example:
user=> (third '(1 2 3 4 5))
3
user=> (fourth '(1 2 3 4 5))
4
It’s not hard to see the pattern for how these functions are defined: to get
the nth element of a sequence, call rest
on it n-1 times (thus removing
the first n-1 elements), and then call first
.
With some thought, you can come up with a function like this:
(defn make-selector1 [n]
(compose first (compose-n rest (- n 1)))
)
make-selector1
returns a function that can be used as a selector. So we
could use it like this:
(defn third [seq]
(make-selector1 3)
)
(defn fourth [seq]
(make-selector1 4)
)
(defn fifteenth [seq]
(make-selector1 15)
)
Another interesting way to create a selector-making function is to used the
built-in compose function in Clojure called comp
. It’s like the
compose
function we defined, but it lets you compose more than two
functions at once, e.g.:
user=> (defn add1 [x] (+ 1 x))
user=> (def t (comp add1 add1 add1))
user=> (t 5)
8
This saves us from typing some extra parentheses (always a nice thing in Lisp!), e.g.:
(defn third [seq]
((comp first rest rest) seq)
)
(defn fourth [seq]
((comp first rest rest rest) seq)
)
The pattern of how these functions work is now even clearer. Using the built-
in function replicate
, we could do this:
user=> (replicate 5 rest)
(#<core$rest clojure.core$rest@18dec5c> #<core$rest clojure.core$rest@18dec5c>
#<core$rest clojure.core$rest@18dec5c> #<core$rest clojure.core$rest@18dec5c>
#<core$rest clojure.core$rest@18dec5c>)
(replicate n x)
returns a list containing n copies of x. The example here
shows a list containing 5 copies of the rest
function.
With this, and the apply
function (explained below), we can write this
function:
(defn make-selector2 [n]
(apply comp
(cons first (replicate (- n 1) rest))
)
)
(def third (make-selector2 3))
(def fourth (make-selector2 4))
(def fifteenth (make-selector2 15))
Look at the body of make-selector2
. The expression (cons first
(replicate (- n 1) rest))
creates a list whose first element is the function
first
, and whose remaining n-1 elements are the function rest
.
What we want to do is compose all those functions together using comp
. You
might first guess that you could do it like this:
(cons comp
(cons first (replicate (- n 1) rest))
)
But this returns a list whose first element is the function comp
, and
whose remaining elements are as above (first
, followed by many copies of
rest
). It is just a list, and does not get evaluated.
So instead of cons
, Clojure provides the built-in apply
function that
takes a function and a list of parameters as input, and returns the result of
applying the function to those parameters. For example:
=> (defn plus [x y] (+ x y))
=> (apply plus '(2 3))
5
That’s just what we need here because the arguments to comp
are
constructed as a list:
(apply comp
(cons first (replicate (- n 1) rest))
)
Macros¶
Macros are one of the most interesting and powerful features in Lisp. Intuitively, you can think of them as programs that make other programs. This is in contrast to functions, which are programs that make values. Macros let you do things like create our own def-like environments, and our own control structures — things that are usually impossible to do languages like Java or C++.
C and C++ also have macros, but they’re less-powerful than Clojure macros. C/C++ macros are just string-based search-and-replace done before compilation, and they are not an intrinsic part of the language. You could even replace the default C/C++ macro processor with a different one, if you like. Clojure macros, however, are a fundamental feature of the language that let you decide when an expression is evaluated. They work on Clojure expressions, not just strings.
To see why you might want to make a macro, consider this example:
=> (def x 3)
#'user/x
=> x
3
def
binds the value 3 to x
. But here’s an interesting question: is
def
a function? No, it turns out that it’s not. If def
were a
function, then you’d get an error saying x
is undefined, just like this:
;; assume x is not defined
=> (+ 2 x)
java.lang.Exception: Unable to resolve symbol: x in this context (NO_SOURCE_FILE:1)
That’s because the rule for calling (f x y)
in Clojure is to first
evaluate x
and y
, and then pass those results to f
. So (def x
3)
wouldn’t work if it were a function because Clojure would try to evaluate
x
. But x
doesn’t exist yet.
Clojure calls expressions like (def x 3)
special forms to distinguish
them from regular functions. Other special forms are defn
, and
,
let
, cond
, etc. In most languages, special forms are hard-coded into
the language, and you can’t add your own. But in Lisp, macros let you do just
that (among other things!).
Here’s another example. Suppose you want to create your if-then-else control structure. You can’t do that with a function:
;; BAD!: can't create a control structure with a function
;; a and b are both evaluated before they are passed to if-then-else,
;; but we only ever want to evaluate one of then
(defn if-then-else [condition a b]
(if condition a b)
)
user=> (if-then-else true (println "yes") (println "no"))
yes
no
nil
user=> (if-then-else (= 1 2) (println "yes") (println "no"))
yes
no
nil
The problem is that the if-part and the else-part are always both evaluated
when if-then-else
is called. That’s wrong. For an if-else statement, we
only ever want to evaluate one of the a
or b
, and never both.
The problem is that the function parameters are evaluated before being
passed to the function. But with expressions like def
and if-then-else
, we want the parameters to be evaluated after they are passed in.
This is what macros do: they let us control when parameters are evaluated. The
essential difference between a macro and a function is that a macro is passed
its parameters unevaluated, while functions are passed their parameters
evaluated. So we can use macros to write our own special forms, e.g. def-like
environments, or control structures like if-then-else
.
This macro correctly implements if-then-else
:
(defmacro if-then-else [condition a b]
(list 'if condition a b)
)
The trick with a macro is that its body creates an expression that, when evaluated, does what we want it to do. Macros run in a two-step process: they first create an expression, and then they evaluate the expression. The creation step is called expansion, and it is this step that functions don’t do.
Use the macroexpand
function to see what a macro expands to:
=> (macroexpand '(if-then-else false (println "Hi!") (println "Bye!")))
(if false (println "Hi!") (println "Bye!"))
The result here shows exactly what Clojure will evaluate when you call (if-
then-else false (println "Hi!") (println "Bye!"))
.
In if-then-else
, a
and b
are not evaluated when the macro is
expanded: they are just put onto a list as-is. When the macro is evaluated, so
too are a
and b
: they are evaluated after they are passed to
if-then-else
.
Clojure does not have a nand
operator, so lets write a macro for it. The
expression (nand a b)
should be true just when a
and b
are not
both true. In other words, (nand a b)
always has the same value as (not
(and a b))
:
(defmacro nand [a b]
(list 'not (list 'and a b))
)
r=> (nand true false)
true
=> (nand (< 1 2) (< 3 4))
false
It seems to work. Use macroexpand
to check that the expression it
evaluates is correct:
=> (macroexpand '(nand true false))
(not (and true false))
This tells us that when we evaluate (nand true false)
, it will first
expand into (not (and true false))
, and then this expanded expression will
be evaluated. Similarly:
=> (macroexpand '(nand (> 1 2) (> 3 4)))
(not (and (> 1 2) (> 3 4)))
macroexpand
is essential for debugging macros. It shows you exactly what
expression gets evaluated.
Notice that we used list
twice inside nand
. This version would not
work:
(defmacro nand-bad1 [a b]
(list 'not '(and a b))
)
=> (nand-bad1 true false)
java.lang.Exception: Unable to resolve symbol: a in this context (NO_SOURCE_FILE:106)
=> (macroexpand '(nand-bad1 true false))
(not (and a b))
macroexpand
shows why this is wrong. When (nand-bad1 true false)
is
evaluated, it’s expanded to (not (and a b))
, and then (not (and a b))
is evaluated. The error is that a
is not defined here. Indeed, we don’t
want a
here, we want the expression a
refers to, which is true
in
this case.
A def-like Macro¶
Here’s an example of a def-like macro. Suppose we’re writing a program that involves point objects defined like this:
(defn make-point-obj [x y]
(list 'point (list 'x x) (list 'y y))
)
=> (make-point-obj 4 9)
(point (x 4) (y 9))
=> (make-point-obj -6 18)
(point (x -6) (y 18))
You can use def
to assign a point to a global variable, e.g.:
=> (def origin (make-point-obj 0 0))
#'user/origin
=> origin
(point (x 0) (y 0))
But you could also write a special macro for defining points like this:
(defmacro defpoint [vble a b]
(list 'def vble (list 'make-point-obj a b))
)
Now you can write code like this:
=> (defpoint origin 100 50)
#'user/origin
=> origin
(point (x 100) (y 50))
Writing (defpoint origin 100 50)
is a little more readable than (def
origin (make-point-obj 0 0))
.
You could also allow for the possibility of defining a point without yet knowing exactly what value you want to give it:
(defmacro defpoint
([v]
(list 'def v (list 'make-point-obj 0 0))
)
([v a b]
(list 'def v (list 'make-point-obj a b))
)
)
=> (defpoint q)
#'user/q
=> q
(point (x 0) (y 0))
Macro Pitfalls¶
Macros are trickier to use than functions because of their extra expansion step. They can often suffer from subtle bugs if not implemented carefully. We’ll discuss two basic problems you can run into: variable capture, and double evaluation. Keep in mind that these are not the only problems macros can suffer from — there are other kinds of problems with macros that can arise!
Variable Capture¶
Macros can, if they need to, define variables. For instance, consider this macro:
(defmacro add1-bad [n]
(list
'let ['incr 1]
(list '+ n 'incr)
)
)
=> (add1-bad 5)
6
=> (let [a 2] (add1-bad a))
3
It’s a contrived example, but it shows a subtle problem:
=> (let [incr 2] (add1-bad incr))
2
For some strange reason, the programmer decided use a variable named incr
in this expression, and this breaks the macro: 2 is the wrong result! It
should be 3. The problem here is that, unknown to the programmer, add1-bad
uses incr
as an internal variable. So the macro expands to this:
(let [incr 1] (+ incr incr))
Clearly, the problem is that the incr
variable that gets passed into
add1-bad
has been captured by the local incr
in the add1-bad
let
expression.
The way to fix this problem is to make sure that the local variables in a
macro never have the same name as any variable that might be passed into it.
However, for any name we change incr
to, it’s possible that the programmer
could pass that name in!
The standard Lisp solution to the problem is to generate a unique variable
name using the gensym
function:
=> (gensym)
G__2728
=> (gensym)
G__2731
=> (gensym)
G__2734
Every time (gensym)
is evaluated, it returns a new name that it has not
returned before.
Now we can replace the variable incr
with a name generated by gensym
:
(defmacro add1-good1 [n]
(let [vbl (gensym)] ;; note that this outer let is not quoted
(list
'let [vbl 1] ;; vbl is not quoted
(list '+ n vbl)
)
)
)
=> (let [incr 1] (add1-good1 incr))
2
=> (let [a 1] (add1-good1 a))
2
This trick is so common in macros, that Clojure provides some syntactic help to simplify it a bit:
(defmacro add1-good2 [n]
(list
'let ['incr# 1]
(list '+ n 'incr#)
)
)
Putting a #
at the end of a symbol is called an auto-gensysm. The name
incr#
is guaranteed to be unique.
Double Evaluation¶
Consider this function and macro:
(defn dbg [x]
(println x)
x
)
=> (dbg 4)
4
4
(defmacro square [n]
(list '* n n)
)
=> (square 4)
16
The dbg
function first prints whatever you pass to it, and then returns it
as a value. As its name suggests, it’s meant for debugging.
Using dbg
, we can see that the square
macro suffers from double
evaluation, i.e. whatever input you pass it gets evaluated twice (because
n
appears two times in the macro):
=> (square (dbg 4))
4
4
16
=> (macroexpand '(square (dbg 4)))
(* (dbg 4) (dbg 4))
The correct answer (16) is returned, but you might expect that 4 should be
printed only once. It certainly appears that (dbg 4)
is only evaluated
once, but if you read the square
macro you can see that . anything you
pass to it gets evaluated twice. Not only is this inefficient, it can cause
serious problems for any expressions with side-effects, such as printing to
the screen, writing to a file, modifying a global variable, etc.
To avoid double evaluation, you can use let
to make sure the evaluation
only happens once:
(defmacro square-fixed [n]
(list
'let ['a# n] ;; n is only evaluated once here
(list '* 'a# 'a#)
)
)
=> (square-fixed (dbg 4))
4
16
user=> (macroexpand '(square-fixed (dbg 4)))
(let* [a (dbg 4)] (* 4 4))
When Should You Use Macros?¶
Some programmers believe that macros are a disaster. They say that they’re hard to write, hard to understand, and hard to debug. You shouldn’t use programming language features that are both difficult and error-prone, they say. The lack of macros in a language is a feature!
But some programmers think that the benefits of macros outweigh these costs. Yes, they say, macros are harder to write, understand, and debug as compared to regular functions, but they are also more powerful than regular functions. When use with care, they can make your programs much easier to read and understand, and more fun to use. Plus they can do things you can’t possibly do with functions, like create new control structures.
A general rule of thumb is to use macros sparingly. If a function will do, then use a function. Use macros only when functions won’t do the job, or when they provide a clear and obvious benefit, such as significantly improved syntax.
Finally, we should mention that if you use macros a lot, using so many
list
and '
tokens soon becomes tiresome, and this leads to errors. So
Clojure provides some extra syntax designed specifically for helping with
macros. For example, a better way to write the if-then-else
macro is like
this:
=> (defmacro if-then-else [condition a b] `(if ~condition ~a ~b))
#'user/if-then-else
We’ve used a backquote instead of a regular (forward) quote in this
function. A backquote in Clojure is like a regular quote, but it lets us use
~
inside the list to turn off quoting. So if
is a symbol that stand
for itself, while ~condition
, ~a
and ~b
are replaced by what those
variables refer to.
Although we haven’t used it here, the backquote also lets use the ~@
operator for splicing lists, e.g.:
=> (def vars '(x y z))
=> `(a b ~@vars c)
(user/a user/b x y z user/c)
You can also see here that the backquote has prepended user/
in front of
a
, b
, and c
. user/
refers to the namespace that these
variables are in, and this helps to prevent unintended variable capture.
Admittedly, the benefit of using a backquote in if-then-else
is small. But
if you write a lot of macros, especially larger and more complex ones,
backquotes can improve readability quite a bit.
Making Your cons, first, and rest¶
One of the many interesting things about Lisp is that it’s possible to write
your own versions of the standard cons
, first
, and rest
functions
as follows:
(defn mycons [x y] (fn [m] (m x y)))
(defn myfirst [z] (z (fn [p q] p)))
(defn myrest [z] (z (fn [p q] q)))
The is known as the Church encoding (after Alonzo Church). It essentially defines a list data type in Clojure using only functions. For instance:
=> (def lst (mycons 1 (mycons 2 (mycons 3 ()))))
#'user/lst
=> lst
#object[user$mycons$fn__1212 0x5c2ec4 "user$mycons$fn__1212@5c2ec4"]
=> (myfirst lst)
1
=> (myfirst (myrest lst))
2
The trick is that the list (3)
can be represented by the expression
(mycons 3 ())
. This is equivalent to the function (fn [m] (m 3 ()))
,
where 3
and ()
are stored within the function like a pair of values.
When myfirst
is called with this function as the input, m
gets
replaced by a function that takes two inputs, p
and q
, and returns
p
. myrest
is similar, except its function returns q
instead.
This is not an efficient way to represent lists in practice, but it is useful
in theory. It shows that functions like cons
, first
, and rest
need
not primitives, but instead can be defined in terms of functions.
It’s also possible to represent integers and arithmetic in this purely functional way. Again, this is not efficient, but it is quite important in theory.
Clojure Lists¶
Lists in Clojure are implemented as traditional singly-linked lists. Each item in a list is represented by a cons cell, i.e. a record that contains two pointers, one the first element of the list, and one to the rest of the list.
Because of this implementation, cons
, first
, and rest
are all
\(O(1)\) functions. Functions like append
or last
are
\(O(n)\).
Interestingly, and usefully, Clojure lists are normally immutable, meaning that once you create a list you can’t change it. A consequence of this is that lists can easily share parts, e.g.:
(def A '(1 2 3 4))
(def B (rest A))
Here, B
is essentially a pointer to the
<https://en.wikipedia.org/wiki/Cons>`_ for 2. No copy of the elements in A
needs to be made — B
and A
share a sublist.
Another useful consequence is list immutability is that concurrent programming with them tends to be easier because processes never need to worry about lists changing. Concurrent processes can only read a list, never write it.
Clojure Vectors¶
We’ve been using lists as the main data type in Clojure. But Clojure also supports vectors and maps, and in this section we will take a brief look the unusual way that Clojure implements its vectors.
All programming languages have some kind of array, or vector, for storing a sequence of items. For example, in Java, you could create an array like this:
int[] nums = {4, 5, 6, 7};
System.out.println(nums[3]); // prints 7
Internally, Java implements nums
as a contiguous sequence of int
s. Thus, accessing an element by index takes \(O(1)\) time, no matter how
long the array is.
In general, this is how most programming languages implement arrays or vectors. One other detail is whether or not the size of the array can change after it’s created. In languages like Java, C/C++, and Go, once you create an array its length never changes. We call this a fixed-size, or static array.
But in other languages, such as sequences in Python, or vectors in C++, are dynamic, i.e. their size can grow or shrink as needed. For instance, this is common in Python:
>>> nums = [] # Python code
>>> nums.append(5)
>>> nums.append(6)
>>> nums.append(7)
>>> nums
[5, 6, 7]
Here, nums
grows in size as needed. Internally, Python implements
nums
as a contiguous sequence of objects, and append
has some rules
for deciding how much memory the underlying array should take up. Dynamic
arrays like this are responsible for managing their own memory, which can be
very convenient for many sorts of applications. But the downside is that
nums
might use more memory than it needs to store its current numbers in
order to make future calls to append
faster. This is a common trade-off:
use more memory to make a program faster.
With that it mind, let’s look at Clojure vectors, i.e. it’s version of arrays. Clojure vector literals use square brackets, e.g.:
user=> (def nums [4 5 6 7])
#'user/nums
user=> nums
[4 5 6 7]
user=> (nums 3) ;; access the 3rd element of nums
Notice that no extra syntax is used to access elements. To get the number at
location i
, you can just write (nums i)
.
A basic fact about Clojure vectors are that they are persistent, which means that any time you modify or update them, you are creating a new vector. Think about this for a moment, and you’ll realize that if Clojure vectors were implemented as a contiguous sequence as in Java or C/C++, then every time you change a vector you’d need to make a copy of it. That would be prohibitively expensive in both time and memory.
So Clojure implements its vectors in an usual way: they’re a kind of tree. While we don’t have time to go into the details here (see this blog post if you are interested), the upshot is that Clojure vectors have nearly the same good performance characteristics as C/C++-style arrays, e.g. almost \(O(1)\) time for appending an element to a vector, and almost \(O(1)\) time for modifying an element (assuming you know its index position).
We say almost \(O(1)\) because, with a tree, it’s actually \(O(log n)\). But, Clojure knows that vector performance is important, and so uses a number of optimizations and tricks to get the performance to be pretty nearly \(O(1)\) in most practical uses.
Also, Clojure vector use more memory than a C/C++-style array because of the pointers it uses to keep track of all the nodes.
The reason Clojure uses this implementation of vectors is to have vectors that are persistent, i.e. that don’t change. This is very useful for concurrent/parallel programming, where one thread of control might modify a vector being used by a different thread of control. In C/C++, the array must be locked, or somehow specially managed, to avoid race conditions. But in Clojure, an array never changes, and so you don’t need to worry about some other thread over-writing a shared vector.
Clojure isn’t the only language with an usual implementation of arrays. Lua, for instance, implements arrays (and lists!) as a table of key as (key, value) pairs. Tables like this could be implemented as hash tables, or trees.
Extra: Running a Clojure App with Leiningen¶
Now lets look at a very neat tool that Clojure provides in its large set of library functions: unification. Unification is an algorithm that takes two lists as input, either of which may contain variables, and, if possible, returns a list of assignments to the variables that makes the two list the same. This turns out to be a surprisingly powerful and useful tool.
Clojure provides a unification library called core.unify. The easiest way to get access to it is to first install the Leiningen package management tool, and then to create a new Clojure “app” at the command-line like this:
$ lein new app prop-eval
This creates a new folder called prop-eval
that contains various
configuration files, including the file project.clj
. Inside
project.clj
is where we put the requirement to use core.unify
:
(defproject app "1.0.0-SNAPSHOT"
:description "FIXME: write description"
:dependencies [[org.clojure/clojure "1.4.0"]
[org.clojure/core.unify "0.5.5"] ;; core.unify dependency
])
The dependency information [org.clojure/core.unify "0.5.5"]
was cut-and-
pasted directly from the core.unify page.
Note also that the version of Clojure is listed as a dependency here. This makes it easy to use whatever version of Clojure you want.
To use core.unfiy
, we now run Clojure through Leiningen. Run lein repl
in the same folder as
project.clj
:
$ lein repl
... text indicating libraries are being downloaded ...
REPL started; server listening on localhost port 9347
user=>
This works just like ordinary Clojure, but now the dependencies in
project.clj
are satisfied, meaning we can import core.unify
like
this:
user=> (use 'clojure.core.unify)
nil
Unification with core.unify¶
Assuming you have set up your Clojure app as in the previous section, we can
now write code either in the REPL launched by lein repl
, or inside the
file prop-eval/src/app/core.clj
.
Here’s how unify
works:
user=> (use 'clojure.core.unify)
nil
user=> (unify '?x 5)
{?x 5}
The input to unify
is always two expressions. In this case, the first
expression is '?x
, and the second expression is 5
. The ?x
in is a
variable — unify
treats all symbols starting with ?
as variables.
What unify
does is try to find an assignment of values to the variables in
the two expressions that make the expressions identical. In this case the
answer is easy: setting ?x
to 5 unifies the expressions.
If unify
is able to unify the expressions, then it returns a Clojure map
showing the assignment of values to variables that make the expressions the
same.
Here’s another example:
user=> (unify '(?x and ?y) '(f and t))
{?y t, ?x f}
This says that if you replace ?y
with t
and ?x
with f
, the two
expressions passed to unify
will be identical.
In this example, two variables unify with each other:
user=> (unify '(?x 3 ?y) '(2 ?z ?w))
{?y ?w, ?z 3, ?x 2}
The result says that ?y
and ?w
should have the same value.
Sometimes expressions can’t be unified. For example:
user=> (unify '(?x) '(a b))
nil
No matter what you assign to ?x
, it cannot be made to be equal to '(a
b)
. So we say the unification has failed.
Unification turns out to be very useful for pattern matching. For example,
consider the is-and?
function from above:
;; (expr and expr)
(defn is-and? [e]
(and (list? e) (= 3 (count e)) (= 'and (second e))))
We could re-write it using unification like this:
(defn is-and? [e]
(not (nil? (unify e '(?x and ?y))))
)
This can be improved a little bit by writing a helper function called
match?
:
(defn match? [e1 e2]
(not (nil? (unify e1 e2)))
)
Now is-and?
becomes this:
(defn is-and? [e] (match? e '(?x and ?y)))
This is clearer and simpler than the original is-and?
. It’s now easy to
write the other is-
functions in the same style:
(defn is-and? [e] (match? e '(?x and ?y)))
(defn is-or? [e] (match? e '(?x or ?y)))
(defn is-nand? [e] (match? e '(?x nand ?y)))
(defn is-not? [e] (match? e '(not ?x)))
As this shows, unification can be quite useful in practice when you want to do pattern matching.