More About Clojure ================== Clojure_ is a powerful language packed with many interesting ideas. Even if you never end up using Clojure_ in practice, it's instructive to examine some of its many interesting feature. apply and eval -------------- The ``apply`` and ``eval`` functions are used to evaluate lists. Here is the usual way of evaluating expressions:: => (* 1 2 3) 6 => (+ 1 2 3) 6 => (- 1 2 3) -4 An equivalent way is to use ``apply`` like this:: => (apply * '(1 2 3)) 6 => (apply + '(1 2 3)) 6 => (apply - '(1 2 3)) -4 As you can see, ``(apply f seq)`` takes a function ``f`` and evaluates it using the items in ``seq`` as its input. It's as if ``f`` is "cons-ed" onto ``seq``, and then that expression is evaluated. The ``eval`` function takes an entire list an evaluates it:: => (eval '(* 1 2 3)) 6 => (eval '(+ 1 2 3)) 6 => (eval '(- 1 2 3)) -4 ``eval`` is a very powerful function: it is essentially Clojure_ implemented in Clojure_! For example, you can easily implement your own version of ``apply`` using ``eval``:: (defn my-apply [f seq] (eval (cons f seq)) ) Rest Args --------- Sometimes it's convenient to pass an unspecified number of inputs to a function. For example, it's nice to have a ``sum`` function that works like this:: => (sum 1 2 3) 6 => (sum 6 7 8 9) 30 => (sum 6) 6 => (sum) 0 This cuts down on the number parentheses and ``'`` marks, which makes the code easier to read. In Clojure_, you can implement ``sum`` using **rest arguments**:: (defn sum [& nums] (apply + nums) ) Here, ``nums`` is the sequence of all the arguments passed to ``sum`` --- the ``&`` tells Clojure_ that 0 or more inputs are expected. Argument Destructuring ---------------------- Consider the following function that adds two 2-dimensional points:: (defn add-points1 [p1 p2] (let [x1 (first p1) y1 (second p1) x2 (first p2) y2 (second p2) ] (list (+ x1 x2) (+ y1 y2)) ) ) => (add-points1 '(1 2) '(3 4)) (4 6) Most of the function is devoted to extracting that data out of ``p1`` and ``p2``. The code isn't difficult, but it is tedious to write, and thus error- prone (are you sure you got all the 1s and 2s right?). So Clojure_ provides a nice feature called **destructuring** that can help you skip a lot of this sort of data extraction:: (defn add-points2 [[x1 y1] [x2 y2]] ;; with destructuring (list (+ x1 x2) (+ y1 y2)) ) This is shorter and clearer. Clojure_ automatically assigns the correspond values to ``x1``, ``y1``, ``x2``, and ``y2``. However, it's still up to you, the programmer, to pass in correctly formatted data:: => (add-points2 '(1 2 3) '(3 4 5)) (4 6) => (add-points2 '(1) '(3 4 5)) java.lang.NullPointerException (NO_SOURCE_FILE:0) Multiple Function Signatures ---------------------------- Sometimes you want a function to take different numbers, or kinds, of inputs. For example:: => (make-point) (0 0) => (make-point 8) (8 8) => (make-point 4 5) (4 5) You can implement ``make-point`` like this:: (defn make-point ([] '(0 0)) ([x] (list x x)) ([x y] (list x y)) ) Each line of ``make-point`` is a different function signature with a different function body. Here's another example from http://clojure.org/reference/special_forms:: (defn mymax ([x] x) ([x y] (if (> x y) x y)) ([x y & more] (reduce mymax (mymax x y) more))) Functions that Return Functions ------------------------------- In Lisp, functions are values that be passed to functions, returned from functions, and assigned to variables. For example:: (defn make-adder [n] (fn [x] (+ n x)) ) ``make-adder`` takes a number, ``n``, as input and returns a function that adds ``n`` to the value given to it. For example:: => (def add4 (make-adder 4)) ;; note the use of def instead of defn => (add4 2) 6 => (add4 -3) 1 You can even do things like this:: => ((make-adder 4) 5) 9 When ``((make-adder 4) 5)`` is evaluated, first ``(make-adder 4)`` is evaluated, which returns a function, and then this function is applied to 5. Here's another handy function that returns a function:: (defn negate [pred] (fn [x] (not (pred x))) ) This takes a predicate, i.e. a function that takes one input and returns a boolean value, as input, and returns new predicate that is its negation. For example:: => (def my-odd (negate even?)) #'user/my-odd user=> (my-odd 5) true user=> (my-odd 6) false Or:: => (keep-if (negate even?) '(1 2 3 4 5 6 7)) (1 3 5 7) This removes all the even numbers from a list. If you instead used ``(negate odd?)``, it would remove all the odd numbers. Memoization ----------- Consider the following classic recursive function:: (defn fib [n] (cond (= n 0) 1 (= n 1) 1 :else (+ (fib (- n 2)) (fib (- n 1))) ) ) ``(fib n)`` recursively calculates the nth Fibonacci number using the traditional recurrence, i.e. :math:`F_0 = 1`, :math:`F_1 = 1`, and :math:`F_n = F_{n-1} + F_{n-2}`. On the plus side, this function is very close to the mathematical definition, and so it is quick to write, short, and easy to check for correctness. On the negative side, it is extremely inefficient because almost every call to ``fib`` makes two more calls to ``fib``, and this results in a huge amount of redundant calculation. Try calculating, say, ``(fib 35)``, to see how slow it is. One way to improve this function is to rewrite it using a loop. Clojure_ provides a number of looping constructs that you could use to do this. But there is another interesting and general-purpose approach to speeding-up this function that is much easier to implement (once you know about it!). First, lets rewrite the definition of ``fib`` to use plain ``def`` instead of ``defn``:: (def fib (fn [n] (cond (= n 0) 1 (= n 1) 1 :else (+ (fib (- n 2)) (fib (- n 1))) ) ) ) Now, we can speed up this version of the function by using the built-in ``memoize`` function:: (def fib (memoize (fn [n] (cond (= n 0) 1 (= n 1) 1 :else (+ (fib (- n 2)) (fib (- n 1))) ) ) ) ) ``(memoize f)`` returns a **memoized** version of ``f``. In this case, the result is a dramatic speed-up of ``fib``: I can calculate ``(fib 91)`` apparently instantaneously (values greater than 91 cause overflow). Memoization does two basic things. It adds: - A hash table to ``fib`` where it stores any values that it calculates. - An if-statement to the beginning of the function so that when ``(fib n)`` is called, it first checks to see if that value of ``n`` has already been calculated. If so, it returns that value and does not call the body of the function. If ``n`` is not in the table, then it is calculated using the body of the function, and this result is stored in the table and then returned as the value of the function. Essentially, memoization caches values. This can speed up almost any function that makes lots of function calls (such as ``fib``). It's the same sort of trick that high-traffic websites use to increase performance, but on the level of a function instead of an entire website. The downside of memoization is that the storage table takes up extra space. For the ``fib`` function, the memoization table is never going to be very big because it can't calculate values greater than ``(fib 91)``. This is a traditional trade-off: by using more memory, we get more speed. Aside: Memoization in Python ---------------------------- Let's look at how memoization can be implemented in Python. Python's syntax makes the details a little clearer, plus it provides a convenient usage syntax called a decorator. For concreteness, lets memoize this function by hand:: def fib(n): if n == 0: return 1 elif n == 1: return 1 else: return fib(n-2) + fib(n-1) Memoization works by adding a table that stores (``n``, ``f(n)``) pairs. The easiest way to implement such a table in Python is to use a dictionary:: cache = {} # an initially empty dictionary Next, we modify the code in ``fib`` to check to see if the desired value has already been calculated:: cache = {} def fib2(n): if n == 0: return 1 elif n == 1: return 1 else: if not n in cache: cache[n] = fib2(n-2) + fib2(n-1) return cache[n] This is *much* faster than ``fib``! What's interesting about this is that adding the memoizing does not require modifying the body of ``fib``. This the key to writing a general-purpose memoizer that works with any function. In Python, the best way to write a memoizer is to use a class like this:: class memoize(object): def __init__(self, fn): self.fn = fn self.cache = {} def __call__(self, *args): if not args in self.cache: self.cache[args] = self.fn(*args) return self.cache[args] Don't worry about the exact syntactic details of Python. Just notice that the code that was used to implement memoization was generalized and put into a class. Now we can easily memoize a function by using decorator syntax, e.g.:: @memoize # decorator def fib(n): if n == 0: return 1 elif n == 1: return 1 else: return fib(n-2) + fib(n-1) The decorator acts as if we had written ``fib = memoize(fib)``. Closures -------- Why is Clojure_ called Clojure_? It's a play on the term **closure**, and the "j" is for Java_. So what is a closure? It's a combination of two things: a function, and an environment of (variable, value) pairs. The function is allowed to use variables from this environment. For example, consider this code:: (defn f [x] (+ x 1)) ``f`` is the name of a function, not a closure. But things change when we define it like this:: (defn make-adder [n] (fn [x] (+ n x)) ;; n's value comes from outside the lambda function ) (def g (make-adder 1)) ;; note the use of def ``g`` names a closure that includes both a function and a binding for the variable ``n``. If you have just the function ``(fn [x] (+ n x))``, then you can't actually evaluate it because ``n`` is not specified. But, of course, in this example, ``n`` is bound to the value 1. The function plus this binding of ``n`` is a closure. More concretely, compare this:: => (def g1 (let [n 1] (fn [x] (+ n x)))) => (g1 5) 6 => (g1 9) 10 To this:: => (def g2 (fn [x] (+ n x))) user=> CompilerException java.lang.RuntimeException: Unable to resolve symbol: n in this context, compiling:(/tmp/form-init597735295358479020.clj:1:17) Clojure_ doesn't let us define ``g2`` because the variable ``n`` is free, i.e. it is not bound to any value. And so how could Clojure_ possibly evaluate ``(g2 5)`` or ``(g2 9)``? It can't! ``g1`` is a closure. You can see that it's not *just* a function, but a function along with some variable bindings that it needs. Any programming language that let you to define a function inside another function must deal with the issue of what do about variables the inner function uses that are not defined inside it. Closures are a natural and useful solution to this problem. Languages that have lambda functions --- such as Go_, Java_, C++, Python, JavaScript, etc. --- also support closures. In Go_, for example, you can do this:: package main import "fmt" type intFunc func(int) int func makeAdder(n int) intFunc { result := func(x int) int { return x + n } return result } func main() { add2 := makeAdder(2) // add2 is a closure fmt.Println(add2(5)) } ``add2`` is a closure. The ``n`` inside the result function is bound to the value of ``n`` passed to ``makeAdder``, and it stays bound for the life of the ``result`` function. So even after ``makeAdder`` finishes executing, the ``n`` in the ``result`` is still there and can be used as long as ``add2`` exists. As an aside, notice how the *typing* information that Go_ requires makes the code longer, and more cluttered, than the same code in Clojure_. Clojure_ is dynamically typed, and so doesn't check for type information until run time. Composing Functions ------------------- Another interesting feature of Lisp_ is the ability to easily compose functions. Recall how function composition works in mathematics. Given :math:`f(x) = x^2` and :math:`g(x) = 2x + 1`, the composition of :math:`f` and :math:`g`, denoted :math:`f \circ g`, can be calculated as :math:`f(g(x)) = g(x)^2 = (2x + 1)^2 = 4x^2 + 4x + 1`. In Clojure_, we can write these functions like this:: (defn f [x] (* x x)) (defn g [x] (+ (* 2 x) 1)) For example:: user=> (f 2) 4 user=> (g 2) 5 user=> (f (g 2)) 25 One way to compose functions is to write a function called ``compose`` like this:: (defn compose [f g] (fn [x] (f (g x)))) ``compose`` takes two functions, ``f`` and ``g`` as input, and returns the function ``(fn [x] (f (g x)))`` as output. Now we can write the composition of ``f`` and ``g`` like this:: (def fog (compose f g)) (The built-in Clojure_ function ``comp`` does the same thing.) And it works:: user=> (fog 2) 25 The ``twice`` function takes a function, ``f``, as input and returns ``f`` composed with itself, i.e. :math:`f \circ f`:: (defn twice [f] (compose f f)) For instance:: (def garnish (twice (fn [x] (cons 'cheese x)))) => (garnish '(1 2 3)) (cheese cheese 1 2 3) We can generalize this as follows:: (defn compose-n [f n] (cond (= n 1) f :else (compose f (compose-n f (- n 1))) ) ) In a sense, this returns a new function that is ``f`` to the power of ``n``, where function composition is being used instead of multiplication. Partial Function Application ---------------------------- Composition is a way of creating a new function from a pair of other functions. Another interesting way to create a function from a function is to use **partial evaluation**. For example:: (defn add [x y] (+ x y)) (def add5 (partial add 5)) ;; note the use of def (not defn) => (add5 6) 11 The ``add`` function takes two inputs, ``x`` and ``y``, while ``add5`` takes only one input. The built-in function ``partial`` takes a function ``f`` with n inputs as the its first argument, followed by fewer than n arguments. It then returns a new function that is the same as the passed-in function, except the parameters passed to ``partial`` are bound to the corresponding formal parameters of ``f``. Here's another example:: (def hatter (partial cons 'hat)) ;; note the use of def (not defn) => (hatter '(a b c)) (hat a b c) And one more:: (def cheese-it (partial compose-n (fn [x] (cons 'cheese x)))) => ((cheese-it 5) '(1 2 3)) (cheese cheese cheese cheese cheese 1 2 3) Example: Making Selector Functions ---------------------------------- As an example of functional programming, lets see how we might go about creating a function that makes selector functions. Here are some examples of selector functions (``first`` and ``second`` are already defined in Clojure_):: (defn third [seq] (first (rest (rest seq))) ) (defn fourth [seq] (first (rest (rest (rest seq)))) ) For example:: user=> (third '(1 2 3 4 5)) 3 user=> (fourth '(1 2 3 4 5)) 4 It's not hard to see the pattern for how these functions are defined: to get the nth element of a sequence, call ``rest`` on it n-1 times (thus removing the first n-1 elements), and then call ``first``. With some thought, you can come up with a function like this:: (defn make-selector1 [n] (compose first (compose-n rest (- n 1))) ) ``make-selector1`` returns a function that can be used as a selector. So we could use it like this:: (defn third [seq] (make-selector1 3) ) (defn fourth [seq] (make-selector1 4) ) (defn fifteenth [seq] (make-selector1 15) ) Another interesting way to create a selector-making function is to used the built-in compose function in Clojure_ called ``comp``. It's like the ``compose`` function we defined, but it lets you compose more than two functions at once, e.g.:: user=> (defn add1 [x] (+ 1 x)) user=> (def t (comp add1 add1 add1)) user=> (t 5) 8 This saves us from typing some extra parentheses (always a nice thing in Lisp_!), e.g.:: (defn third [seq] ((comp first rest rest) seq) ) (defn fourth [seq] ((comp first rest rest rest) seq) ) The pattern of how these functions work is now even clearer. Using the built- in function ``replicate``, we could do this:: user=> (replicate 5 rest) (# # # # #) ``(replicate n x)`` returns a list containing n copies of x. The example here shows a list containing 5 copies of the ``rest`` function. With this, and the ``apply`` function (explained below), we can write this function:: (defn make-selector2 [n] (apply comp (cons first (replicate (- n 1) rest)) ) ) (def third (make-selector2 3)) (def fourth (make-selector2 4)) (def fifteenth (make-selector2 15)) Look at the body of ``make-selector2``. The expression ``(cons first (replicate (- n 1) rest))`` creates a list whose first element is the function ``first``, and whose remaining n-1 elements are the function ``rest``. What we want to do is compose all those functions together using ``comp``. You might first guess that you could do it like this:: (cons comp (cons first (replicate (- n 1) rest)) ) But this returns a list whose first element is the function ``comp``, and whose remaining elements are as above (``first``, followed by many copies of ``rest``). It is just a list, and does not get evaluated. So instead of ``cons``, Clojure_ provides the built-in ``apply`` function that takes a function and a list of parameters as input, and returns the result of applying the function to those parameters. For example:: => (defn plus [x y] (+ x y)) => (apply plus '(2 3)) 5 That's just what we need here because the arguments to ``comp`` are constructed as a list:: (apply comp (cons first (replicate (- n 1) rest)) ) Macros ------ Macros are one of the most interesting and powerful features in Lisp_. Intuitively, you can think of them as programs that make other *programs*. This is in contrast to functions, which are programs that make *values*. Macros let you do things like create our own def-like environments, and our own control structures --- things that are usually impossible to do languages like Java or C++. C and C++ also have macros, but they're less-powerful than Clojure_ macros. C/C++ macros are just string-based search-and-replace done before compilation, and they are not an intrinsic part of the language. You could even replace the default C/C++ macro processor with a different one, if you like. Clojure_ macros, however, are a fundamental feature of the language that let you decide *when* an expression is evaluated. They work on Clojure_ expressions, not just strings. To see why you might want to make a macro, consider this example:: => (def x 3) #'user/x => x 3 ``def`` binds the value 3 to ``x``. But here's an interesting question: is ``def`` a function? No, it turns out that it's not. If ``def`` were a function, then you'd get an error saying ``x`` is undefined, just like this:: ;; assume x is not defined => (+ 2 x) java.lang.Exception: Unable to resolve symbol: x in this context (NO_SOURCE_FILE:1) That's because the rule for calling ``(f x y)`` in Clojure_ is to first evaluate ``x`` and ``y``, and then pass those results to ``f``. So ``(def x 3)`` wouldn't work if it were a function because Clojure would try to evaluate ``x``. But ``x`` doesn't exist yet. Clojure_ calls expressions like ``(def x 3)`` **special forms** to distinguish them from regular functions. Other special forms are ``defn``, ``and``, ``let``, ``cond``, etc. In most languages, special forms are hard-coded into the language, and you can't add your own. But in Lisp, macros let you do just that (among other things!). Here's another example. Suppose you want to create your if-then-else control structure. You can't do that with a function:: ;; BAD!: can't create a control structure with a function ;; a and b are both evaluated before they are passed to if-then-else, ;; but we only ever want to evaluate one of then (defn if-then-else [condition a b] (if condition a b) ) user=> (if-then-else true (println "yes") (println "no")) yes no nil user=> (if-then-else (= 1 2) (println "yes") (println "no")) yes no nil The problem is that the if-part and the else-part are always *both* evaluated when ``if-then-else`` is called. That's wrong. For an if-else statement, we only ever want to evaluate one of the ``a`` or ``b``, and *never* both. The problem is that the function parameters are evaluated *before* being passed to the function. But with expressions like ``def`` and ``if-then-else``, we want the parameters to be evaluated *after* they are passed in. This is what macros do: they let us control when parameters are evaluated. The essential difference between a macro and a function is that a macro is passed its parameters *unevaluated*, while functions are passed their parameters evaluated. So we can use macros to write our own special forms, e.g. def-like environments, or control structures like ``if-then-else``. This macro correctly implements ``if-then-else``:: (defmacro if-then-else [condition a b] (list 'if condition a b) ) The trick with a macro is that its body creates an expression that, when evaluated, does what we want it to do. Macros run in a two-step process: they first create an expression, and then they evaluate the expression. The creation step is called **expansion**, and it is this step that functions don't do. Use the ``macroexpand`` function to see what a macro expands to:: => (macroexpand '(if-then-else false (println "Hi!") (println "Bye!"))) (if false (println "Hi!") (println "Bye!")) The result here shows exactly what Clojure_ will evaluate when you call ``(if- then-else false (println "Hi!") (println "Bye!"))``. In ``if-then-else``, ``a`` and ``b`` are *not* evaluated when the macro is expanded: they are just put onto a list as-is. When the macro is evaluated, so too are ``a`` and ``b``: they are evaluated *after* they are passed to ``if-then-else``. Clojure_ does not have a ``nand`` operator, so lets write a macro for it. The expression ``(nand a b)`` should be true just when ``a`` and ``b`` are not both true. In other words, ``(nand a b)`` always has the same value as ``(not (and a b))``:: (defmacro nand [a b] (list 'not (list 'and a b)) ) r=> (nand true false) true => (nand (< 1 2) (< 3 4)) false It seems to work. Use ``macroexpand`` to check that the expression it evaluates is correct:: => (macroexpand '(nand true false)) (not (and true false)) This tells us that when we evaluate ``(nand true false)``, it will first expand into ``(not (and true false))``, and then this expanded expression will be evaluated. Similarly:: => (macroexpand '(nand (> 1 2) (> 3 4))) (not (and (> 1 2) (> 3 4))) ``macroexpand`` is essential for debugging macros. It shows you *exactly* what expression gets evaluated. Notice that we used ``list`` twice inside ``nand``. This version would **not** work:: (defmacro nand-bad1 [a b] (list 'not '(and a b)) ) => (nand-bad1 true false) java.lang.Exception: Unable to resolve symbol: a in this context (NO_SOURCE_FILE:106) => (macroexpand '(nand-bad1 true false)) (not (and a b)) ``macroexpand`` shows why this is wrong. When ``(nand-bad1 true false)`` is evaluated, it's expanded to ``(not (and a b))``, and then ``(not (and a b))`` is evaluated. The error is that ``a`` is not defined here. Indeed, we don't want ``a`` here, we want the expression ``a`` refers to, which is ``true`` in this case. A def-like Macro ~~~~~~~~~~~~~~~~ Here's an example of a def-like macro. Suppose we're writing a program that involves point objects defined like this:: (defn make-point-obj [x y] (list 'point (list 'x x) (list 'y y)) ) => (make-point-obj 4 9) (point (x 4) (y 9)) => (make-point-obj -6 18) (point (x -6) (y 18)) You can use ``def`` to assign a point to a global variable, e.g.:: => (def origin (make-point-obj 0 0)) #'user/origin => origin (point (x 0) (y 0)) But you could also write a special macro for defining points like this:: (defmacro defpoint [vble a b] (list 'def vble (list 'make-point-obj a b)) ) Now you can write code like this:: => (defpoint origin 100 50) #'user/origin => origin (point (x 100) (y 50)) Writing ``(defpoint origin 100 50)`` is a little more readable than ``(def origin (make-point-obj 0 0))``. You could also allow for the possibility of defining a point without yet knowing exactly what value you want to give it:: (defmacro defpoint ([v] (list 'def v (list 'make-point-obj 0 0)) ) ([v a b] (list 'def v (list 'make-point-obj a b)) ) ) => (defpoint q) #'user/q => q (point (x 0) (y 0)) Macro Pitfalls -------------- Macros are trickier to use than functions because of their extra expansion step. They can often suffer from subtle bugs if not implemented carefully. We'll discuss two basic problems you can run into: **variable capture**, and **double evaluation**. Keep in mind that these are *not* the only problems macros can suffer from --- there are other kinds of problems with macros that can arise! Variable Capture ~~~~~~~~~~~~~~~~ Macros can, if they need to, define variables. For instance, consider this macro:: (defmacro add1-bad [n] (list 'let ['incr 1] (list '+ n 'incr) ) ) => (add1-bad 5) 6 => (let [a 2] (add1-bad a)) 3 It's a contrived example, but it shows a subtle problem:: => (let [incr 2] (add1-bad incr)) 2 For some strange reason, the programmer decided use a variable named ``incr`` in this expression, and this breaks the macro: 2 is the wrong result! It should be 3. The problem here is that, unknown to the programmer, ``add1-bad`` uses ``incr`` as an internal variable. So the macro expands to this:: (let [incr 1] (+ incr incr)) Clearly, the problem is that the ``incr`` variable that gets passed into ``add1-bad`` has been *captured* by the local ``incr`` in the ``add1-bad`` ``let`` expression. The way to fix this problem is to make sure that the local variables in a macro *never* have the same name as any variable that might be passed into it. However, for any name we change ``incr`` to, it's possible that the programmer could pass that name in! The standard Lisp_ solution to the problem is to generate a unique variable name using the ``gensym`` function:: => (gensym) G__2728 => (gensym) G__2731 => (gensym) G__2734 Every time ``(gensym)`` is evaluated, it returns a new name that it has not returned before. Now we can replace the variable ``incr`` with a name generated by ``gensym``:: (defmacro add1-good1 [n] (let [vbl (gensym)] ;; note that this outer let is not quoted (list 'let [vbl 1] ;; vbl is not quoted (list '+ n vbl) ) ) ) => (let [incr 1] (add1-good1 incr)) 2 => (let [a 1] (add1-good1 a)) 2 This trick is so common in macros, that Clojure_ provides some syntactic help to simplify it a bit:: (defmacro add1-good2 [n] (list 'let ['incr# 1] (list '+ n 'incr#) ) ) Putting a ``#`` at the end of a symbol is called an **auto-gensysm**. The name ``incr#`` is guaranteed to be unique. Double Evaluation ~~~~~~~~~~~~~~~~~ Consider this function and macro:: (defn dbg [x] (println x) x ) => (dbg 4) 4 4 (defmacro square [n] (list '* n n) ) => (square 4) 16 The ``dbg`` function first prints whatever you pass to it, and then returns it as a value. As its name suggests, it's meant for debugging. Using ``dbg``, we can see that the ``square`` macro suffers from *double evaluation*, i.e. whatever input you pass it gets evaluated twice (because ``n`` appears two times in the macro):: => (square (dbg 4)) 4 4 16 => (macroexpand '(square (dbg 4))) (* (dbg 4) (dbg 4)) The correct answer (16) is returned, but you might expect that 4 should be printed only once. It certainly appears that ``(dbg 4)`` is only evaluated once, but if you read the ``square`` macro you can see that . anything you pass to it gets evaluated *twice*. Not only is this inefficient, it can cause serious problems for any expressions with side-effects, such as printing to the screen, writing to a file, modifying a global variable, etc. To avoid double evaluation, you can use ``let`` to make sure the evaluation only happens once:: (defmacro square-fixed [n] (list 'let ['a# n] ;; n is only evaluated once here (list '* 'a# 'a#) ) ) => (square-fixed (dbg 4)) 4 16 user=> (macroexpand '(square-fixed (dbg 4))) (let* [a (dbg 4)] (* 4 4)) When Should You Use Macros? --------------------------- Some programmers believe that macros are a disaster. They say that they're hard to write, hard to understand, and hard to debug. You shouldn't use programming language features that are both difficult and error-prone, they say. The *lack* of macros in a language is a *feature*! But some programmers think that the benefits of macros outweigh these costs. Yes, they say, macros are harder to write, understand, and debug as compared to regular functions, but they are also more powerful than regular functions. When use with care, they can make your programs much easier to read and understand, and more fun to use. Plus they can do things you can't possibly do with functions, like create new control structures. A general rule of thumb is to use macros sparingly. If a function will do, then use a function. Use macros only when functions won't do the job, or when they provide a clear and obvious benefit, such as significantly improved syntax. Finally, we should mention that if you use macros a lot, using so many ``list`` and ``'`` tokens soon becomes tiresome, and this leads to errors. So Clojure_ provides some extra syntax designed specifically for helping with macros. For example, a better way to write the ``if-then-else`` macro is like this:: => (defmacro if-then-else [condition a b] `(if ~condition ~a ~b)) #'user/if-then-else We've used a **backquote** instead of a regular (forward) quote in this function. A backquote in Clojure_ is like a regular quote, but it lets us use ``~`` inside the list to *turn off quoting*. So ``if`` is a symbol that stand for itself, while ``~condition``, ``~a`` and ``~b`` are replaced by what those variables refer to. Although we haven't used it here, the backquote also lets use the ``~@`` operator for splicing lists, e.g.:: => (def vars '(x y z)) => `(a b ~@vars c) (user/a user/b x y z user/c) You can also see here that the backquote has prepended ``user/`` in front of ``a``, ``b``, and ``c``. ``user/`` refers to the namespace that these variables are in, and this helps to prevent unintended variable capture. Admittedly, the benefit of using a backquote in ``if-then-else`` is small. But if you write a lot of macros, especially larger and more complex ones, backquotes can improve readability quite a bit. Making Your cons, first, and rest --------------------------------- One of the many interesting things about Lisp_ is that it's possible to write your own versions of the standard ``cons``, ``first``, and ``rest`` functions as follows:: (defn mycons [x y] (fn [m] (m x y))) (defn myfirst [z] (z (fn [p q] p))) (defn myrest [z] (z (fn [p q] q))) The is known as the Church encoding (after Alonzo Church). It essentially defines a list data type in Clojure_ using only functions. For instance:: => (def lst (mycons 1 (mycons 2 (mycons 3 ())))) #'user/lst => lst #object[user$mycons$fn__1212 0x5c2ec4 "user$mycons$fn__1212@5c2ec4"] => (myfirst lst) 1 => (myfirst (myrest lst)) 2 The trick is that the list ``(3)`` can be represented by the expression ``(mycons 3 ())``. This is equivalent to the function ``(fn [m] (m 3 ()))``, where ``3`` and ``()`` are stored within the function like a pair of values. When ``myfirst`` is called with this function as the input, ``m`` gets replaced by a function that takes two inputs, ``p`` and ``q``, and returns ``p``. ``myrest`` is similar, except its function returns ``q`` instead. This is not an efficient way to represent lists in practice, but it is useful in theory. It shows that functions like ``cons``, ``first``, and ``rest`` need not primitives, but instead can be defined in terms of functions. It's also possible to represent integers and arithmetic in this purely functional way. Again, this is not efficient, but it is `quite important in theory `_. Clojure Lists ------------- Lists in Clojure_ are implemented as traditional singly-linked lists. Each item in a list is represented by a `cons cell `_, i.e. a record that contains two pointers, one the first element of the list, and one to the rest of the list. Because of this implementation, ``cons``, ``first``, and ``rest`` are all :math:`O(1)` functions. Functions like ``append`` or ``last`` are :math:`O(n)`. Interestingly, and usefully, Clojure_ lists are normally **immutable**, meaning that once you create a list you can't change it. A consequence of this is that lists can easily share parts, e.g.:: (def A '(1 2 3 4)) (def B (rest A)) Here, ``B`` is essentially a pointer to the `_ for 2. No copy of the elements in ``A`` needs to be made --- ``B`` and ``A`` share a sublist. Another useful consequence is list immutability is that concurrent programming with them tends to be easier because processes never need to worry about lists changing. Concurrent processes can only read a list, never write it. Clojure Vectors --------------- We've been using lists as the main data type in Clojure_. But Clojure_ also supports vectors and maps, and in this section we will take a brief look the unusual way that Clojure_ implements its vectors. All programming languages have some kind of array, or vector, for storing a sequence of items. For example, in Java_, you could create an array like this:: int[] nums = {4, 5, 6, 7}; System.out.println(nums[3]); // prints 7 Internally, Java_ implements ``nums`` as a *contiguous* sequence of ``int``\ s. Thus, accessing an element by index takes :math:`O(1)` time, no matter how long the array is. In general, this is how most programming languages implement arrays or vectors. One other detail is whether or not the size of the array can change after it's created. In languages like Java_, C/C++, and Go_, once you create an array its length **never** changes. We call this a fixed-size, or static array. But in other languages, such as sequences in Python_, or vectors in C++, are dynamic, i.e. their size can grow or shrink as needed. For instance, this is common in Python_:: >>> nums = [] # Python code >>> nums.append(5) >>> nums.append(6) >>> nums.append(7) >>> nums [5, 6, 7] Here, ``nums`` grows in size as needed. Internally, Python_ implements ``nums`` as a contiguous sequence of objects, and ``append`` has some rules for deciding how much memory the underlying array should take up. Dynamic arrays like this are responsible for managing their own memory, which can be very convenient for many sorts of applications. But the downside is that ``nums`` might use more memory than it needs to store its current numbers in order to make *future* calls to ``append`` faster. This is a common trade-off: use more memory to make a program faster. With that it mind, let's look at Clojure_ vectors, i.e. it's version of arrays. Clojure_ vector literals use square brackets, e.g.:: user=> (def nums [4 5 6 7]) #'user/nums user=> nums [4 5 6 7] user=> (nums 3) ;; access the 3rd element of nums Notice that no extra syntax is used to access elements. To get the number at location ``i``, you can just write ``(nums i)``. A basic fact about Clojure_ vectors are that they are *persistent*, which means that any time you modify or update them, you are creating a new vector. Think about this for a moment, and you'll realize that if Clojure_ vectors were implemented as a contiguous sequence as in Java_ or C/C++, then every time you change a vector you'd need to make a copy of it. That would be prohibitively expensive in both time and memory. So Clojure_ implements its vectors in an usual way: they're a kind of tree. While we don't have time to go into the details here (see `this blog post `_ if you are interested), the upshot is that Clojure_ vectors have nearly the same good performance characteristics as C/C++-style arrays, e.g. *almost* :math:`O(1)` time for appending an element to a vector, and *almost* :math:`O(1)` time for modifying an element (assuming you know its index position). We say *almost* :math:`O(1)` because, with a tree, it's actually :math:`O(log n)`. But, Clojure_ knows that vector performance is important, and so uses a number of optimizations and tricks to get the performance to be pretty nearly :math:`O(1)` in most practical uses. Also, Clojure_ vector use more memory than a C/C++-style array because of the pointers it uses to keep track of all the nodes. The reason Clojure_ uses this implementation of vectors is to have vectors that are persistent, i.e. that don't change. This is very useful for concurrent/parallel programming, where one thread of control might modify a vector being used by a different thread of control. In C/C++, the array must be locked, or somehow specially managed, to avoid race conditions. But in Clojure_, an array never changes, and so you don't need to worry about some other thread over-writing a shared vector. Clojure_ isn't the only language with an usual implementation of arrays. `Lua `_, for instance, implements arrays (and lists!) as a table of key as (key, value) pairs. Tables like this could be implemented as hash tables, or trees. Extra: Running a Clojure App with Leiningen ------------------------------------------- Now lets look at a very neat tool that Clojure_ provides in its large set of library functions: **unification**. Unification is an algorithm that takes two lists as input, either of which may contain variables, and, if possible, returns a list of assignments to the variables that makes the two list the same. This turns out to be a surprisingly powerful and useful tool. Clojure_ provides a unification library called `core.unify `_. The easiest way to get access to it is to first install the `Leiningen package management tool `_, and then to create a new Clojure_ "app" at the command-line like this:: $ lein new app prop-eval This creates a new folder called ``prop-eval`` that contains various configuration files, including the file ``project.clj``. Inside ``project.clj`` is where we put the requirement to use ``core.unify``:: (defproject app "1.0.0-SNAPSHOT" :description "FIXME: write description" :dependencies [[org.clojure/clojure "1.4.0"] [org.clojure/core.unify "0.5.5"] ;; core.unify dependency ]) The dependency information ``[org.clojure/core.unify "0.5.5"]`` was cut-and- pasted directly from the `core.unify `_ page. Note also that the version of Clojure_ is listed as a dependency here. This makes it easy to use whatever version of Clojure_ you want. To use ``core.unfiy``, we now run Clojure_ through `Leiningen `_. Run ``lein repl`` in the same folder as ``project.clj``:: $ lein repl ... text indicating libraries are being downloaded ... REPL started; server listening on localhost port 9347 user=> This works just like ordinary Clojure_, but now the dependencies in ``project.clj`` are satisfied, meaning we can import ``core.unify`` like this:: user=> (use 'clojure.core.unify) nil Unification with core.unify --------------------------- Assuming you have set up your Clojure_ app as in the previous section, we can now write code either in the REPL launched by ``lein repl``, or inside the file ``prop-eval/src/app/core.clj``. Here's how ``unify`` works:: user=> (use 'clojure.core.unify) nil user=> (unify '?x 5) {?x 5} The input to ``unify`` is always two expressions. In this case, the first expression is ``'?x``, and the second expression is ``5``. The ``?x`` in is a variable --- ``unify`` treats all symbols starting with ``?`` as variables. What ``unify`` does is try to find an assignment of values to the variables in the two expressions that make the expressions identical. In this case the answer is easy: setting ``?x`` to 5 unifies the expressions. If ``unify`` is able to unify the expressions, then it returns a Clojure_ map showing the assignment of values to variables that make the expressions the same. Here's another example:: user=> (unify '(?x and ?y) '(f and t)) {?y t, ?x f} This says that if you replace ``?y`` with ``t`` and ``?x`` with ``f``, the two expressions passed to ``unify`` will be identical. In this example, two variables unify with each other:: user=> (unify '(?x 3 ?y) '(2 ?z ?w)) {?y ?w, ?z 3, ?x 2} The result says that ``?y`` and ``?w`` should have the same value. Sometimes expressions can't be unified. For example:: user=> (unify '(?x) '(a b)) nil No matter what you assign to ``?x``, it cannot be made to be equal to ``'(a b)``. So we say the unification has failed. Unification turns out to be very useful for pattern matching. For example, consider the ``is-and?`` function from above:: ;; (expr and expr) (defn is-and? [e] (and (list? e) (= 3 (count e)) (= 'and (second e)))) We could re-write it using unification like this:: (defn is-and? [e] (not (nil? (unify e '(?x and ?y)))) ) This can be improved a little bit by writing a helper function called ``match?``:: (defn match? [e1 e2] (not (nil? (unify e1 e2))) ) Now ``is-and?`` becomes this:: (defn is-and? [e] (match? e '(?x and ?y))) This is clearer and simpler than the original ``is-and?``. It's now easy to write the other ``is-`` functions in the same style:: (defn is-and? [e] (match? e '(?x and ?y))) (defn is-or? [e] (match? e '(?x or ?y))) (defn is-nand? [e] (match? e '(?x nand ?y))) (defn is-not? [e] (match? e '(not ?x))) As this shows, unification can be quite useful in practice when you want to do pattern matching.