Introduction to Macros in Racket

Introduction

In Racket, regular functions evaluate their arguments before calling a function. For example, if f is a function then (f (+ 2 3) (- 4 1)) is the same as (f 5 3). For most operations, this style of evaluation is efficient and works well.

But some operations don’t work this way. For example, define isn’t a function:

(define (f a b)
    (* a (+ b 2)))

define does not evaluate its arguments immediately. If it did, then it would try to evaluate (f a b), which would cause an error because f is probably not defined.

Similarly, flow of control operations, such as if, cond, and, and or, also don’t necessarily evaluate all their arguments. For example:

(if (= code 1)
    (launch-missiles)
    (do-not-launch-missiles)

We expect that an if form evaluates only one possibility, never both. If if were a function, then both (launch-missiles) and (do-not-launch-missiles) would get evaluated, no matter the value of code.

One last example: let-environments introduce new variables into a scope:

(let ([a 1]
      [b 2])
   (+ a b))

If let were a regular function, then ([a 1] [b 2]) would be evaluated, which would mostly likely cause an error since a and b are not defined.

What all three of these examples shows is that regular functions are not capable of implementing many of the forms we see in Racket. Many programmers are okay with that: most other languages don’t let you create new kinds of if-statements or ways to define functions.

But for those programmers who do want to implement such things, Racket macros are a powerful technique for implementing such operations. Racket’s macros are much more useful than the string-oriented macros you might have used in C/C++, and they are essentially a way to extend the language. If you want to implement your own language features — or even your own entire language — Macros are a good feature to learn.

Macros are a relatively complex feature of Racket, and here we will only discuss a few basic examples. If you are curious to learn more, you can find a number of on-line resources that discuss macros in more depth, such as Let Over Lambda, or Beautiful Racket.

A First Macro: assert

The following example is from Macros and Languages in Racket <http://rmculpepper.github.io/malr/basic.html>.

Suppose we want to implement a form (assert bool-expr) that does nothing if bool-expr evaluates to #t, but, if expr is #f, prints an error message with bool-expr in it.

You might try to do it this way as a function:

(define (assert-bad expr)
    (when (not expr)
        (error 'assert "assertion fail: ~s" expr))

> (assert-bad (= 2 2))  ;; does nothing if expr is #t

> (assert-bad (= 1 2))
. . assert: assertion fail: #f

The problem is the error message in the second call: it prints #f, when what we want to see is the original expression (= 1 2). But assert-bad can’t possibly print (= 1 2) because it gets passed what (= 1 2) evaluates to, i.e. #f.

You could add a second parameter to get the right error message:

(define (assert-also-bad expr quoted-expr)
    (when (not expr)
        (error 'assert "assertion fail: ~s" quoted-expr)))

> (assert-also-bad (= 1 2) '(= 1 2))
. . assert: assertion fail: (= 1 2)

But this forces the user to pass in two copies of the expression, one quoted and one not quoted. That’s both error-prone and messy, and not much of an improvement.

The solution to this problem is to use a macro:

(define-syntax-rule (assert expr)
  (when (not expr)
    (error 'assert "assertion failed: ~s" (quote expr))))

> (assert (= 1 2))
. . assert: assertion failed: (= 1 2)

A single non-quoted instance of (= 1 2) is passed in, and both its evaluated and non-evaluated form can be used. In the body of assert, you can see that expr is evaluated, but (quote expr) is not: it’s up to the programmer to decide which form of expr they want to use.

Using the same idea, you could write this macro for printing an expression and its evaluation:

(define-syntax-rule (print-val expr)
  (printf "~a ==> ~a" (quote expr) expr))

> (print-val (+ 1 2))
(+ 1 2) ==> 3

> (print-val (cons 'a '(1 2 3)))
(cons 'a '(1 2 3)) ==> (a 1 2 3)

> (define x 14)
> (print-eval x)
x ==> 14

> (print-eval even?)
even? ==> #<procedure:even?>

This macro could be useful for simple debugging, or maybe in a program that makes quiz question that helps beginners evaluate Racket expressions.

Notice that in the call (print-val (+ 1 2)), (+ 1 2) is not quoted. If it was, we’d get this:

> (print-val '(+ 1 2))
'(+ 1 2) ==> (+ 1 2)

Here’s one more example that prints a message before evaluating an expression:

(define-syntax-rule (noisy expr)
  (begin
    (printf "calling ~a ...\n" (quote expr))
    expr))

> (noisy (filter odd? '(1 2 3 4 5)))
calling (filter odd? '(1 2 3 4 5)) ...
'(1 3 5)

The form (begin expr_1 expr_2 ... expr_n) evaluates each of expr_1 to expr_n in the order they occur, and then returns the value of expr_n.

Basic Macro Expansion

It’s often useful to trace how macros get called. For example, when we call (assert (= 1 2)), we can imagine that it gets expanded into this expression:

(when (not (= 1 2))
  (error 'assert "assertion failed: ~s" (quote (= 1 2))))

This expression then gets evaluated to give the final result of calling (assert (= 1 2)).

If you look at the body of the assert macro, you can see that the expanded form is essentially a search-and-replace operation, i.e. every occurrence of expr gets replaced with (= 1 2).

In general, a macro call always goes through this two-step procedure: first expansion, and then evaluation of the expanded form.

Hygienic Macros

Suppose we wanted to write our own or macro. For simplicity, it will take exactly two inputs (Racket or takes 0 or more inputs). Here’s a first attempt:

(define-syntax-rule (my-or-bad e1 e2)
  (if e1 e1 e2))

Logically, this is correct: it returns true if e1 or e2, or both, evaluate to #t. It is even short-circuited, i.e. if e1 is #t then e2 is not evaluated.

However, it has a fatal flaw: e1 might be evaluated twice. Not only is that inefficient, but it could be incorrect when e1 has side-effects (such as printing to the screen or opening a file).

To fix this problem, we can use let to ensure that e1 is evaluated at only once:

(define-syntax-rule (my-or e1 e2)
  (let ([x e1])
    (if x x e2)))

This works!

What’s interesting is that if you think about how this macro expands, you might think it should not work in some cases. For example, consider this:

> (define x 5)       ;; global x
> (my-or #f (= x 5))
#t

(my-or #f (= x 5)) would seem to expand to this:

(let ([x #f])
  (if x x (= x 5)))

When this form is evaluated, each x in the if statement is replaced by #f, and so returns the same value as:

(if #f #f (= #f 5))  ;; error!

This returns the value of (= #f 5), which is an error (= only works with numbers).

So it seems (my-or (= x 1) (= x 5)) should be an error. But it’s not: it correctly returns #t.

What’s happening here is that Racket is renaming the x variable in the assert macro to be a name different than any other variable in expr. It does this to avoid the kind of problem we just saw, i.e. a local variable in a macro accidentally “capturing” a passed-in variable with the same name.

Since the macro system in Racket automatically does this renaming, Racket macros are called hygienic macros. With hygienic macros, you never need to worry about variable name clashes.

How does Racket get a new variable name that is not used anywhere else? Conceptually, you can imagine that Racket creates unique variable name using the gensym function:

> (gensym)
'g12792

> (gensym)
'g12808

Every time gensym is called, it returns a new uninterned symbol, i.e. a symbol that is not yet used anywhere in Racket and thus is guaranteed to be unique. So the local x in my-or can be thought of as being renamed to a new gensysm-created variable.

Example: Non-hygienic Macros in C++

In C++, for example, the macros are not hygienic. Consider this example (borrowed from this Wikipedia article about macro hygiene):

#define INCI(i) do { int a=0; ++i; } while(0)

int main(void)
{
    int a = 4, b = 8;
    INCI(a);
    INCI(b);
    printf("a is now %d, b is now %d\n", a, b);
}

The pre-processor expands the INCI macros giving this code:

int main(void)
{
    int a = 4, b = 8;
    do { int a=0; ++a; } while(0);  // oops: a defined in main is never changed!
    do { int a=0; ++b; } while(0);
    printf("a is now %d, b is now %d\n", a, b);
}

// a is now 4, b is now 9

Since INCI happens to use a as its index variable, the first do-loop doesn’t actually modify the a variable defined in main. Of course, you could fix this by renaming the variable a in INC to, say, i. But then you’ll get the same problem if you call INC(i).

This is a tricky problem, and if you are curious about some of the non-hygienic solutions that have been proposed for solving it, take a look at the Wikipedia article on hygienic macros.