Tree Traversal

Suppose we have a bunch of data stored in a tree data structure.
```
typedef struct node {
    struct node *child1;
    struct node *child2;
    char *data;
} node;
```
- It will often be binary like that.
- … but could be \(m\)-ary in general.
- And might be a binary search tree (everything in left smaller; in right larger).
- But might not be: we could just store whatever data in whatever child we want.
- For now, let's imagine it's just some (rooted) binary tree:
- … and that we have a pointer to the root node.
Of course, we could have stored the same data in an array.
- If we did, we would probably have to process each element at some point.
- You have probably written this more times than you can count:
```
for (int i=0; i<length; i++) {
    process(array[i]);
}
```
- process() could be any operation: print the contents, add up the values, modify the data,…
- We don't care what operation it is, just that we had to touch every element.
But if we store the values in a tree data structure instead, what is the equivalent?
- If we are given a pointer to the root node, how do we find every element so we can process it?
- Is there a better or worse way to do that?
If we are given the root, we can visit it, and then recursively visit the subtrees below:
```
procedure preorder_traverse(root)
    if root is not null
        process(root.data)
        preorder_traverse(root.child1)
        preorder_traverse(root.child2)
```
- If we did this on the above tree, we would visit the nodes in this order: \[a, b, d, g, h, c, e, f, i, j\]
- This is called a preorder traversal because we visit the node itself before visiting the children.
- Obviously, if we had an \(m\)-ary tree, we could visit each of the \(m\) children recursively.
- We did get at all of the data in the tree, which was the goal.
If we can do a preorder traversal, then the obvious next thing to try: a postorder traversal.
- We will visit the node after its children.
- Like this:
```
procedure postorder_traverse(root)
    if root is not null
        postorder_traverse(root.child1)
        postorder_traverse(root.child2)
        process(root.data)
```
- Then we get the data in a different order. In the example: \[g,h,d,b,e,i,j,f,c,a\]
- If we just stored our data randomly in the tree, the order probably doesn't matter: one order might be as good as another.
- Either way, we get at all of the data.

Inorder Traversal

There's one more place the process() step could go: between visiting the children.
```
procedure inorder_traverse(root)
    if root is not null
        inorder_traverse(root.child1)
        process(root.data)
        inorder_traverse(root.child2)
```
- An inorder traversal.
- Just another order of the data.
- With our example tree: \[g,d,h,b,a,e,c,i,f,j\]
But if we started with a binary search tree like this:
- … then an inorder traversal gets the data in a very nice order:
  apple, banana, cherry, date, fig, grape, honeydew
- It visited the elements in sorted order.
It's not too hard to see that that will always be the case.
- Rules of a BST: everything in the left subtree is smaller; everything in the right subtree is bigger.
- So we visit the smaller things first, then the root node, then the larger things.
- We could easily prove by induction (on the number of nodes) that this gets us everything in sorted order.
If we do it again with another BST:
- We get these items out in this order:
  apple, banana, cherry, date, eggplant, fig, grape, honeydew
It's not as obvious what to do if we have more than two children here.
- Usually we don't care: we mostly want to do inorder traversal on BSTs.
- But if we had to define inorder traversal for \(m\)-ary trees, I would visit child 1, then the root, then children 2 to \(m\).

Expression Trees

Suppose we're looking at an arithmetic expression like this: \[(2-1)-(3+4\times 2)\]
We could represent that as a rooted binary tree.
- Vertices are operators and numbers.
- The children of operators are the values they're operating on.
- Numbers are leaves.
- For the above expression, we get this:
- We needed to know the order of operations rules to build the tree correctly.
- A compiler builds basically this tree out of your code as it compiles it.

We could easily write a recursive algorithm to evaluate an expression based on this tree:

procedure evaluate(expr)
    if expr.is_an_operator
        operand1 = evaluate(expr.leftchild)
        operand2 = evaluate(expr.rightchild)
        return expr.operator applied to operand1 and operand2
    else if expr.is_a_number
        return expr.value

If we do an in-order traversal of the tree, we get the expression back.
- … if we print a “(” first and “)” last around each operator, so we preserve the order of operations.
- For our sample tree, the output would be: \[((2-1)-(3+(4\times 2)))\]
- Some extra parens, but basically the same thing we started with.
If we do a preorder traversal of the tree we get this:
- - 2 1 + 3 × 4 2
- Hmmm… doesn't seem useful.
- But if we spell things a little differently, it makes sense.
- Turn “+” into “ADD(”, etc.
- Add a “,” between and “)” after each operator traversal.
- Then we get this:
  SUB(SUB(2, 1), ADD(3, MULT(4, 2)))
- We turned the infix operator notation into prefix notation for function calls.
What about postorder traversal?
- On our example tree, we get:
  2 1 - 3 4 2 × + -
- Also doesn't seem very useful at first glance.
- But it's the postfix notation needed by a stack-based calculator/processor.
- If you don't know what that is, don't worry about it.
So if we build an expression tree, we can preorder/inorder/postorder traverse it to convert between prefix/infix/postfix notations.
- That's one of the reasons a compiler has to build that tree.
- The processor doesn't want to values in the infix order you write in your code.
- But the compiler can build the tree and traverse it to generate the assembly code needed.