Tree Traversal
- Suppose we have a bunch of data stored in a tree data structure.
typedef struct node { struct node *child1; struct node *child2; char *data; } node;
- It will often be binary like that.
- … but could be \(m\)-ary in general.
- And might be a binary search tree (everything in left smaller; in right larger).
- But might not be: we could just store whatever data in whatever child we want.
- For now, let's imagine it's just some (rooted) binary tree:
- … and that we have a pointer to the root node.
- Of course, we could have stored the same data in an array.
- If we did, we would probably have to process each element at some point.
- You have probably written this more times than you can count:
for (int i=0; i<length; i++) { process(array[i]); }
process()
could be any operation: print the contents, add up the values, modify the data,…- We don't care what operation it is, just that we had to touch every element.
- But if we store the values in a tree data structure instead, what is the equivalent?
- If we are given a pointer to the root node, how do we find every element so we can process it?
- Is there a better or worse way to do that?
- If we are given the root, we can visit it, and then recursively visit the subtrees below:
procedure preorder_traverse(root) if root is not null process(root.data) preorder_traverse(root.child1) preorder_traverse(root.child2)
- If we did this on the above tree, we would visit the nodes in this order: \[a, b, d, g, h, c, e, f, i, j\]
- This is called a preorder traversal because we visit the node itself before visiting the children.
- Obviously, if we had an \(m\)-ary tree, we could visit each of the \(m\) children recursively.
- We did get at all of the data in the tree, which was the goal.
- If we can do a preorder traversal, then the obvious next thing to try: a postorder traversal.
- We will visit the node after its children.
- Like this:
procedure postorder_traverse(root) if root is not null postorder_traverse(root.child1) postorder_traverse(root.child2) process(root.data)
- Then we get the data in a different order. In the example: \[g,h,d,b,e,i,j,f,c,a\]
- If we just stored our data randomly in the tree, the order probably doesn't matter: one order might be as good as another.
- Either way, we get at all of the data.
Inorder Traversal
- There's one more place the
process()
step could go: between visiting the children.procedure inorder_traverse(root) if root is not null inorder_traverse(root.child1) process(root.data) inorder_traverse(root.child2)
- An inorder traversal.
- Just another order of the data.
- With our example tree: \[g,d,h,b,a,e,c,i,f,j\]
- But if we started with a binary search tree like this:
- … then an inorder traversal gets the data in a very nice order:
apple, banana, cherry, date, fig, grape, honeydew - It visited the elements in sorted order.
- … then an inorder traversal gets the data in a very nice order:
- It's not too hard to see that that will always be the case.
- Rules of a BST: everything in the left subtree is smaller; everything in the right subtree is bigger.
- So we visit the smaller things first, then the root node, then the larger things.
- We could easily prove by induction (on the number of nodes) that this gets us everything in sorted order.
- If we do it again with another BST:
- We get these items out in this order:
apple, banana, cherry, date, eggplant, fig, grape, honeydew
- We get these items out in this order:
- It's not as obvious what to do if we have more than two children here.
- Usually we don't care: we mostly want to do inorder traversal on BSTs.
- But if we had to define inorder traversal for \(m\)-ary trees, I would visit child 1, then the root, then children 2 to \(m\).
Expression Trees
- Suppose we're looking at an arithmetic expression like this: \[(2-1)-(3+4\times 2)\]
- We could represent that as a rooted binary tree.
- Vertices are operators and numbers.
- The children of operators are the values they're operating on.
- Numbers are leaves.
- For the above expression, we get this:
- We needed to know the order of operations rules to build the tree correctly.
- A compiler builds basically this tree out of your code as it compiles it.
- We could easily write a recursive algorithm to evaluate an expression based on this tree:
procedure evaluate(expr) if expr.is_an_operator operand1 = evaluate(expr.leftchild) operand2 = evaluate(expr.rightchild) return expr.operator applied to operand1 and operand2 else if expr.is_a_number return expr.value
- If we do an in-order traversal of the tree, we get the expression back.
- … if we print a “(” first and “)” last around each operator, so we preserve the order of operations.
- For our sample tree, the output would be: \[((2-1)-(3+(4\times 2)))\]
- Some extra parens, but basically the same thing we started with.
- If we do a preorder traversal of the tree we get this:
- - 2 1 + 3 × 4 2 - Hmmm… doesn't seem useful.
- But if we spell things a little differently, it makes sense.
- Turn “+” into “ADD(”, etc.
- Add a “,” between and “)” after each operator traversal.
- Then we get this:
SUB(SUB(2, 1), ADD(3, MULT(4, 2))) - We turned the infix operator notation into prefix notation for function calls.
- What about postorder traversal?
- On our example tree, we get:
2 1 - 3 4 2 × + - - Also doesn't seem very useful at first glance.
- But it's the postfix notation needed by a stack-based calculator/processor.
- If you don't know what that is, don't worry about it.
- On our example tree, we get:
- So if we build an expression tree, we can preorder/inorder/postorder traverse it to convert between prefix/infix/postfix notations.
- That's one of the reasons a compiler has to build that tree.
- The processor doesn't want to values in the infix order you write in your code.
- But the compiler can build the tree and traverse it to generate the assembly code needed.