Trees

A tree is a particular kind of graph.
- … that comes up often.
- … and has some nice properties.
A tree is a connected undirected graph with no cycles.
- For example, the graph we had that modeled a maze was a tree, because there were no cycles in that maze:
- Most of the other graphs we have looked at are not trees:
Theorem: An undirected graph is a tree iff there is exactly one simple path between each pair of vertices.

Proof: If we have a graph \(T\) which is a tree, then it must be connected with no cycles. Since \(T\) is connected, there must be at least one simple path between each pair of vertices.
If there is more than one path between two vertices, then parts of those paths could be joined to form a cycle. Thus, there must be exactly one path.
Now suppose we have a graph \(G\) where there exactly one simple path between vertices. This graph is clearly connected.
If \(G\) contains a simple circuit, then there are two paths between any vertices on that circuit. Thus contradicts the assumption, so there can be no circuits, and \(G\) is a tree.∎
A rooted tree is a tree with one vertex identified as the “root” and all edges directed away from the root.
- Our map tree wasn't rooted, but could have been:.
- It would have made sense to designate the entrance (\(a\)) as the root.
- Then we would have had this:
- Rooted trees are usually drawn with the root at the top, and edges directed down.
- But can be drawn some other way if it makes sense.
- If it's obvious from the context, we can leave out the arrows, and just understand that all edges are oriented downwards.
There is some more terminology for rooted trees that we'll need to use:
- A parent of a vertex is the (unique) vertex that is adjacent and closer to the root.
- A child is a vertex that is adjacent and further away from the root.
- e.g. in the maze rooted tree, \(g\) is the parent of \(k\). Each of \(d\), \(k\), and \(j\) are children of \(g\).
- Every node in a rooted tree (except the root itself) has a unique parent.
- Nodes that are children, or children-of-children, or further down the tree are descendants.
- Similarly, nodes up from another are its ancestors. The root is an ancestor of everything (except itself) .
- Nodes at the bottom with no children are leaves.
- Nodes with children are internal vertices.
- e.g. in the maze tree, the ancestors of \(k\) are \(g,h,e,a\). The leaves are \(c,b,f,k,j,i\). The others are internal.
We can also have leaves in an unrooted tree: all vertices of degree one.
For example, “family trees” are typically rooted trees:
- Even though it's the same graph, it doesn't make as much sense if we draw it like this:
- Admittedly, the parent/child terminology is backwards if we draw the tree like this.
- But it's right for a tree of descendants of somebody:
Our tic-tac-toe graph was almost a rooted tree:
- The empty board would be the root.
- To make it into an actual tree we would have had to:
  - Throw away the disconnected nodes and keep only ones that can be reached by real game play.
  - Let the nodes capture the state of the board as well as plays made to get there: separate the two XO_/_X_/___ nodes and all of the other cases.
- That would make more nodes, but maybe makes sense as a model of a game like that: we might care how we got to a particular state.
- We can model any turn-based game with a rooted tree like this.

Tree Models

We have already seen two things that can be modelled by trees (or graphs that happen to be trees).
Family trees: descendants or ancestors of a person.
- As we discussed above.

Game turns/choices

Each state of the game is a vertex, and each indicates a legal move/play to get to a new state.
We saw it with tic-tac-toe, but it applies equally well to chess, checkers, go (圍棋), …
Even to card games or board games, but it's not usually as useful for those: the random element (shuffling/dice) makes it harder to model/analyze.
To make it a tree, we need to either have states reachable in only one way, or to decide that states include the history of the game but just the current appearance of the board.
The “root” will be the starting state of the game.

Computer Networks

In theory, computer networks can be any graph: as long as things are connected, they can communicate.
- But in practice, most networks you'll see are trees.
- … at least on the scale of a building (or even campus) or smaller.
- Ethernet and wifi are almost the only technologies people use on that scale.
- They are usually set up so that each computer knows its “gateway”.
- The gateway will be the node's parent if we think of it as a tree.
- (Ethernet can handle non-tree configurations, but they aren't common outside of data centres.)
Since everybody has a unique parent, we get a rooted tree.
- … where the root is the connection between this network and the outside world.
- The leaves are computers that people use.
- The non-leaves (interior vertices) are hubs/routers/switches.
Something we know about trees: there is exactly one path between nodes.
- That means there is no fault-tolerance in such a network.
- If a node or edge disappears (because of hardware failure or something), then somebody is disconnected.
- That might not be a great property for a network to have, but it's cheap to build.
- That's why non-tree networks happen in data centres and other infrastructure places: they want redundant connections.
But in the tree, decisions about routing are easy.
- If the message is to anybody but your descendants, send it up.
- … so no/little computing power needed to make routing decisions.
- Again, it's cheaper to build an easier to configure.

Data Structures

If we have any data structure in a programming language with pointers/references, we can think of it as a graph.
```
typedef struct node {
    struct node *parent;
    char *data;
} node;
```
There's a pretty good chance we won't allow cycles in the graph: it makes the data structure harder to work with.
- We can't just traverse the data structure in the obvious way, or we'd enter an infinite loop.
- So, we get a rooted tree.
The above struct assumes we're keeping pointers to the parent, but that might not be what we want.
- It would be much easier to keep a pointer to the root node, and be able to follow pointers from there to the other data.
So, we'd end up with something like this:
```
typedef struct node {
    struct node *child1;
    struct node *child2;
    struct node *child3;
    char *data;
} node;
```
- Then if we keep a pointer to the root, we an find anything else in the tree by following some chain of pointers.
- … somehow. There's a lot more to say about that.

Tree Properties

Hopefully you're convinced that trees are good for something.
- We can have a look at some properties and learn about the above situations.
Theorem: A connected graph with \(n\) vertices has \(n-1\) edges if and only if it is a tree.

Proof: [Tree implies \(n-1\) edges] For \(n=1\), the tree is a single vertex, so there are zero edges.
For a tree with \(n+1\) vertices, we assume for induction that trees with \(n\) vertices have \(n-1\) edges. In a tree with \(n+1\) vertices let \(v\) be a leaf (degree one) vertex. By removing \(v\), we get a tree with \(n\) vertices and thus \(n-1\) edges. When we replace \(v\), we get \(n\) edges.
[\(n-1\) edges implies a tree] Consider a graph \(G\) that is connected with \(n\) vertices and \(n-1\) edges. Suppose for contradiction that \(G\) contains cycles.
Let \(u\) and \(v\) be adjacent vertices in that cycle. If we remove the edge \((u,v)\), then the graph is still connected. We can continue to remove edges from cycles until no cycles remain in the graph. This gives a graph with \(n\) vertices and \(n-1-k\) edges, with \(k\ge 1\).
The resulting graph is a tree since it is connected and contains no cycles. This contradicts the first part of this proof: a tree must have \(n-1\) edges, but this has fewer.∎
Lemma: If any edge is removed from a tree, the result is a disconnected graph.

Proof: Since there is only one path between pairs of vertices in a tree, removing an edge from \(u\) to \(v\) disconnects \(u\) and \(v\).∎
Corollary: A connected graph with \(n\) vertices must have at least \(n-1\) edges.

Proof: Immediate from the above lemma: if we have to add edges to make a tree, then something was discconnected before.∎
So a tree has the smallest possible number of edges for a connected graph.
- Any fewer edges and it will be disconnected.
- But of course, graphs with \(n-1\) vertices can be disconnected.
In the data structure example above, we get a tree where each vertex can have at most three children.
- It's fairly common to want some limit like that.
- A tree where each vertex can have at most \(m\) children is called an m-ary tree.
- Our code implies a 3-ary tree (or ternary tree).
- Much more common: at most two children, or a binary tree.
The code also implies an order for the children: a first, second, and third child.
- Also a common thing to want, particularly in a data structure.
- When the children have an order like this, we'll call it a ordered rooted tree.
Again, binary trees are the most common for things like this (for reasons we'll see).
- For an order binary tree, there are two children, usually called the left child and right child.
- In code:
```
typedef struct node {
    struct node *leftchild;
    struct node *rightchild;
    char *data;
} node;
```
- Or an example drawn as a graph:
- Here, \(b\) has \(d\) as its left child, and \(e\) as its right child.
- We don't necessarily demand that all children be present: \(c\) has a right child, but no left child.
- There will definitely be leaves: \(d,g,h,f\) here.
- In code that means some of our pointers/references will be null.
If we do insist that all nodes either have \(m\) children or none, it is a full m-ary tree.
We can group the vertices in a rooted tree by their distance from the root.
- These groups are called levels.
- The root is at level 0; its children are at level 1; their children are level 2;…
- In the above example, \(d,e,f\) are at level 2.
The height of a tree is the length of the longest path from the root to a leaf.
- e.g. the above tree has height 3; the maze example has height 5.
- That is, the height is the maximum level in a tree.
A tree is called balanced if all of its leaves are at level \(h\) or \(h-1\).
- These trees are balanced:
- The one on the right is balanced, binary, and full.
- These are not balanced:
Theorem: An \(m\)-ary tree with height \(h\) has at most \(m^h\) leaves.

Proof: We can show this by induction on \(h\). For \(h=0\), we have only the root (which is also a leaf) and thus \(1=m^0\) leaves.
For height \(h\), notice that the subtrees below the root are (at most) \(m\) trees of height \(h-1\). By the induction hypothesis they each have at most \(m^{h-1}\) leaves. In total, we have \(m\cdot m^{h-1}=m^{h}\) leaves.∎
Corollary: If we have an \(m\)-ary tree with \(l\) leaves, then it has height \(h\ge \lceil\log_m l\rceil\). If the tree is balanced, it has \(h =\lceil\log_m l\rceil\).
So, if we wanted to, we could construct \(m\)-ary trees that are relatively short.
- That might get us some nice algorithms, if we let it.