Spanning Trees

Sometimes, you are given graph, but only need one way to connect nodes.
- For example, a network with redundant connections.
- When routing data, you don't care about redundancy: just need a way to send the data to its destination.
- So if we are looking at routing decisions, we basically want to reduce the graph to a tree. The tree tells usexactly how we will send the data.
- If a link fails, we can choose a different tree.
Given a graph \(G=(V,E)\), a subgraph of \(G\) that is connects all of the vertices and is a tree is called a spanning tree.
- For example, suppose we start with this graph:
- We can remove edges until we are left with a tree: the result is a spanning tree.
- Clearly, a spanning tree will have \(|V|-1\) edges, like any other tree.
Built into Ethernet, there is a Spanning Tree Protocol.
- The job of STP is to take a network with redundant connections and build a spanning tree.
- … which can be used to send data without worrying about forwarding data around loops and bringing down the network.
Theorem: A graph is connected iff it has a spanning tree.

Proof: If a graph is connected, we can identify a cycle and remove an edge from it: it will still be connected. We can continue this until no cycles remain. The result is a spanning tree.
If we have a graph with a spanning tree, then every pair of vertices is connected in the tree. Since the spanning tree is a subgraph of the original graph, the vertices were connected in the original as well.∎

Minimum Spanning Trees

If we just want a spanning tree, any \(n-1\) edges will do.
- If we have edge weights, we can ask for the spanning tree with the lowest total edge weights.
- This is the minimum spanning tree.
For example, if we add some edge weights, what is the minimum spanning tree?
- Is there a general algorithm to find the MST?
- Actually there are two.
Prim's Algorithm works by building a tree from the minimum weight edge, out.
```
procedure prims_algorithm(V, E)
    Et = {minimum weight edge in E}
    repeat n-2 times
        e = the edge in E that is adjacent to an edge in Et, doesn't form a circuit, and has minimum weight
        Et = Et + {e}
    return (V, Et)
```
- That is, we start with the minimum weight edge as our partial-tree.
- Look for an adjacent edge that we can add (no loops) with minimum weight.
- Keep adding edges until we have \(n-1\) of them.

Kruskal's Algorithm works by building a tree from the minimum weight edge, out.

procedure kruskals_algorithm(V, E)
    Et = {}
    repeat n-1 times
        e = the edge in E that doesn't form a circuit, and has minimum weight
        Et = Et + {e}
    return (V, Et)

The only difference here is that we don't limit our search to edges near the tree we're building.

Both of these algorithms find the MST.
On our example graph with weighted edges…
- Prim's find the edges of the MST in this order: \(fg, df, ad, ab, be, ce\).
- Kruskal's find the edges in this order: \(fg, ad, ce, df, ab, be\).
- It's the same set of edges (so the same MST), just a difference in how we find them.
Running time?
- It depends on how we store the edges and how quickly we can find the minimum-weight ones.
- But let's assume we're clever about it. Then…
- Prim's runs in \(O(|E|+|V|\log |V|)\).
- Kruskal's runs in \(O(|E|\log |V|)\).
- So if we have a lot of edges (\(|E|=O(|V|^2)\)), then Prim's will win.
- If the graph is fairly sparse, Kruskal's will win.
Both of these are examples of greedy algorithms.
- In a greedy algorithm, you make the choice that seems best right now.
- e.g. the minimum weight edge you can find that doesn't form a circuit.
- If a greedy algorithm works, it's probably fast, because you can make an easy choice.
- But it doesn't get the right answer for every problem.
- e.g. travelling salesman: choosing the minimum weight edge will often find a bad overall solution.
- Don't use a greedy algorithm when it's not appropriate.