Simon Fraser University
Spring 2004
CMPT 882
Instructor: Oliver Schulte
Due: January 22, 2004
Paper and Pencil: Exercises 2.1, 2.4, [3.2, 3.4] in Mitchell. Note the change in the
assignment: we'll get to 3.2 and 3.4 next week.
Software: One of the purposes of the assignments is for you to get a sense
of various standard machine learning technologies by looking at some
implementations of the algorithms rather than just their general definitions. What
I'm looking for is some evidence that you managed to run an algorithm
and that you have some understanding of the output you saw. The evidence should
take the form of a print-out of screen displays or program outputs, usually together with
a brief discussion of the output (things you noticed or that struck you).
Specifically, this week I'd like you to run one or more implementations of
decision tree learners. What follows are some specific instructions and
suggestions for experimentation. Feel free to explore, and share with us if you
find something interesting.
Check Out the Web
Enter "decision tree learning" into Google. Take a look at what
comes up and check out a few links. There are quite a few web pages for machine
learning in general; a starting point could be Russ
Greiner's resource page.
Decision Trees
- One site with a working Java applet is the AI Exploratorium. You'll need a
Java plug-in; I had to search for a bit to find one, but for a PC you can get it
from Netscape. This demo has a nice range of data sets, though it doesn't seem
to allow you to experiment with different subsets of the data. If this is your
only decision tree experimentation, include at least two runs with different
data sets. You should also include a brief discussion with observations that
show some understanding of the output. For example, in a decision tree the
first (root) attribute is supposed to be the most informative/relevant one; see
the information-gain sketch after this list. Using commonsense intuitions, does
the learner succeed in finding an attribute that intuitively seems relevant?
- The text has some Lisp
code for building decision trees. If you run this, experiment with
some of the data. It's easy to find the data in the code - try running the
program with different subsets. What happens if you give the tree
constructor only half the data? What happens if you make all the
examples positive (or negative)? How about adding some data points?
- Utgoff has an implementation
of some of his ideas (see http://www.cs.umass.edu/~lrn/iti/).
I haven't installed this yet, but it would be interesting to see how his
incremental decision tree induction does. You qualify as using this for
"individual research", so feel free to download the software. If
you cannot make it run in a reasonable time (half an hour), don't worry
about it. Song Gao reports difficulties compiling it at SFU; the basic
trouble seems to come from using the cc compiler rather than gcc.
- Kimberley Voll and Aaron
Hunter recommend the Weka
suite of machine learning tools, which is Java-based and includes a
decision-tree learner.
- Glendon Holst points out that
an implementation of the decision tree learner C4.5 is available at
http://www.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html. Changing
calls of cfree() to free() allowed the code to compile on SFU Unix
machines.
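
To make the root-attribute observation concrete, here is a minimal sketch (in
Python, using a made-up toy weather data set, not the data from any of the
tools above) of how ID3-style learners score attributes by information gain;
the attribute with the highest gain is placed at the root of the tree.

    # Minimal sketch of ID3-style attribute selection by information gain.
    # The toy data set below is invented for illustration; it is not the data
    # from the AI Exploratorium applet or from Mitchell's Lisp code.
    from collections import Counter
    from math import log2

    # Each example: (attribute values, class label)
    examples = [
        ({"outlook": "sunny",    "wind": "weak"},   "no"),
        ({"outlook": "sunny",    "wind": "strong"}, "no"),
        ({"outlook": "overcast", "wind": "weak"},   "yes"),
        ({"outlook": "rain",     "wind": "weak"},   "yes"),
        ({"outlook": "rain",     "wind": "strong"}, "no"),
        ({"outlook": "overcast", "wind": "strong"}, "yes"),
    ]

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def information_gain(examples, attribute):
        """Reduction in entropy from splitting the examples on one attribute."""
        labels = [label for _, label in examples]
        before = entropy(labels)
        after = 0.0
        values = {attrs[attribute] for attrs, _ in examples}
        for v in values:
            subset = [label for attrs, label in examples if attrs[attribute] == v]
            after += (len(subset) / len(examples)) * entropy(subset)
        return before - after

    for attr in ("outlook", "wind"):
        print(attr, round(information_gain(examples, attr), 3))
    # The attribute with the highest gain is chosen as the root. Note that if
    # all the labels are identical, the entropy is already 0, every gain is 0,
    # and the learner stops at a single leaf.

On this toy data, outlook scores a much higher gain than wind, matching the
commonsense intuition that it is the more relevant attribute; when you run the
applet or the Lisp code, you can ask the same question about the root attribute
the learner picks.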