Simon Fraser University
Spring 2004
CMPT 882
Instructor: Oliver Schulte
Due: January 22, 2004
Paper and Pencil: Exercises 2.1, 2.4, [3.2, 3.4] in Mitchell. Note the change in the
assignment: we'll get to 3.2 and 3.4 next week.
Software: One of the purposes of the assignments is for you to get a sense
of various standard machine learning technologies by looking at some
implementations of the algorithms rather than just their general definitions. What
I'm looking for is some evidence that you managed to run an algorithm
and that you have some understanding of the output you saw. The evidence should
take the form of a print-out of screen displays or program outputs, usually together with
a brief discussion of the output (things you noticed or that struck you).
Specifically, this week I'd like you to run one or more implementations of
decision tree learners. What follows are some specific instructions and
suggestions for experimentation. Feel free to explore, and share with us if you
find something interesting.
Check Out the Web
Enter "decision tree learning" into Google. Take a look at what
comes up and check out a few links. There are quite a few web pages for machine
learning in general; a starting point could be Russ
Greiner's resource page.
Decision Trees
- One site with a working Java applet is the AI Exploratorium. You'll need a
Java plug-in; I had to search for a bit to find one, but for a PC you can get it
from Netscape. This demo has a nice range of data sets, though it doesn't seem
to allow you to experiment with different subsets of the data. If this is your
only decision tree experimentation, include at least two runs with different
data sets. You should also include a brief discussion with observations that
show some understanding of the output. For example, in a decision tree the
first (root) attribute is supposed to be the most informative/relevant one; see
the information-gain sketch after this list. Using commonsense intuitions, does
the learner succeed in finding an attribute that intuitively seems relevant?
- The text has some Lisp
code for building decision trees. If you run this, experiment with
some of the data. It's easy to find the data in the code - try running the
program with different subsets. What happens if you give the tree
constructor only half the data? What happens if you make all the
examples positive (or negative)? How about adding some data points?
- Utgoff has an implementation
of some of his ideas (see http://www.cs.umass.edu/~lrn/iti/).
I haven't installed this yet, but it would be interesting to see how his
incremental decision tree induction does. You qualify as using this for
"individual research", so feel free to download the software. If
you cannot make it run in a reasonable time (half an hour), don't worry
about it. Song Gao reports difficulties compiling it at SFU; the basic
trouble seems to come from using the cc compiler rather than gcc.
- Kimberley Voll and Aaron
Hunter recommend the Weka
suite of machine learning tools, which is Java-based and includes a
decision-tree learner.
- Glendon Holst points out that
an implementation of the decision tree learner C4.5 is available at
http://www.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html. Changing
calls of cfree() to free() allowed the code to compile on SFU Unix
machines.
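
To make the root-attribute observation concrete, here is a minimal sketch (in
Python, using a made-up toy weather data set, not the data from any of the
tools above) of how ID3-style learners score attributes by information gain;
the attribute with the highest gain is placed at the root of the tree.

    # Minimal sketch of ID3-style attribute selection by information gain.
    # The toy data set below is invented for illustration; it is not the data
    # from the AI Exploratorium applet or from Mitchell's Lisp code.
    from collections import Counter
    from math import log2

    # Each example: (attribute values, class label)
    examples = [
        ({"outlook": "sunny",    "wind": "weak"},   "no"),
        ({"outlook": "sunny",    "wind": "strong"}, "no"),
        ({"outlook": "overcast", "wind": "weak"},   "yes"),
        ({"outlook": "rain",     "wind": "weak"},   "yes"),
        ({"outlook": "rain",     "wind": "strong"}, "no"),
        ({"outlook": "overcast", "wind": "strong"}, "yes"),
    ]

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def information_gain(examples, attribute):
        """Reduction in entropy from splitting the examples on one attribute."""
        labels = [label for _, label in examples]
        before = entropy(labels)
        after = 0.0
        values = {attrs[attribute] for attrs, _ in examples}
        for v in values:
            subset = [label for attrs, label in examples if attrs[attribute] == v]
            after += (len(subset) / len(examples)) * entropy(subset)
        return before - after

    for attr in ("outlook", "wind"):
        print(attr, round(information_gain(examples, attr), 3))
    # The attribute with the highest gain is chosen as the root. Note that if
    # all the labels are identical, the entropy is already 0, every gain is 0,
    # and the learner stops at a single leaf.

On this toy data, outlook scores a much higher gain than wind, matching the
commonsense intuition that it is the more relevant attribute; when you run the
applet or the Lisp code, you can ask the same question about the root attribute
the learner picks.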