CMPT 740 03-3 Database Systems / Foundations of Data Mining
Classroom: WMX3255
Lecture: Wednesday and Friday, 11:20 am - 12:40 pm
Instructor: Martin Ester
Email:
Office: ASB 9907
Office Hours: Tuesday, 4:00 to 5:00 p.m. and Thursday, 1:30 to 2:30 p.m.
Outline
This is an introductory graduate course on data mining. Data mining is the core step in the process of knowledge discovery in databases, i.e. the application of efficient algorithms to find all valid patterns in a database. Assuming basic knowledge of database systems and statistics, we will cover the foundations of data mining. Lectures will provide the theoretical foundations and the methods; course projects will give you hands-on experience with a selected topic. This is an AREA III course.
Prerequisites
Basic knowledge of database systems (equivalent to CMPT 354) and statistics (equivalent to STAT 270).
Topics
Grading
Grading will be based on the assignments (30%), the midterm exam (20%), and the course project (50%).
Schedule
The tentative schedule is as follows:
Date | Topic in class | Assignment
September 3 | Introduction |
September 5 | Introduction |
September 10 | Data preprocessing |
September 12 | Data preprocessing |
September 17 | Data mining principles | Assignment 1, due September 24
September 19 | Data mining principles |
September 24 | Clustering and outlier analysis | Assignment 2, due October 1
September 26 | Clustering and outlier analysis |
October 1 | Clustering and outlier analysis | Assignment 3, due October 8
October 3 | Clustering and outlier analysis |
October 8 | Clustering and outlier analysis | Assignment 4, due October 15
October 10 | Classification and regression |
October 15 | Classification and regression | Assignment 5, due October 22
October 17 | Classification and regression |
October 22 | Classification and regression |
October 24 | Classification and regression |
October 29 | Midterm exam |
October 31 | Association and frequent pattern analysis |
November 5 | Review of the midterm |
November 7 | Association and frequent pattern analysis |
November 12 | Mining biological data |
November 14 | Mining biological data |
November 19 | No class |
November 21 | No class |
November 26 | Mining text and web data |
November 28 | Conclusions |
December 9 | Project summaries due |
December 10, 10:00 am - 3:00 pm, SSCK 8652 | Data Mining Workshop (with final presentations of the course projects) |
Lecture Notes
Additional References
Dorian Pyle: "Data Preparation for Data Mining", Morgan Kaufmann, 1999.
Pavel Berkhin: "Survey of Clustering Data Mining Techniques", Technical Report, 2002.
S. K. Murthy: "Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey", Data Mining and Knowledge Discovery, 1997.
P. Cheeseman and J. Stutz: "Bayesian Classification (AutoClass): Theory and Results", in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (eds.), Cambridge, MA: AAAI/MIT Press, pp. 153-180, 1996.
Nici Schraudolph and Fred Cummins: "Introduction to Neural Networks", http://www.inf.ethz.ch/~schraudo/NNcourse/intro.html#content.
Jochen Hipp, Ulrich Güntzer, Gholamreza Nakhaeizadeh: "Algorithms for Association Rule Mining: A General Survey and Comparison", SIGKDD Explorations, 2000.
see lecture slides
Assignments
Solutions to the assignments (hardcopy!) are due before class on the specified day.
Course Projects
Midterm Exam
Review Criteria
Project Presentation Slides