CMPT 740 03-3

Database Systems / Foundations of Data Mining

Classroom: WMX3255

Lecture: Wednesday and Friday, 11:20 am - 12:40 pm
 

Instructor:

Martin Ester

Email:

ester@cs.sfu.ca

Office:

ASB 9907

Office Hours:

Tuesday, 4:00 to 5:00 pm and Thursday, 1:30 to 2:30 pm

 

Outline

This is a basic graduate course on data mining. Data mining is the core step in the process of knowledge discovery in databases, i.e., the application of efficient algorithms to find all valid patterns in a database. Assuming a basic knowledge of database systems and statistics, we will cover the foundations of data mining. Lectures will provide the theoretical foundations and the methods; course projects will give you hands-on experience with a selected topic. This is an AREA III course.

Prerequisites

Basic knowledge of database systems (equivalent to CMPT 354) and statistics (equivalent to STAT 270).

Topics

Grading

Grading will be based on the assignments (30%), the midterm exam (20%), and the course project (50%).

Schedule

The tentative schedule is as follows:

 

Date           Topic in class                              Assignment
September 3    Introduction
September 5    Introduction
September 10   Data preprocessing
September 12   Data preprocessing
September 17   Data mining principles                      Assignment 1, due September 24
September 19   Data mining principles
September 24   Clustering and outlier analysis             Assignment 2, due October 1
September 26   Clustering and outlier analysis
October 1      Clustering and outlier analysis             Assignment 3, due October 8
October 3      Clustering and outlier analysis
October 8      Clustering and outlier analysis             Assignment 4, due October 15
October 10     Classification and regression
October 15     Classification and regression               Assignment 5, due October 22
October 17     Classification and regression
October 22     Classification and regression
October 24     Classification and regression
October 29     Midterm exam
October 31     Association and frequent pattern analysis
November 5     Review of the midterm
November 7     Association and frequent pattern analysis
November 12    Mining biological data
November 14    Mining biological data
November 19    No class
November 21    No class
November 26    Mining text and web data
November 28    Conclusions
December 9     Project summaries due
December 10    Data Mining Workshop (with final presentations of the course projects), 10:00 am - 3:00 pm, SSCK 8652

 

 

Lecture Notes 

Additional References 

 

Assignments 

Course Projects 

Midterm Exam 

Review Criteria 

Project Presentation Slides