ICDM 2003 Tutorial / Advances in Clustering and Applications

by Alexander Hinneburg and Daniel Keim

Abstract

Cluster analysis is one of the basic techniques often applied to analyze large data sets. Originating in statistics, most cluster analysis algorithms were originally developed for relatively small data sets. In the early years of KDD research, clustering algorithms were extended to work efficiently on large data sets. In the last five years, however, a number of advanced topics related to clustering have become the subject of research efforts. These advanced topics, which originate from several data mining applications, include clustering with constraints, projected clustering, outlier detection, interactive clustering, clustering for data streams, database technology for clustering, and categorical clustering.

The main goal of the tutorial is to provide an overview of the state of the art in cluster discovery methods for large databases, covering well-known clustering methods from related fields such as statistics, pattern recognition, and machine learning, and to discuss the new topics related to clustering. The target audience comprises newcomers as well as experienced KDD researchers who are interested in the state of the art of cluster discovery methods and applications. The tutorial especially addresses people from academia who are interested in developing new cluster discovery algorithms, and people from industry who want to apply cluster discovery methods to analyze large databases.

The tutorial is structured as follows. First, we briefly motivate clustering with examples from modern data mining applications, discuss important design decisions, and explain how they depend on the properties of the data. In the second section, we introduce a variety of clustering methods developed in the early years of KDD research. The third section discusses a large number of advanced topics related to clustering and their impact on applications. Finally, we present some applications in which clustering has been used successfully. The tutorial concludes with a discussion of open problems and future research issues.

Bio

http://www.informatik.uni-halle.de/~hinnebur/

http://dbvis.inf.uni-konstanz.de/~keim/