
ICDM 2003 Tutorial
Advances in Clustering and Applications
by Alexander Hinneburg and Daniel Keim
Abstract
Cluster analysis is one of the basic techniques for analyzing large data sets. Because the field originated in statistics, most cluster analysis algorithms were originally developed for relatively small data sets. In the early years of KDD research, clustering algorithms were extended to work efficiently on large data sets. In the last five years, however, a number of advanced topics related to clustering have become the subject of research efforts. These topics, which originate from several data mining applications, include clustering with constraints, projected clustering, outlier detection, interactive clustering, clustering for data streams, database technology for clustering, and categorical clustering.
The main goal of the tutorial is to provide an overview of the state of the art in cluster discovery methods for large databases, covering well-known clustering methods from related fields such as statistics, pattern recognition, and machine learning, and to discuss the new topics related to clustering. The target audience includes both newcomers and experienced KDD researchers who are interested in the state of the art of cluster discovery methods and applications. The tutorial especially addresses people from academia who are interested in developing new cluster discovery algorithms and people from industry who want to apply cluster discovery methods to the analysis of large databases.
The tutorial is structured as follows. First, we briefly motivate clustering with examples from modern data mining applications, discuss important design decisions, and explain how they depend on the properties of the data. In the second section, we introduce a variety of clustering methods developed in the early years of KDD research. The third section discusses a large number of advanced topics related to clustering and their impact on applications. Finally, we present some applications in which clustering has been used successfully. The tutorial concludes with a discussion of open problems and future research issues.
Bio
Alexander Hinneburg, http://www.informatik.uni-halle.de/~hinnebur/
is working in the areas of data mining, databases, and bioinformatics. He has developed and published several algorithms and methods in the context of clustering and visual similarity search. He has given tutorials on clustering at SIGMOD'99, KDD'99, and PKDD'00 and was tutorial chair of the KDD conference in 2002. He has served as an (external) referee for a number of conferences, including VLDB, SIGMOD, and InfoVis, as well as a referee for the journals IEEE TKDE, IEEE PAMI, IEEE TVCG, Kluwer Journal on Data Mining and Knowledge Discovery, and ACM TIS.
He received his diploma (equivalent to an MS degree) in Computer Science from the Martin-Luther-University of Halle in 1997 and his Ph.D. in 2003. He is currently working in the database group of the Martin-Luther-University of Halle, Germany.
Daniel A. Keim, http://dbvis.inf.uni-konstanz.de/~keim/
is working in the area of data mining and information visualization,
as well as similarity search and indexing in multimedia databases.
He has published extensively in these areas and has given tutorials
on related issues at several large conferences including Visualization,
SIGMOD, VLDB, and KDD; he was program co-chair of the KDD
conference in 2002 and of the IEEE Information Visualization Symposia in
1999 and 2000; and he is an editor of IEEE Trans. on Knowledge and Data
Engineering, IEEE Trans. on Visualization and Computer Graphics, and
Palgrave's Information Visualization Journal.
Daniel Keim received his diploma (equivalent to an MS degree) in
Computer Science from the University of Dortmund in 1990 and his
Ph.D. in Computer Science from the University of Munich in 1994.
He was an assistant professor in the CS department of the
University of Munich and an associate professor in the CS department
of the Martin-Luther-University Halle, and he is currently a
full professor and head of the database and visualization group
in the CS department of the University of Constance, Germany.