ICDM 2003 Tutorial / Bioinformatics and Machine Learning Methods

by Chris Ding

Abstract

Many problems in genomics have been solved, or are being solved, by data mining methods. These include cancer prediction, gene finding, protein structures and functions, protein interactions, and gene regulation networks, among many others. The tutorial provides a comprehensive introduction to these problems, with emphasis on the biology and the relevant machine learning methods. A survey of useful classification and clustering methods is provided, with emphasis on current research trends. Particularly relevant techniques are discussed in detail; topics include hidden Markov models, feature extraction and selection, and biclustering. The fast-growing research area of biological networks is discussed at length; topics include network structure deduction, Bayesian networks, linear models, connectivity analysis using graph methods, DNA motif detection, and identification of multi-protein complexes. Common genomic data types are explained, including microarray gene expression data, protein structure data, and two-hybrid protein interaction data, along with many useful databases. The intended audience is people with a data mining background who wish to do research in bioinformatics. Six bioinformatics problems, with rationale, methods, and related data, are provided to help attendees get started quickly in this field.

Bio

http://www.nersc.gov/~cding