Many problems in genomics have been solved or are
being solved by data mining methods. These include
cancer prediction,
gene finding, protein structure and function,
protein interactions, and gene regulation networks, among
many others.
The tutorial provides a comprehensive introduction to these
problems, with an emphasis on the biology
and the relevant machine learning methods.
A survey of useful classification and clustering
methods is provided with emphasis on current research trends.
Particularly relevant techniques are discussed in detail; topics
include hidden Markov models, feature extraction and selection,
and biclustering.
The fast-growing research area of biological networks is
discussed at length; topics include network structure deduction,
Bayesian networks, linear models,
connectivity analysis using graph methods, DNA motif detection,
and multi-protein complex identification.
Common genomic data types are explained, including
microarray gene expression data, protein structure data,
and two-hybrid protein interaction data, along with many useful databases.
The intended audience is people with a data mining
background who wish to do research in bioinformatics.
Six bioinformatics problems, with rationale, methods and related data,
are provided to help one get started quickly in this field.
Bio
Chris Ding is a staff computer scientist at
Lawrence Berkeley National Laboratory. He obtained a Ph.D.
from Columbia University and previously worked at the
California Institute of Technology and the Jet Propulsion Laboratory.
He started work on biomolecule simulations in 1992
and computational genomics research in 1998.
He was the first to use Support Vector Machines for
protein 3D structure prediction.
He has written eight bioinformatics papers and has also published
extensively on data mining, text and Web analysis.
He has given invited seminars at Stanford and UC Berkeley,
and at many conferences, workshops and panel discussions.
The tutorial grows out of his research experience
in the area. More details: http://www.nersc.gov/~cding