Information Extraction (IE) is one of the most prominent techniques currently used in Text Mining. In
particular, by combining Natural Language Processing tools, lexical resources and semantic constraints, it can
provide effective modules for mining the biomedical literature or for helping to prevent terrorism. Complementary
visualization tools enable the user to explore, check (and correct if required) the results of the Text Mining
process effectively.
As a first step in tagging documents, each document is processed to find (extract) Entities and Relationships
that are likely to be meaningful and content-bearing. By “Relationships” we mean Facts or Events involving
certain Entities. A possible “Event” may be that a company has entered into a joint venture to develop a new
drug. A “Fact” may be that a gene causes a certain disease. Facts are static in nature and usually do not change;
events are more dynamic in nature and have a specific time stamp associated with them. The extracted information
provides more concise and precise data for the mining process than the data produced by more naive word-based
approaches, such as those used for text categorization, and tends to represent concepts and relationships that are more meaningful
and relate directly to the examined document’s domain.
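To make the Entity/Fact/Event distinction concrete, here is a minimal sketch in Python. It assumes a hypothetical toy lexicon (standing in for real lexical resources) and two hypothetical surface patterns; the type checks play the role of semantic constraints. A Fact carries no time stamp, while an Event is stamped, here under the simplifying assumption that the document's date serves as the event's time stamp. This is an illustration of the idea, not the architecture presented in the tutorial.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Entity:
    text: str
    etype: str  # entity type, e.g. "Gene", "Disease", "Company"


@dataclass(frozen=True)
class Fact:
    """A static relationship between entities: no time stamp."""
    relation: str
    args: Tuple[Entity, ...]


@dataclass(frozen=True)
class Event:
    """A dynamic relationship: tied to a specific time stamp."""
    relation: str
    args: Tuple[Entity, ...]
    timestamp: date


# Hypothetical toy lexicon standing in for real lexical resources.
LEXICON = {
    "BRCA1": "Gene",
    "breast cancer": "Disease",
    "Acme Pharma": "Company",
    "Beta Labs": "Company",
}

# Hypothetical rules: (surface pattern, required types of the captured
# groups, relation name, whether the relation is an Event).
RULES = [
    (re.compile(r"(.+) causes (.+?)\.?"), ("Gene", "Disease"), "Causes", False),
    (re.compile(r"(.+) and (.+) entered into a joint venture(?: to develop .+)?\.?"),
     ("Company", "Company"), "JointVenture", True),
]


def typed_entity(text: str, required_type: str) -> Optional[Entity]:
    """Semantic constraint: accept a span only if the lexicon assigns it the required type."""
    if LEXICON.get(text) == required_type:
        return Entity(text, required_type)
    return None


def extract(sentence: str, doc_date: date) -> List[Union[Fact, Event]]:
    """Apply every rule to one sentence and return the Facts and Events found."""
    results: List[Union[Fact, Event]] = []
    for pattern, types, relation, is_event in RULES:
        m = pattern.fullmatch(sentence)
        if not m:
            continue
        args = [typed_entity(m.group(i + 1), t) for i, t in enumerate(types)]
        if any(a is None for a in args):
            continue  # a type constraint failed, so the match is rejected
        if is_event:
            # Assumption: the document date is used as the event's time stamp.
            results.append(Event(relation, tuple(args), doc_date))
        else:
            results.append(Fact(relation, tuple(args)))
    return results


if __name__ == "__main__":
    doc_date = date(2020, 5, 1)  # hypothetical document date
    for s in [
        "BRCA1 causes breast cancer.",
        "Acme Pharma and Beta Labs entered into a joint venture to develop a new drug.",
        "Beta Labs causes breast cancer.",  # rejected: Beta Labs is not a Gene
    ]:
        print(s, "->", extract(s, doc_date))
```

The third sentence matches the "causes" pattern on the surface but yields nothing, because the type constraint requires a Gene in the first slot. This is the kind of precision gain over plain word-based features that the paragraph above describes.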
In this tutorial we will present the general theory of Information Extraction and will demonstrate several
systems that use these principles to enable interactive exploration of large textual collections. We will present
a general architecture for information extraction and will outline the algorithms and data structures behind the
systems. Special emphasis will be given to efficient algorithms for very large document collections, tools for
visualizing such document collections, and the use of intelligent agents to perform text mining on the internet. The
tutorial will cover the state of the art in this rapidly growing area of research. Several real-world
applications of information extraction will be presented in the areas of business intelligence, competitive
intelligence, bioinformatics, and military intelligence.
Bio
Ronen Feldman draws on years of experience in the development of knowledge discovery systems and text mining
applications. He is responsible for ClearForest's technical business development, rapid prototyping, and the
research and development of new products. He serves as a consultant to leading companies in Israel, such as IBM,
El-Al, Telrad, Bezeq, Israel Electric Company, and the National Coal Company, and serves on the program
committees of AAAI, KDD, SDM and ICDM. He is often an invited speaker at academic and industrial conferences, and
he is a senior lecturer in the Mathematics and Computer Science Department of Bar-Ilan University in Israel. He
received his B.Sc. in Mathematics, Physics and Computer Science from the Hebrew University and his Ph.D. in
Computer Science from Cornell University.