ICDM 2003 Tutorial

Information Extraction, Theory and Practice

by Ronen Feldman

Abstract

Information Extraction (IE) is one of the most prominent techniques currently used in Text Mining. In particular, by combining Natural Language Processing tools, lexical resources, and semantic constraints, it can provide effective modules for mining the biomedical literature or for helping to prevent terrorism. Complementary visualization tools enable the user to explore, check, and (if required) correct the results of the Text Mining process effectively.

As a first step in tagging documents, each document is processed to find (extract) Entities and Relationships that are likely to be meaningful and content-bearing. By “Relationships” we mean Facts or Events involving certain Entities. An “Event” might be that a company has entered into a joint venture to develop a new drug. A “Fact” might be that a gene causes a certain disease. Facts are static and usually do not change; Events are more dynamic and have a specific time stamp associated with them. The extracted information provides more concise and precise data for the mining process than naive word-based approaches, such as those used for text categorization, and tends to represent concepts and relationships that are more meaningful and relate directly to the domain of the examined document.
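The distinction between Entities, Facts, and Events can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the two regular-expression patterns, and the sample sentences (including the company names and the date) are hypothetical and are not taken from the tutorial or from any particular IE system. Real extractors rely on NLP tools and lexical resources rather than two hand-written patterns.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # hypothetical type label, e.g. "Gene", "Disease", "Company"


@dataclass(frozen=True)
class Fact:
    # Static relationship: no time stamp attached.
    relation: str
    subject: Entity
    obj: Entity


@dataclass(frozen=True)
class Event:
    # Dynamic relationship: carries a time stamp.
    relation: str
    participants: Tuple[Entity, ...]
    date: str


# Toy patterns, assumed for this sketch only.
FACT_RE = re.compile(r"gene (?P<gene>\w+) causes (?P<disease>[\w ]+)")
EVENT_RE = re.compile(
    r"On (?P<date>\d{4}-\d{2}-\d{2}), (?P<a>[A-Z][\w ]+?) and "
    r"(?P<b>[A-Z][\w ]+?) entered into a joint venture"
)


def extract(text: str) -> Tuple[List[Fact], List[Event]]:
    """Return the Facts and Events matched by the two toy patterns."""
    facts = [
        Fact("causes",
             Entity(m.group("gene"), "Gene"),
             Entity(m.group("disease"), "Disease"))
        for m in FACT_RE.finditer(text)
    ]
    events = [
        Event("joint_venture",
              (Entity(m.group("a"), "Company"),
               Entity(m.group("b"), "Company")),
              m.group("date"))
        for m in EVENT_RE.finditer(text)
    ]
    return facts, events


# Hypothetical sample input.
SAMPLE = (
    "The gene CFTR causes cystic fibrosis. "
    "On 2003-01-15, Acme Pharma and Beta Labs entered into a joint venture "
    "to develop a new drug."
)

facts, events = extract(SAMPLE)
```

Compared with a bag of words, the resulting records state who is related to whom and, for Events, when. This structure is what makes the extracted data more concise and precise for later mining.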

Bio

Ronen Feldman draws on years of experience in the development of knowledge discovery systems and text mining applications. He is responsible for ClearForest's technical business development, rapid prototyping, and the research and development of new products. He serves as a consultant to leading Israeli companies such as IBM, El-Al, Telrad, Bezeq, Israel Electric Company, and the National Coal Company, and serves on the program committees of AAAI, KDD, SDM, and ICDM. He is often an invited speaker at academic and industrial conferences, and he is a senior lecturer in the Mathematics and Computer Science Department of Bar-Ilan University in Israel. He received his B.Sc. in Mathematics, Physics, and Computer Science from the Hebrew University and his Ph.D. in Computer Science from Cornell University.