Information Extraction




The significance of information extraction (IE) lies in the growing amount of information available in unstructured form (i.e. without metadata), for instance on the Internet. This knowledge can be made more accessible by transforming it into relational form or by marking it up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with.
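As a rough illustration of the two target representations mentioned above, the following Python sketch turns one sentence into a relational row and into XML-tagged text. The `ACQUISITION` pattern, the `org` tag, the `role` attribute, and the company names are all hypothetical choices made for this example; real IE systems use far richer linguistic analysis than a single regular expression.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy pattern: "<Company> acquired <Company>".
ACQUISITION = re.compile(r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+)")


def to_relation(text):
    """Return one (buyer, target) row per match: the relational form."""
    return [(m.group("buyer"), m.group("target"))
            for m in ACQUISITION.finditer(text)]


def to_xml(text):
    """Return the text with each matched company wrapped in an XML tag."""
    out, pos = [], 0
    for m in ACQUISITION.finditer(text):
        for role in ("buyer", "target"):
            start, end = m.span(role)
            out.append(escape(text[pos:start]))  # untouched text before the name
            out.append(f'<org role="{role}">{escape(m.group(role))}</org>')
            pos = end
    out.append(escape(text[pos:]))  # remainder after the last match
    return "".join(out)


sentence = "Yesterday Acme acquired Globex for an undisclosed sum."
rows = to_relation(sentence)
marked = to_xml(sentence)
print(rows)
print(marked)
```

The relational rows are convenient for querying, while the marked-up text keeps the original wording and only annotates it in place.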

A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. Current approaches to IE use natural language processing techniques that focus on very restricted domains. For example, the Message Understanding Conference (MUC) is a competition-based conference that has focused on the following domains:
  • MUC-1 (1987), MUC-2 (1989): Naval operations messages.

  • MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.

  • MUC-5 (1993): Joint ventures and microelectronics.

  • MUC-6 (1995): News articles on management changes.

  • MUC-7 (1998): Satellite launch reports.
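The typical application described above, scanning documents and populating a database, can be sketched for a single restricted domain. The example below assumes a management-change domain similar in spirit to MUC-6; the `SUCCESSION` pattern, the `succession` table, and the sample sentences are invented for illustration and do not reproduce any actual MUC system or data.

```python
import re
import sqlite3

# Hypothetical pattern covering one phrasing of a management change.
SUCCESSION = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) was named "
    r"(?P<post>[a-z ]+?) of (?P<org>[A-Z]\w+)"
)


def populate(conn, documents):
    """Scan each document and insert one row per extracted event."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS succession (person TEXT, post TEXT, org TEXT)"
    )
    for doc in documents:
        for m in SUCCESSION.finditer(doc):
            conn.execute(
                "INSERT INTO succession VALUES (?, ?, ?)",
                (m.group("person"), m.group("post"), m.group("org")),
            )
    conn.commit()


docs = [
    "Jane Doe was named chief executive of Initech on Monday.",
    "Shares fell after the announcement.",  # no extractable event
]
conn = sqlite3.connect(":memory:")
populate(conn, docs)
rows = conn.execute("SELECT person, post, org FROM succession").fetchall()
print(rows)
```

Restricting the domain is what makes such a narrow pattern workable: the sketch only recognizes one way of reporting one kind of event, and silently ignores everything else.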


Natural language texts may first need some form of text simplification to produce more easily machine-readable text before extraction.

Typical subtasks of IE are:
  • Named entity recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.

  • Coreference: identification of expressions that refer to the same object. For example, anaphora is a type of coreference.
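To make the named entity recognition subtask more concrete, here is a minimal sketch that recognizes only temporal and numerical expressions using hand-written regular expressions. The labels (`DATE`, `MONEY`, `PERCENT`), the patterns, and the sample sentence are assumptions made for this example; recognizing names of people, organizations, and places generally needs gazetteers or statistical models, which are not shown here.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical label set and patterns, for illustration only.
PATTERNS = [
    ("DATE", re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")),
    ("MONEY", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?: (?:million|billion))?")),
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?%")),
]


def tag_entities(text):
    """Return (label, surface string) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(found)]


text = "On 3 March 1998 the firm paid $2.5 million, up 12% from last year."
entities = tag_entities(text)
print(entities)
```

Each pattern is applied independently and the matches are merged by position, so overlapping matches from different patterns are not resolved; a fuller system would need a policy for that.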




SEE ALSO

  • HAREM, a Portuguese named entity recognition contest

  • ECHELON



EXTERNAL LINKS

  • Extracción informacion (Spanish site)

  • http://www.itl.nist.gov/iaui/894.02/related_projects/muc/ MUC

  • http://projects.ldc.upenn.edu/ace/ ACE (LDC)

  • http://www.itl.nist.gov/iad/894.01/tests/ace/ ACE (NIST)

  • http://lcl2.di.uniroma1.it TermExtractor

  • TermFinder, online terminology extractor for EN, FR & IT (web application)



Commercial

  • Document Summary System, a commercial product that performs document summarization