WordNet




WordNet was created and is maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985. Over the years, the project has received about $3 million in funding, mainly from government agencies interested in machine translation.


DATABASE CONTENTS


As of 2005, the database contains about 150,000 words organized in over 115,000 synsets, for a total of 203,000 word-sense pairs; in compressed form, it is about 12 megabytes in size.

WordNet distinguishes between nouns, verbs, adjectives and adverbs because they follow different grammatical rules. Every synset contains a group of synonymous words or collocations (a ''collocation'' is a sequence of words that go together to form a specific meaning, such as "car pool"); different senses of a word are in different synsets. The meaning of each synset is further clarified with a short defining ''gloss''. A typical example synset with gloss is:

good, right, ripe -- (most suitable or right for a particular purpose; "a good time to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")

Most synsets are connected to other synsets via a number of semantic relations. These relations vary based on the type of word, and include:
  • Nouns
      • ''hypernyms'': ''Y'' is a hypernym of ''X'' if every ''X'' is a (kind of) ''Y''
      • ''hyponyms'': ''Y'' is a hyponym of ''X'' if every ''Y'' is a (kind of) ''X''
      • ''coordinate terms'': ''Y'' is a coordinate term of ''X'' if ''X'' and ''Y'' share a hypernym
      • ''holonyms'': ''Y'' is a holonym of ''X'' if ''X'' is a part of ''Y''
      • ''meronyms'': ''Y'' is a meronym of ''X'' if ''Y'' is a part of ''X''
  • Verbs
      • ''hypernyms'': the verb ''Y'' is a hypernym of the verb ''X'' if the activity ''X'' is a (kind of) ''Y''
      • ''coordinate terms'': those verbs sharing a common hypernym
  • Adjectives
      • ''related nouns''
      • ''participle of verb''
  • Adverbs
      • ''root adjectives''


While semantic relations apply to all members of a synset because they share a meaning and are all mutually synonyms, words can also be connected to other words by lexical relations, including antonyms (opposites of each other) and derivationally related words.
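The noun relation definitions above can be sketched in a few lines. This is only an illustration: the toy data and the names `HYPERNYMS`, `PART_OF` and the helper functions are hypothetical and are not part of the software distributed with WordNet. The sketch assumes only one direction of each relation is stored and derives the inverse and the coordinate terms from it.

```python
# Toy hypernym table (illustrative data, not taken from the WordNet database):
# each synset name maps to the set of its direct hypernyms.
HYPERNYMS = {
    "dog": {"canine"},
    "wolf": {"canine"},
    "canine": {"carnivore"},
    "cat": {"feline"},
    "feline": {"carnivore"},
}

# Toy part-whole table (illustrative): each part maps to the wholes it belongs to.
PART_OF = {
    "wheel": {"car"},
    "engine": {"car"},
}


def hypernyms(x):
    """Direct hypernyms of x: every x is a (kind of) y."""
    return set(HYPERNYMS.get(x, set()))


def hyponyms(x):
    """Direct hyponyms of x: every y is a (kind of) x."""
    return {y for y, parents in HYPERNYMS.items() if x in parents}


def coordinate_terms(x):
    """Synsets other than x that share at least one direct hypernym with x."""
    return {
        y
        for y, parents in HYPERNYMS.items()
        if y != x and parents & hypernyms(x)
    }


def holonyms(x):
    """Wholes that x is a part of."""
    return set(PART_OF.get(x, set()))


def meronyms(x):
    """Parts of x."""
    return {part for part, wholes in PART_OF.items() if x in wholes}
```

With this toy data, `coordinate_terms("dog")` contains only `"wolf"`, because both share the hypernym `"canine"`, while `meronyms("car")` collects `"wheel"` and `"engine"`.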

WordNet also provides the ''polysemy count'' of a word: the number of synsets that contain the word. If a word participates in several synsets (i.e. has several senses), then typically some senses are much more common than others. WordNet quantifies this by the ''frequency score'': all words in several sample texts were semantically tagged with the corresponding synset, and the number of times each word appeared in each specific sense was then counted.
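As a rough sketch of the two measures just described, the following computes a polysemy count and per-sense frequency scores over toy data. The synset labels such as `bank.n.01`, the tagged sample and the function names are all assumptions made for illustration; they do not reproduce WordNet's actual data or file formats.

```python
from collections import Counter

# Toy inventory (illustrative): synset label -> words in that synset.
SYNSETS = {
    "bank.n.01": ["bank", "riverbank"],
    "bank.n.02": ["bank", "depository"],
    "deposit.v.01": ["deposit", "bank"],
}

# Toy sense-tagged sample text (illustrative): (word, synset label) pairs.
TAGGED_TEXT = [
    ("bank", "bank.n.02"),
    ("bank", "bank.n.02"),
    ("bank", "bank.n.01"),
    ("depository", "bank.n.02"),
]


def polysemy_count(word):
    """Number of synsets that contain the word."""
    return sum(word in members for members in SYNSETS.values())


def frequency_scores(word):
    """How often the word was tagged with each of its senses in the sample."""
    return Counter(sense for w, sense in TAGGED_TEXT if w == word)
```

Here ''bank'' has a polysemy count of 3, and its frequency scores show that one sense dominates the sample while the verb sense never occurs in it.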

The morphology functions of the software distributed with the database try to deduce the lemma or root form of a word from the user's input; only the root form is stored in the database unless it has irregular inflected forms.
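One way such a lookup could work is sketched below: irregular forms are resolved through an exception list, and regular forms are reduced by detaching suffixes until a stored root form is found. The lexicon, the exception list and the suffix rules are toy assumptions chosen for the example, not the rule set shipped with WordNet, and `lemma` is a hypothetical function name.

```python
# Toy lexicon of stored root forms and a toy exception list (both illustrative).
LEXICON = {"dog", "church", "fly", "box", "goose", "run"}
EXCEPTIONS = {"geese": "goose", "ran": "run"}

# Suffix rewrite rules tried in order: (ending to detach, replacement).
RULES = [("ies", "y"), ("ches", "ch"), ("xes", "x"), ("s", "")]


def lemma(word):
    """Return the stored root form for word, or None if none is found."""
    word = word.lower()
    if word in EXCEPTIONS:  # irregular forms are looked up directly
        return EXCEPTIONS[word]
    if word in LEXICON:  # the input is already a root form
        return word
    for ending, replacement in RULES:
        if word.endswith(ending):
            candidate = word[: len(word) - len(ending)] + replacement
            if candidate in LEXICON:  # accept only forms that are stored
                return candidate
    return None
```

Checking each candidate against the lexicon keeps the suffix rules from producing non-words: an input is reduced only if the result is a stored root form.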


KNOWLEDGE STRUCTURE

Both nouns and verbs are organized into hierarchies, defined by hypernym or ''IS A'' relationships. For instance, sense 1 of the word ''dog'' has the hypernym hierarchy shown here. The words on the same level are synonyms of each other: some sense of ''dog'' is synonymous with some sense of ''domestic dog'' and of ''Canis familiaris'', and so on. Each set of synonyms, also known as a synset, has a unique index, and its members share properties such as the gloss (dictionary-style definition).

dog, domestic dog, Canis familiaris
  => canine, canid
    => carnivore
      => placental, placental mammal, eutherian, eutherian mammal
        => mammal
          => vertebrate, craniate
            => chordate
              => animal, animate being, beast, brute, creature, fauna
                => ...
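The hierarchy above can be walked programmatically. The sketch below copies the levels listed above into a dictionary keyed by the first word of each synset and follows the ''IS A'' links upward. The dictionary layout and the function names are assumptions for illustration, and the sketch assumes a single hypernym per synset to keep the traversal simple.

```python
# Direct hypernym of each synset, keyed by its first word. The entries are
# copied from the hierarchy above; the dictionary layout is an assumption.
HYPERNYM_OF = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(synset):
    """Follow IS A links upward from synset until no hypernym is recorded."""
    chain = [synset]
    while chain[-1] in HYPERNYM_OF:
        chain.append(HYPERNYM_OF[chain[-1]])
    return chain


def is_a(specific, general):
    """True if general is reachable from specific via hypernym links."""
    return general in hypernym_chain(specific)[1:]
```

For example, `is_a("dog", "mammal")` holds because ''mammal'' appears on the chain above ''dog'', whereas `is_a("mammal", "dog")` does not.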

At the top level, these hierarchies are organized into 25 primitive groups for nouns and 15 for verbs. These groups form the ''lexicographer files'' at the maintenance level.

In the case of adjectives, the organization is different. Two opposite 'head' senses work as binary poles, while 'satellite' synonyms connect to each of the heads via synonymy relations. Thus, the hierarchies, and the concept of lexicographer files, do not apply here in the same way as they do for nouns and verbs.
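The head-and-satellite arrangement can be pictured with a small sketch. The adjectives, the table names and the helper functions below are hypothetical and chosen only to illustrate the structure: each head lists its satellites, the two heads are paired as opposite poles, and a satellite reaches the opposite pole through its own head.

```python
# Toy adjective clusters (illustrative data): each head lists its satellites,
# and the two heads are paired as opposite poles.
SATELLITES = {
    "wet": ["damp", "moist", "soggy"],
    "dry": ["arid", "parched"],
}
OPPOSITE_HEAD = {"wet": "dry", "dry": "wet"}


def head_of(adjective):
    """Return the head an adjective belongs to, or None if it is unknown."""
    if adjective in SATELLITES:
        return adjective
    for head, satellites in SATELLITES.items():
        if adjective in satellites:
            return head
    return None


def opposite_cluster(adjective):
    """Opposite head followed by its satellites, reached via the adjective's head."""
    head = head_of(adjective)
    if head is None:
        return []
    opposite = OPPOSITE_HEAD[head]
    return [opposite] + SATELLITES[opposite]
```

In this toy cluster, ''damp'' has no opposite of its own; it reaches ''dry'' and its satellites only through its head ''wet''.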


LIMITATIONS


Unlike other dictionaries, WordNet does not include information about etymology, pronunciation or the forms of irregular verbs, and it contains only limited information about usage.

The actual lexicographical and semantic information is maintained in ''lexicographer files'', which are then processed by a tool called ''grind'' to produce the distributed database. Both grind and the lexicographer files are freely available, but modifying and maintaining the database is nonetheless difficult.

Because WordNet groups similar words together under a single, general definition, the definitions it provides are often imprecise for the individual words in a synset.


RELATED PROJECTS


The EuroWordNet project has produced WordNets for several European languages and linked them together; these are not freely available, however. The Global Wordnet project attempts to coordinate the production and linking of wordnets for all languages. Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce its own online WordNet.

The eXtended WordNet is a project at the University of Texas at Dallas which aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available for automatic knowledge processing systems. It is also freely available under a license similar to WordNet's.

The GCIDE project produces a dictionary by combining a public domain ''Webster's Dictionary'' from 1913 with some WordNet definitions and material provided by volunteers. It is released under the copyleft license GPL.

The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations between conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should normally be corrected before being used, since it contains hundreds of basic semantic inconsistencies, such as (i) the existence of common specializations for exclusive categories and (ii) redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve (i) distinguishing the specialization relations into subtypeOf and instanceOf relations, and (ii) associating intuitive unique identifiers with each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2, most projects claiming to re-use WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply re-use it as is.
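The two kinds of inconsistency named above can be detected mechanically once the specialization hierarchy is treated as a graph. The sketch below is a minimal illustration over a toy graph with two deliberately faulty entries; the data, the exclusive pair and the function names are assumptions, and it does not reproduce the procedure used for WebKB-2.

```python
# Toy specialization graph (illustrative): category -> direct generalizations.
# Two entries are deliberately faulty to trigger the checks below.
PARENTS = {
    "dog": {"canine", "animal"},    # faulty: "animal" is implied via "canine"
    "canine": {"animal"},
    "animal": {"entity"},
    "plant": {"entity"},
    "sponge": {"animal", "plant"},  # faulty: specializes two exclusive categories
}
# Pairs of categories assumed to be mutually exclusive (illustrative).
EXCLUSIVE = [("animal", "plant")]


def ancestors(category):
    """All categories reachable from category via generalization links."""
    seen = set()
    stack = list(PARENTS.get(category, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS.get(node, ()))
    return seen


def redundant_links():
    """Direct links already implied by a path through another direct parent."""
    return {
        (child, parent)
        for child, parents in PARENTS.items()
        for parent in parents
        if any(parent in ancestors(other) for other in parents if other != parent)
    }


def exclusion_violations():
    """Categories that specialize both members of an exclusive pair."""
    return {
        (category, a, b)
        for category in PARENTS
        for a, b in EXCLUSIVE
        if {a, b} <= ancestors(category)
    }
```

On this toy graph, `redundant_links()` reports the direct link from ''dog'' to ''animal'', and `exclusion_violations()` reports the category placed under both ''animal'' and ''plant''.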

WordNet is also commonly re-used via mappings between the WordNet categories and the categories from other ontologies. Most often, only the top-level categories of WordNet are mapped. However, the authors of the SUMO ontology have produced a mapping between all of the WordNet synsets (including nouns, verbs, adjectives and adverbs) and SUMO classes. The most recent addition to the mappings provides links to all of the more specific terms in the Mid-Level Ontology (MILO), which extends SUMO. The OpenCyc upper ontology is also linked to some of WordNet.

In most works that claim to have integrated WordNet into other ontologies, the content of WordNet has not simply been corrected when semantic problems have been encountered; instead, WordNet has been used as a source of inspiration but heavily re-interpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was re-structured according to the OntoClean-based approach, or when WordNet was used as a primary source for constructing the lower classes of the SENSUS ontology.

FrameNet is a project similar to WordNet. It consists of a lexicon based on annotating over 100,000 sentences with their semantic properties. The unit in focus is the ''lexical frame'', a type of state or event together with the properties associated with it.

