Relevance (information retrieval)




In computer science, and particularly in search engines, relevance is a numerical score assigned to a search result, representing how well the result meets the information need of the user who issued the search query. In many cases, a result's relevance determines the order in which it is presented to the user.

In academic information retrieval, the word "relevance" has been used in system evaluation for over forty years, going back to the Cranfield experiments of the early 1960s. In the relatively new commercial search realm, among web search engine companies, search engine optimizers, and the press, the incorrect "relevancy" is increasingly used in place of the correct "relevance". One can often tell which community an information retrieval practitioner comes from by whether he or she uses the correct form of the word. Wikipedia's search facility once exhibited an example of the incorrect "relevancy".


ALGORITHMS FOR RELEVANCE


In the simplest case, relevance can be calculated by examining how many times a query term appears in a document (term frequency), possibly combined with how discriminative that query term is across the searched collection (a combination often called term frequency-inverse document frequency, or TF-IDF).
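The term-weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular search engine's method: the whitespace tokenizer, the unsmoothed log(N / df) form of inverse document frequency, and the names `tokenize` and `tf_idf_scores` are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real tokenization.
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document by summing TF * IDF over the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if df[term] == 0:
                continue  # term occurs nowhere in the collection
            # Assumption: plain log(N / df); real systems often smooth this.
            idf = math.log(n_docs / df[term])
            score += tf[term] * idf
        scores.append(score)
    return scores


documents = [
    "the cat sat on the mat",
    "the cat chased another cat",
    "the stock market fell",
]
scores = tf_idf_scores("the cat", documents)
# Indices of the documents, best score first.
ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

In this toy collection the term "the" occurs in every document, so its inverse document frequency is zero and it contributes nothing to any score. The ordering is driven entirely by "cat", the more discriminative term, which is why the second document (two occurrences) ranks above the first (one occurrence).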

Since search engines and other businesses rely upon the accuracy of their results, many additional, more complex algorithms have been developed to estimate result relevance. Many of these algorithms, particularly those used by search engines, are hidden from the public, because a user who knows the details of a search algorithm can artificially boost the ranking of their own content.

Relevance calculation is often misinterpreted by the press. For example, it has often been said that when Google burst onto the scene it was miles ahead of its competitors because it, unlike anyone else, ranked web pages by relevance. This is not true, since every search engine ranks by relevance. Google had simply come up with a fairly new way of estimating relevance, namely PageRank. Even search engines that use only TF-IDF rank by relevance.


CLUSTERING AND RELEVANCE


The cluster hypothesis in information retrieval says that two documents that are similar to each other have a high likelihood of being relevant to the same information need. Topic clustering and document filtering algorithms are often described as grouping "relevant" documents together. What is actually meant is that the algorithms are grouping "similar" documents together. Two (or more) documents are never relevant to each other. They may be similar to each other, but they are only ever relevant to a user's information need. If there is no user information need, there is no relevance.
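The distinction between similarity and relevance can be made concrete with a small sketch. It assumes cosine similarity over raw term counts as the similarity measure, and it uses a short query as a stand-in for the user's information need; both choices, along with the names `term_vector` and `cosine_similarity`, are illustrative assumptions rather than part of the cluster hypothesis itself.

```python
import math
from collections import Counter


def term_vector(text):
    # Assumption: a bag of lowercase whitespace-separated terms.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = term_vector("jaguar speed in the wild")
doc_b = term_vector("jaguar speed in the jungle")
doc_c = term_vector("jaguar car top speed")

# Similarity: a relation between two documents, with no user involved.
sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)

# Relevance estimate: a relation between a document and an information
# need, represented here (as an assumption) by a short query.
query = term_vector("jaguar car")
rel_a = cosine_similarity(query, doc_a)
rel_b = cosine_similarity(query, doc_b)
rel_c = cosine_similarity(query, doc_c)
```

Here `doc_a` and `doc_b` are highly similar to each other and receive the same relevance estimate for the query, which is the behaviour the cluster hypothesis predicts. Only the `rel_*` values say anything about relevance, though, because only they involve an information need; the `sim_*` values exist whether or not any user has issued a query.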

