| Full-text Search |
Information About Full-text Search |
| CATEGORIES ABOUT FULL TEXT SEARCH | |
| searching | |
| text editor features | |
|
In text retrieval, full text search (also called free text search) refers to a technique for searching a computer-stored document or database; in a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user. Full-text searching techniques became common in online bibliographic databases in the 1970s. Most Web sites and application programs (such as word processing software) provide full text search capabilities. Some Web search engines, such as AltaVista, employ full text search techniques, while others index only a portion of the Web pages examined by their indexing systems.

In practice, it may be difficult to determine how a given search engine works. The search algorithms actually employed by Web search services are seldom fully disclosed, out of fear that Web entrepreneurs will use search engine optimization techniques to improve their prominence in retrieval lists.

INDEXING

When dealing with a small number of documents, it is possible for the full-text search engine to scan the contents of the documents directly with each query, a strategy called serial scanning. This is what some rudimentary tools, such as grep, do when searching.

However, when the number of documents to search is potentially large, or the quantity of search queries to perform is substantial, the problem of full text search is often divided into two tasks: indexing and searching. The indexing stage scans the text of all the documents and builds a term index or concordance. The search stage, when performing a specific query, then references only the index rather than the text of the original documents.

The indexer makes an entry in the index for each term or word found in a document, possibly along with its relative position within the document. Usually the indexer ignores stop words, such as the English "the", which are both too common and carry too little meaning to be useful for searching.
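The two-stage approach described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular search engine works: the stop-word list, the sample documents, and the function names (tokenize, build_index, search, serial_scan) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; real indexers use longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing stage: map each term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            if term in STOP_WORDS:
                continue  # stop words are not indexed
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Search stage: consult only the index, never the original text.

    Returns the ids of documents that contain every non-stop query term.
    """
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


def serial_scan(documents, query):
    """Serial scanning: re-read every document for each query."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    return {
        doc_id
        for doc_id, text in documents.items()
        if all(term in tokenize(text) for term in terms)
    }


# Made-up sample collection for illustration.
documents = {
    1: "The cat sat on the mat",
    2: "The dog chased the cat",
    3: "A dog is a loyal friend",
}
index = build_index(documents)

print(search(index, "the cat"))   # documents 1 and 2
print(search(index, "dog cat"))   # document 2 only
print(index["cat"])               # word positions of "cat" per document
```

Both strategies return the same documents; the difference is that serial_scan re-tokenizes the whole collection on every query, while search touches only the entries for the query terms.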
Some indexers also employ language-specific stemming on the words being indexed, so that, for example, any of the words "drives", "drove", or "driven" will be recorded in the index under the single concept word "drive".

THE PRECISION VS. RECALL TRADEOFF

Because of the ambiguities of natural language, a free text search tends to retrieve irrelevant documents along with relevant ones, which lowers its precision. Controlled vocabulary searching solves this problem by tagging the documents in such a way that the ambiguities are eliminated. However, a controlled vocabulary search may have low recall: it may fail to retrieve some documents that are actually relevant to the search question. Despite the presence of many irrelevant documents in a free text search's retrieval list, a free text search may be able to locate a document that a controlled vocabulary search failed to retrieve.

THE FALSE POSITIVE PROBLEM

As anyone who has performed a free text search will readily recognize, free text searching is likely to retrieve many documents that are not relevant to the intended search question. Such documents are called false positives. The retrieval of irrelevant documents is often caused by the inherent ambiguity of natural language; for example, the word "football" might refer to soccer or to American, Canadian, Gaelic, or Australian rules football, among others, whereas the person searching is probably interested in only one of these.

Certain clustering techniques based on Bayesian algorithms (similar to those used in spam filtering) can help reduce false positive errors. If the search term in the above example is "football", these techniques can divide the universe of documents into categories such as "American football", "corporate football", and so on. Depending on the occurrences of words in a document, it can fall into one or more of these categories. This goes one step beyond full text search. These techniques are being deployed extensively in the e-discovery domain.
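The terms precision and recall used in the sections above can be made concrete with a small calculation. The sketch below uses the standard definitions (precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved). The document ids and the two result sets are invented for illustration, and the function name precision_recall is an assumption of this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: documents 1-4 are the ones the searcher actually wants.
relevant = {1, 2, 3, 4}

# A free text search finds all of them, plus four false positives.
free_text_results = {1, 2, 3, 4, 5, 6, 7, 8}

# A controlled vocabulary search returns only relevant documents,
# but misses two that were tagged differently.
controlled_results = {1, 2}

false_positives = free_text_results - relevant

print(precision_recall(free_text_results, relevant))   # (0.5, 1.0)
print(precision_recall(controlled_results, relevant))  # (1.0, 0.5)
print(sorted(false_positives))                         # [5, 6, 7, 8]
```

In this made-up case the free text search has perfect recall but only half of its results are relevant, while the controlled vocabulary search has perfect precision but finds only half of the relevant documents.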
IMPROVING THE PERFORMANCE OF FULL TEXT SEARCHING

The deficiencies of free text searching have been addressed in two ways: by providing users with tools that enable them to express their search questions more precisely, and by developing new search algorithms that improve retrieval precision.

Improved querying tools
Improved search algorithms

Technological advances have greatly improved the performance of free text searching. For example, Google's PageRank algorithm gives more prominence to documents to which other Web pages have linked. This algorithm dramatically improves users' perception of search precision, which helps explain its popularity among Internet users. See Search Engine for additional examples.

Text retrieval software

The following is a partial list of available software products whose predominant purpose is to perform full text indexing and searching. Some of these are accompanied by detailed descriptions of their theory of operation or internal algorithms, which can provide additional insight into how full text search may be accomplished.