Spelling Checker







I have a spelling checker,

It came with my PC.

It plane lee marks four my revue

Miss steaks aye can knot sea.

Eye ran this poem threw it,

Your sure reel glad two no.

Its vary polished in it's weigh.

My checker tolled me sew.

A checker is a bless sing,

It freeze yew lodes of thyme.

It helps me right awl stiles two reed,

And aides me when eye rime.

Each frays come posed up on my screen

Eye trussed too bee a joule.

The checker pours o'er every word

To cheque sum spelling rule.


An ordinary spell checker will find little or no fault with this poem. This is because a spell checker can only check whether each word is spelled correctly, not whether the sentence makes sense.





SPELL CHECKER OPERATION

Simple spell checkers operate at the word level, comparing each word in a given input against a vocabulary (often erroneously referred to as a dictionary). If the word is not found in the vocabulary, it is flagged as erroneous, and algorithms may be run to determine which word the user most likely meant to type. One simple such algorithm lists the words in the vocabulary that lie within a small Levenshtein distance of the typed word.
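The lookup-and-suggest step described above can be sketched as follows. This is a minimal illustration, not the method of any particular product: the names `levenshtein`, `suggest`, and `VOCABULARY`, the toy word list, and the cutoff of two edits are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance counting single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits of word, closest first."""
    word = word.lower()
    scored = sorted((levenshtein(word, known), known) for known in vocabulary)
    return [known for distance, known in scored if distance <= max_distance]


# Toy vocabulary (an assumption for this example; real word lists are far larger).
VOCABULARY = {"spelling", "checker", "poem", "mistakes", "plainly"}

typed = "speling"
if typed not in VOCABULARY:
    print(suggest(typed, VOCABULARY))  # ['spelling']
```

Scanning the whole vocabulary for every unknown word is slow for large word lists; it is used here only because it keeps the sketch short.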

Spell checkers can operate as the user enters text, notifying the user when an error is made (usually by underlining the erroneous text). They can also operate at the user's request, checking an entire document or email at once. A word processor will typically offer both modes of operation.

Many spell checkers can operate in more than one language. A user may often intentionally type a word that is not in the vocabulary of the language in which the spell checker is operating; proper nouns and acronyms are two common examples. To solve this problem, most spell checkers allow the user to add custom words to the spell checker's vocabulary. Usually the user also has the option to ignore specific errors.
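A custom word list and an ignore list can both be layered on top of the base vocabulary. The sketch below assumes a hypothetical `SessionChecker` class; the class name, its methods, and the choice to match ignored words exactly (case-sensitively) are illustrative assumptions rather than the behaviour of any specific program.

```python
class SessionChecker:
    """Toy checker with a base vocabulary, user-added words, and ignored words."""

    def __init__(self, vocabulary):
        self.vocabulary = {word.lower() for word in vocabulary}
        self.custom_words = set()   # added permanently by the user
        self.ignored = set()        # skipped for the current session only

    def add_word(self, word):
        self.custom_words.add(word.lower())

    def ignore(self, word):
        self.ignored.add(word)

    def is_flagged(self, word):
        if word in self.ignored:
            return False
        lowered = word.lower()
        return lowered not in self.vocabulary and lowered not in self.custom_words


checker = SessionChecker({"the", "report", "was", "written", "at"})
print(checker.is_flagged("NASA"))   # True: not in the base vocabulary
checker.add_word("NASA")
print(checker.is_flagged("NASA"))   # False: now a custom word
```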


DESIGN

As already outlined, a spell checker customarily consists of two parts:
1. A set of routines for scanning text and extracting words, and
2. A wordlist (the vocabulary; often referred to as a dictionary) against which the words found in the text are compared.

The scanning routines sometimes include language-dependent algorithms for handling morphology. Even for a lightly inflected language like English, word extraction routines need to handle such phenomena as contractions and possessives. It is unclear whether morphological analysis provides a significant benefit.
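A word extraction routine for English might look roughly like the sketch below. It assumes plain ASCII apostrophes and a hypothetical function name `extract_words`. Words with an internal apostrophe ("doesn't", "o'er") are kept whole, while a trailing 's is stripped whether it marks a possessive or a contraction of "is", so that only the stem is looked up. Real scanners handle many more cases, such as hyphens, digits, and typographic quotes.

```python
import re

# One or more letters, optionally followed by apostrophe-plus-letters groups.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def extract_words(text):
    """Return the words of text, keeping contractions and stripping a trailing 's."""
    words = []
    for match in WORD_RE.finditer(text):
        token = match.group()
        if token.lower().endswith("'s"):
            token = token[:-2]          # "checker's" -> "checker"
        words.append(token)
    return words


print(extract_words("The checker's list doesn't include o'er."))
# ['The', 'checker', 'list', "doesn't", 'include', "o'er"]
```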

The wordlist might simply be a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.

As an adjunct to these two components, the program's user interface will allow users to approve replacements and modify the program's operation.

One exception to the above paradigm is the spell checker that relies solely on statistics, such as n-grams, but such checkers have never caught on. In some cases spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the "see also" entries of encyclopedias.


HISTORY

Spell checkers first became widely available on mainframe computers in the late 1970s. The first spell checkers for personal computers appeared for CP/M computers in 1980, followed by packages for the IBM PC after it was introduced in 1981. Developers such as Maria Mariani, Soft-Art, Microlytics, Proximity, Circle Noetics, and Reference Software rushed OEM packages or end-user products into the rapidly expanding software market, primarily for the PC but also for Apple Macintosh, VAX, and Unix. On PCs, these spell checkers were standalone programs, many of which could be run in TSR mode from within word-processing packages on machines with sufficient memory.

However, the market for standalone packages was short-lived: by the mid-1980s, developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers into their packages, mostly licensed from the above companies, which quickly expanded support from English alone to European and eventually even Asian languages. This required increasing sophistication in the morphology routines of the software, particularly for heavily inflected languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.

Recently, spell checking has moved beyond word processors: Firefox 2.0, a web browser, offers spell checking for user-written content, such as text entered on many webmail sites, blogs, and social networking websites. The web browser Opera and the instant messaging client Pidgin also offer spell checking, transparently using GNU Aspell as their engine.


FUNCTIONALITY

The first spell checkers were "verifiers" rather than "correctors": they offered no suggestions for incorrectly spelled words. This was helpful for typos but less helpful for logical or phonetic errors. The challenge developers faced was the difficulty of offering useful suggestions for misspelled words, which requires reducing words to a skeletal form and applying pattern-matching algorithms.
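The text does not say which skeletal form was used, so the sketch below invents a simple one purely for illustration: keep the first letter, drop the remaining vowels, and collapse repeated consonants. Misspellings that preserve the consonant outline then share a key with the intended word. The names `skeleton`, `build_index`, and `suggest_by_skeleton` are hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton(word):
    """First letter, then the remaining consonants with adjacent repeats collapsed."""
    word = word.lower()
    key = word[:1]
    for ch in word[1:]:
        if ch in VOWELS or not ch.isalpha():
            continue
        if ch != key[-1]:
            key += ch
    return key


def build_index(vocabulary):
    """Map each skeleton to the sorted vocabulary words that reduce to it."""
    index = defaultdict(list)
    for word in sorted(vocabulary):
        index[skeleton(word)].append(word)
    return index


def suggest_by_skeleton(word, index):
    return index.get(skeleton(word), [])


index = build_index({"receive", "address", "separate"})
print(suggest_by_skeleton("recieve", index))  # ['receive']
print(suggest_by_skeleton("adress", index))   # ['address']
```

A key like this finds candidates with a single dictionary lookup instead of a scan of the whole vocabulary, at the cost of missing errors that change the consonant outline.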

It might seem logical that where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. With more entries than this, incorrectly spelled words may be skipped because they are mistaken for other, rarer words. For example, a linguist might determine on the basis of corpus linguistics that the word "baht" is more frequently a misspelling of "bath" or "bat" than a reference to the Thai currency. Hence, it would typically be more useful for the few people who write about Thai currency to be slightly inconvenienced than for the spelling errors of the many more people who discuss baths to be overlooked.

The first MS-DOS spell checkers were mostly used in proofing mode from within word-processing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as Oracle's short-lived CoAuthor. This allowed a user to view the results after a document was processed and correct only the words that he or she knew to be wrong. When memory and processing power became abundant, spell checking was performed interactively in the background, as in Sector Software's Spellbound program, released in 1987, and in Microsoft Word since Word 95.

In recent years, spell checkers have become increasingly sophisticated; some are now capable of recognizing simple grammatical errors. However, even at their best, they rarely catch all the errors in a text (such as homonym errors), and they will flag neologisms and foreign words as misspellings.


SPELL-CHECKING OTHER LANGUAGES


English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, however, words are routinely combined in new ways; in German, for example, compound nouns are frequently coined from existing nouns. Some scripts do not clearly separate one word from another, requiring word-splitting algorithms. Each of these presents unique challenges to non-English language spell checkers.
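One way a checker could cope with freshly coined compounds is to accept an unknown word when it can be split entirely into known parts. The sketch below is a simplified assumption of how that might work: the function name `split_compound`, the three-letter minimum part length, and the tiny German word list are invented for the example, and linking elements such as the German "s" between parts are deliberately ignored.

```python
from functools import lru_cache


def split_compound(word, vocabulary, min_part=3):
    """Return a list of known parts that spell word exactly, or None if there is none."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(word):
            return ()
        # Try the longest candidate part first, never shorter than min_part.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part in vocabulary:
                rest = solve(end)
                if rest is not None:
                    return (part,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts is not None else None


german = {"haus", "tür", "schlüssel"}   # toy vocabulary, an assumption
print(split_compound("Haustürschlüssel", german))  # ['haus', 'tür', 'schlüssel']
print(split_compound("Hausxyz", german))           # None
```

The same search could, in principle, serve scripts that do not separate words, although practical word-splitting systems usually also weigh how likely each candidate split is.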


CONTEXT-SENSITIVE SPELL CHECKERS


Recently, research has focused on developing algorithms capable of recognizing a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words. Not only does this allow words such as those in the poem above to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. The most common examples of errors caught by such a system are homophone errors, such as "too", "sea", "its", and "reel" in the following sentence:

    Their coming too sea if its reel.

The most successful algorithm to date is Andrew Golding and Dan Roth's "winnow-based spelling correction algorithm", published in 1999, which is able to recognize about 96% of context-sensitive spelling errors, in addition to ordinary non-word spelling errors. Context-sensitive spell checkers are likely to appear in future text-processing products.
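The general idea of context-sensitive checking can be illustrated with a much simpler technique than the Winnow-based algorithm: count which words appear near each member of a confusion set in some training text, then prefer the member whose usual neighbours best match the sentence being checked. Everything in the sketch below (the confusion sets, the eight training sentences, the two-word window, and the function names) is an assumption made for illustration, and it is not Golding and Roth's method.

```python
from collections import Counter, defaultdict

CONFUSION_SETS = [{"sea", "see"}, {"reel", "real"}]
CONFUSABLE = {word: group for group in CONFUSION_SETS for word in group}

# Tiny made-up training corpus; a real system would use a large one.
TRAINING = [
    "i want to see the film",
    "wait and see if it works",
    "see if the threat is real",
    "is this story real or not",
    "the ship sailed across the sea",
    "fish live in the deep sea",
    "wind the line onto the reel",
    "a fishing rod and reel",
]


def neighbours(words, i, window=2):
    return words[max(0, i - window):i] + words[i + 1:i + 1 + window]


def train(sentences):
    """Count the words seen within the window around each confusable word."""
    context = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word in CONFUSABLE:
                context[word].update(neighbours(words, i))
    return context


def correct(sentence, context):
    """Replace a confusable word when an alternative fits its neighbours better."""
    words = sentence.lower().split()
    result = list(words)
    for i, word in enumerate(words):
        if word not in CONFUSABLE:
            continue
        nearby = neighbours(words, i)

        def score(candidate):
            return sum(context[candidate][n] for n in nearby)

        best = max(sorted(CONFUSABLE[word]), key=score)
        if score(best) > score(word):
            result[i] = best
    return " ".join(result)


model = train(TRAINING)
print(correct("come and sea if it is reel", model))  # come and see if it is real
```

A word is changed only when an alternative scores strictly higher, so a sentence such as "the ship sailed across the sea" is left untouched.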

