
Information About Speech Recognition




Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), domotic (home-automation) appliance control, and content-based spoken audio search (e.g., finding a podcast in which particular words were spoken).

Voice recognition, or speaker recognition, is a related process that attempts to identify the person speaking rather than what is being said.


PERFORMANCE OF SPEECH RECOGNITION SYSTEMS

The performance of a speech recognition system is usually specified in terms of accuracy and speed. Accuracy is measured with the word error rate (WER), whereas speed is measured with the real-time factor (RTF).
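Both measures are simple ratios. WER is the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. RTF is the processing time divided by the duration of the audio. The sketch below assumes plain whitespace tokenization and no text normalization (real scoring tools usually normalize case and punctuation first). The function names `word_error_rate` and `real_time_factor` are illustrative, not taken from any particular toolkit. An accuracy claim of 98% corresponds roughly to a WER of 2%.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance between the two transcripts,
    divided by the reference length. Tokenization is a plain whitespace split.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; a value below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # one substitution (home -> phone) and one deletion (please) over 4 reference words
    print(word_error_rate("call home now please", "call phone now"))  # 0.5
    print(real_time_factor(3.0, 10.0))  # 0.3
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many spurious words. This is one reason "accuracy" figures and WER are not always exact complements.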

Most speech recognition users would agree that dictation systems can achieve very high performance under controlled conditions. Much of the confusion about how well speech recognition performs comes from using the terms "speech recognition" and "dictation" interchangeably.

Speaker-dependent dictation systems that require a short period of training can capture continuous speech with a large vocabulary, at a normal speaking pace, with very high accuracy. Most commercial vendors claim that their recognition software can achieve between 98% and 99% accuracy (getting one or two words out of every hundred wrong) when operated under optimal conditions. These optimal conditions usually mean that the test subjects have:
  • speaker characteristics that match the training data,

  • proper speaker adaptation, and

  • a clean environment (e.g., office space).


This explains why some users, especially those whose speech is heavily accented, may perceive the recognition rate to be much lower than the advertised 98% to 99%. Speech recognition in video has also become a popular search technology used by several video search companies.

Limited-vocabulary systems, which require no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.
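The article does not say which technique such systems use. One classic approach to small-vocabulary, isolated-word recognition is template matching with dynamic time warping (DTW). The sketch below makes these assumptions:

  • Each vocabulary word is stored as one or more reference templates, which are sequences of acoustic feature vectors such as MFCC frames. Templates recorded from several speakers help the system cope with speakers it has never heard.

  • An incoming utterance is assigned to the word whose closest template has the lowest warping cost.

The names `dtw_distance` and `recognize` and the toy one-dimensional features are hypothetical. Feature extraction from real audio is omitted.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (hypothetical; e.g. a frame of MFCCs)


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping cost between two feature sequences, using Euclidean frame distance."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j frames of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # a frame may be repeated (stretch) or skipped over (compress) to absorb tempo changes
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance: Sequence[Frame], templates: Dict[str, List[Sequence[Frame]]]) -> str:
    """Return the vocabulary word whose closest stored template has the lowest DTW cost."""
    best_word, best_cost = "", math.inf
    for word, examples in templates.items():
        for example in examples:
            c = dtw_distance(utterance, example)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word


if __name__ == "__main__":
    # toy two-word vocabulary with one template per word (made-up 1-D features)
    templates = {
        "one": [[(0.0,), (1.0,), (2.0,)]],
        "two": [[(5.0,), (5.0,), (0.0,)]],
    }
    # the same trajectory as "one", spoken more slowly
    print(recognize([(0.0,), (0.0,), (1.0,), (1.0,), (2.0,)], templates))  # one
```

The warping step lets a slowly or quickly spoken word still align with its template. This tolerance for speaking rate is what made DTW attractive for small vocabularies. Larger systems generally rely on statistical models such as HMMs instead.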

Both acoustic modelling and language modelling are important areas of study in modern statistical speech recognition. This entry discusses the hidden Markov model (HMM), which is widely used in many systems. (Language modelling also has many other applications, such as smart keyboards and document classification; refer to the corresponding entries.)
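A central computation with an HMM is the likelihood of an observation sequence given the model, P(O | λ). The forward algorithm obtains it by dynamic programming in O(N²T) time for N states and T observations. A naive sum over every hidden state path would cost O(N^T). The sketch below is a minimal illustration that assumes discrete observation symbols and made-up parameters. In speech recognition the states typically correspond to sub-word units, and emissions are usually continuous densities over acoustic features rather than symbol tables. The function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def forward_probability(
    initial: Sequence[float],               # pi[i]: P(first state = i)
    transition: Sequence[Sequence[float]],  # A[i][j]: P(next state = j | current state = i)
    emission: Sequence[Sequence[float]],    # B[i][k]: P(observing symbol k | state = i)
    observations: Sequence[int],            # observed symbol indices o_1 .. o_T
) -> float:
    """Forward algorithm: P(observations | HMM), summed over all hidden state paths."""
    n_states = len(initial)
    # alpha[i] = P(o_1 .. o_t, state at time t = i)
    alpha: List[float] = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


def brute_force_probability(initial, transition, emission, observations) -> float:
    """Same quantity by explicitly enumerating every state path (exponential; for checking only)."""
    total = 0.0
    for path in product(range(len(initial)), repeat=len(observations)):
        p = initial[path[0]] * emission[path[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= transition[path[t - 1]][path[t]] * emission[path[t]][observations[t]]
        total += p
    return total


if __name__ == "__main__":
    # toy 2-state, 2-symbol model with made-up parameters
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    obs = [0, 1, 0]
    print(forward_probability(pi, A, B, obs))      # ≈ 0.10893
    print(brute_force_probability(pi, A, B, obs))  # same value, computed the slow way
```

The brute-force function is included only to show that the forward recursion computes the same sum. In a recognizer, the per-word (or per-sentence) likelihoods produced this way would be combined with language model probabilities to choose the most likely transcription.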