Formal Language Article Index for
Formal
Website Links For
Formal
 

Information About

Formal Language




In Mathematics , Logic , and Computer Science , a formal language is a language that is defined by precise mathematical or machine processable formulas. Like languages in Linguistics , formal languages generally have two aspects:

  • the Syntax of a language is what the language looks like (more formally: the set of possible expressions that are valid utterances in the language)

  • the Semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)



FORMAL LANGUAGE THEORY


The branch of mathematics and computer science which studies exclusively the theory of language syntax is known as formal language theory. In formal language theory, a language is nothing more than its syntax; questions of semantics are not addressed in this specialty.


Elementary example


In mathematics, a formal language consists of two parts, an ''alphabet'' and rules of ''syntax''. The ''alphabet'' is any set of symbols; the rules of ''syntax'' are rules that a String of these symbols must follow if it is to be considered part of the formal language.

As a simple example, consider the ''alphabet'' {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =} together with the following rules of ''syntax'':

  • Every string that does not contain + or = and does not start with 0 is in the language.

  • The string consisting of 0 is in the language.

  • A string containing = is in the language if and only if there is exactly one =, and it separates two strings in the language.

  • Any number of + symbols can appear in a string of the language as long as each of them is surrounded by symbols other than + or =.

  • No string is in the language other than those implied by the previous rules.


Under these rules, the string "23+4=555" is in the language, the string "=234=+" is not. This formal language expresses whole numbers, well-formed addition statements, and well-formed addition equalities, but it expresses only what they look like (their Syntax ), not what they mean ( Semantics ). For instance, nowhere in these rules is it defined that 0 means the number zero, or that + means addition.

The rest of this article is devoted to the basic notions of formal language theory.


Definition and basic terminology


In formal language theory, a language is defined as a possibly infinite set of finite-length sequences of elements drawn from a specified finite set, say \boldsymbol{A}.

The set \boldsymbol{A} is called the ''alphabet'' of the language;
an element of \boldsymbol{A} is called a ''symbol'' or ''character'';
a finite sequence of symbols/characters is called a ''string'';
an element of the language is called a ''word''.

However, the theory is not concerned with applications at all, and therefore, completely neutral to what symbols and words actually stand for.

For instance, in Linguistics , formal language theory can be applied at many different levels of language description simultaneously:

  • in Syntax , which describes how ''words'' in the ''lexicon'' (or ''vocabulary'') combine to form ''sentences'';

  • in Morphology , which describes how ''parts of words'' combine to form ''words'';

  • in Orthography (or Spelling ), which describes how ''characters'' (in the ''alphabet'') combine to form ''words''

  • in Phonology , which describes how ''phonemes'' combine to form ''words''.



Examples


As an example of a formal language, an alphabet might be \left \{ a , b ight \}, and a string over that alphabet might be ababba\,.

A typical language over that alphabet, containing that string, would be the set of all strings which contain the same number of symbols a\, and b\,.

The empty word (that is, length-zero string) is allowed and is often denoted by e\, , \epsilon\, or \Lambda\, . While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words belonging to it may be unbounded).

Some more examples:

  • the set of all words over {a, b}\,

  • the set \left \{ a^{n} ight\}, where n\, is a Natural Number and a^n\, means a\, repeated n\, times

  • Finite languages, such as \{a,b,ab,cba\}\, or \{ac, aca, bba\}\,

  • the set of syntactically correct programs in a given programming language; or

  • the set of inputs upon which a certain Turing Machine halts; or,

  • the set of the longest strings of Alphanumeric ASCII characters on this line, (i.e., the set {the, set, of, longest, strings, of, alphanumeric, ASCII, characters, on, this, line, i, e})



Language specification formalisms


Formal language theory rarely concerns itself with particular languages (except as examples), but is mainly concerned with the study of various types of formalisms to describe languages. For instance, a language can be given as

Typical questions asked about such formalisms include:

  • What is their expressive power? (Can formalism X describe every language that formalism Y can describe? Can it describe other languages?)

  • What is their recognizability? (How difficult is it to decide whether a given word belongs to a language described by formalism X?)

  • What is their comparability? (How difficult is it to decide whether two languages, one described in formalism X and one in formalism Y, or in X again, are actually the same language?)


Surprisingly often, the answer to these decision problems is "it cannot be done at all", or "it is extremely expensive" (with a precise characterization of how expensive exactly). Therefore, formal language theory is a major application area of Computability Theory and Complexity Theory .


Operations on languages


Certain operations on languages are common. This includes the standard set operations, such as union, intersection, and complementation. Another class of operation is the elementwise application of string operations.

Examples: suppose {L}_{1} and {L}_{2} are languages over some common alphabet.
  • The '' Concatenation '' {L}_{1}{L}_{2}\, consists of all strings of the form vw\, where v\, is a string from {L}_{1}\, and w\, is a string from {L}_{2}\,.

  • The ''intersection'' {L}_1 \cap {L}_2 of {L}_{1}\, and {L}_{2}\, consists of all strings which are contained in both languages

  • The ''complement''

  • eg L of a language w.r.t. a given alphabet consists of all strings over the alphabet that are not in the language.


Such operations are used to investigate Closure Properties of classes of languages.
A class of languages is Closed Under a particular operation when the operation, applied to languages in the class, always produces a language in the same class again.
For instance, the Context-free Language s are known to be closed under union, concatenation, and intersection with Regular Language s, but not closed under intersection or complementation.


REFERENCES

  • A. G. Hamilton, ''Logic for Mathematicians'', Cambridge University Press, 1978, ISBN 0 521 21838 1.

  • Michael A. Harrison: Introduction to Formal Language Theory, Addison-Wesley, 1978.

  • Grzegorz Rozenberg, Arto Salomaa, ''Handbook of Formal Languages: Volume I-III'', Springer, 1997, ISBN 3 540 61486 9.

  • Patrick Suppes, ''Introduction to Logic'', D. Van Nostrand, 1957, ISBN 0 442 08072 7.



SEE ALSO



EXTERNAL LINKS




= Drafts of some chapters in the "Handbook of Formal Language Theory", Vol. 1-3, G. Rozenberg and A. Salomaa (eds.), Springer Verlag, (1997):