Language Recognition Chart





CHARACTERS

You can often recognize the language of a text by looking for characters, letter sequences and common words specific to that language. This is frequently more accurate than language recognition software that pays little attention to the characters themselves.
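The character lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete detector: the table `DISTINCTIVE_CHARS` and the function `count_distinctive` are hypothetical names, and the table holds only a handful of characters taken from the entries further down this chart.

```python
import unicodedata

# Hypothetical lookup table: a few distinctive characters from this chart.
# A realistic table would cover many more languages and characters.
DISTINCTIVE_CHARS = {
    "Spanish": "ñ¿¡",
    "German": "ß",
    "Hungarian": "őű",
    "Welsh": "ŵ",
    "Czech": "ěřů",
}


def count_distinctive(text):
    """Count, per language, how many distinctive characters occur in text."""
    # Normalise to NFC so accented letters are single code points.
    lowered = unicodedata.normalize("NFC", text).lower()
    hits = {}
    for language, chars in DISTINCTIVE_CHARS.items():
        n = sum(lowered.count(c) for c in chars)
        if n:
            hits[language] = n
    return hits
```

A language with no hits is simply absent from the result, so plain ASCII text yields an empty dictionary and needs other clues, such as common words or endings.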



LATIN ALPHABET (POSSIBLY EXTENDED)


Romance Languages

Lots of Latin roots.


French

  • Common words: ''de'', ''la'', ''le'', ''du'', ''des'', ''il'', ''et'';

  • Words ending in ''-x'', especially ''-aux'' or ''-eux'';

  • Many apostrophised contractions, e.g. words beginning with ''l''' or ''d''';

  • Accented letters: ''à â ç è é ê î ô û'', rarely ''ë ï'', but never ''á í ì ó ò ú'', and ''ù'' only in the word ''où''



Jèrriais

  • Common words: ''lé'', ''dé'', ''tchi'', ''ès'', ''i''', ''ch'''

  • "Tch", "dg", "th" and "în" are common character combinations. "ou" is frequently followed by another vowel.

  • Many apostrophised short forms, e.g. words beginning with ''l''', ''d''' or ''r'''. ''é'' frequently alternates with an apostrophe e.g. ''c'mîn''/''quémîn''.



Spanish

  • Characters: ¿ ¡ (inverted question and exclamation marks), ñ

  • Acute accents may appear on any vowel (á, é, í, ó, ú); the only other diacritic on a vowel is the diaeresis in ü (as in güe, güi)

  • Some words frequently used: de, el, los, la(s), uno(s), una(s), y

  • Spanish does not use apostrophised contractions

  • Word endings: -o, -a, -ción, -miento, -dad

  • Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue often indicated by means of dashes



Italian


  • Almost every word ends in a vowel. Exceptions include ''non'', ''il'', ''per'', ''con''.

  • Common one-letter word: ''è''

  • Common word: ''perché''

  • Letter sequences: gli, gn, sci

  • Word endings: -o, -a, -zione, -mento, -tà, -aggio

  • Grave accent (e.g., on à) almost always occurs in the last letter of words.

  • Geminate consonants (''tt'', ''zz'', ''cc'', ''ss'', ''bb'', ''pp'', ''ll'', etc) are frequent.
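The observation that almost every Italian word ends in a vowel can be turned into a rough numeric check. The sketch below assumes words are runs of letters and that an apostrophe splits a word. The names `VOWELS` and `vowel_ending_ratio` are made up for this illustration.

```python
import re
import unicodedata

# Plain and accented vowels that can end an Italian word (an assumption
# based on the accents mentioned in this chart).
VOWELS = set("aeiouàèéìòù")


def vowel_ending_ratio(text):
    """Return the fraction of words whose last letter is a vowel."""
    normalised = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of Unicode letters only.
    words = re.findall(r"[^\W\d_]+", normalised)
    if not words:
        return 0.0
    return sum(w[-1] in VOWELS for w in words) / len(words)
```

A ratio close to 1 is consistent with Italian but does not prove it. Short function words such as ''non'', ''il'', ''per'' and ''con'' pull the ratio down, so it is best used alongside the other clues in this list.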



Catalan

  • Character combination "l·l"

  • Word endings: -o, -a, -es, -ció, -tat

  • Word beginning: ll-



Romanian

  • Characters: ă â î ş ţ

  • Common words: şi, de, la, a, ai, ale, alor, cu

  • Word endings: -a, -ă, -u, -ul, -ţie (or -ţiune), -ment, -tate

  • Note that Romanian is sometimes written online with no diacritics, making it harder to identify



Portuguese

  • Common one-letter words: a, à, e, é, o

  • Common two-letter words: ao, as, às, da, de, do, em, os, ou, um

  • Common three-letter words: aos, das, dos, ele, ela, não, por, que, uma, uns

  • Common endings: -ção, -ções, -dade

  • Common digraphs: nh, lh

  • Most singular words end in vowels. Other singular words end in l, m, r, z

  • Plural words end in -s

  • European Portuguese often uses c before ç and t: acção, acto, etc.



Walloon

  • Characters: å, é, è, ê, î, ô, û

  • Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou

  • Common one-letter words: a, å, e, i, t', l', s', k'

  • Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'

  • Common three-letter words: dji, nos, vos, les, ses, nén, rén, bén, pol, tel, mel

  • Common endings: -aedje, -mint, -xhmint, -ès, -ea, -ou, -owe, -yî, -åcion

  • Apostrophes are followed by a space (preferably a non-breaking one), e.g. ''l' ome'' instead of ''l'ome''.



Germanic Languages



English

  • words: ''an'', ''in'', ''on'', ''the'', ''that'', ''is'', ''are'', ''I'' (which should always be a capital)

  • letter sequences: ''th'', ''ch'', ''sh'', ''ough'', ''augh''

  • word endings: ''-ing'', ''-tion'', ''-ed'', ''-age'', ''-s'', ''-’s'', ''-’ve'', ''-n’t'', ''-’d''



Dutch

  • letter sequences: ''ij'', ''ei'', doubled vowels, ''kw'', ''sch''

  • words: het, op, en, een, voor (and compounds of voor)

  • word endings: ''-tje'', ''-sje'', ''-ing'', ''-en'', ''-lijk''

  • at the start of words: z, v, ''ge-''

  • “t/m” common in between two dates, times or numbers (e.g. house numbers), for example “9 t/m 5”



German

  • umlauts (ä, ö, ü), Eszett (ß)

  • letter sequences: ''sch'', ''tsch'', ''tz'', ''ss''

  • common words: ''der'', ''die'', ''das'', ''er'', ''sie'', ''es'', ''ist'', ''und'', ''oder'', ''aber''

  • common endings: ''-en'', ''-er'', ''-ern'', ''-st'', ''-ung'', ''-chen''

  • rare letters: ''y'' (except in loan words)

  • long compound words

  • many capitalised words in the middle of sentences



Swedish

  • common words: ''och'', ''i'', ''att'', ''det'', ''en'', ''som'', ''är'', ''av'', ''den'', ''på''

  • long compound words

  • letter sequences: ''stj'', ''sj'', ''skj'', ''tj''
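The common-word lists for English, Dutch, German and Swedish above lend themselves to a simple scoring scheme: count how many words of a text appear in each list. The sketch below is one possible way to do this. The names `COMMON_WORDS`, `score_by_common_words` and `best_guess` are hypothetical, and the word sets are copied from this chart rather than from any frequency corpus.

```python
import re

# Word lists taken from the Germanic entries of this chart.
COMMON_WORDS = {
    "English": {"an", "in", "on", "the", "that", "is", "are", "i"},
    "Dutch": {"het", "op", "en", "een", "voor"},
    "German": {"der", "die", "das", "er", "sie", "es", "ist", "und",
               "oder", "aber"},
    "Swedish": {"och", "i", "att", "det", "en", "som", "är", "av",
                "den", "på"},
}


def score_by_common_words(text):
    """Return, per language, the number of words found in its list."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {lang: sum(w in vocab for w in words)
            for lang, vocab in COMMON_WORDS.items()}


def best_guess(text):
    """Return the highest-scoring language, or None if nothing matched."""
    scores = score_by_common_words(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Some words are shared between lists (''en'' is both Dutch and Swedish, ''i'' both English and Swedish), so short texts can tie or mislead. Longer passages give more reliable counts.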



Baltic Languages



Latvian

  • uses diacritics: ā, č, ē, ģ, ī, ķ, ļ, ņ, ō, ŗ, š, ū, ž

  • does not have letters: Q, W, X, Y

  • extremely rare doubling of vowels

  • rare doubling of consonants

  • a period (.) after ordinal numbers, e.g. 2005. gads

  • common words: "ir", "bija", "tika", "es", "viņš"



Slavic Languages



Polish

  • unusual consonant clusters "rz", "sz", "cz", "prz", "trz";

  • uses: ą, ę, ć, ś, ł, ó, ż, ź;

  • words "i", "w";

  • word "się".



Czech

  • visual abundance of letters "ž, š, ů, ě, ř";

  • words "je", "v";

  • to distinguish from Slovak: does not use ä, ľ, ĺ, ŕ or ô.



Slovak

  • visual abundance of letters "ž, š, č";

  • uses: ä, ľ, ĺ, ŕ and ô;

  • typical suffixes: ''-cia'', ''-ť'';

  • to distinguish from Czech: does not use ě, ř or ů.
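The Czech and Slovak entries give mutually exclusive letter sets, which makes this pair easy to separate mechanically. The following sketch applies exactly that rule. The function name `czech_or_slovak` and the two set names are invented for the example, and the text is assumed to be already known to be one of the two languages.

```python
import unicodedata

CZECH_ONLY = set("ěřů")     # per this chart: not used in Slovak
SLOVAK_ONLY = set("äľĺŕô")  # per this chart: not used in Czech


def czech_or_slovak(text):
    """Return 'Czech', 'Slovak', or None when the letters do not decide."""
    chars = set(unicodedata.normalize("NFC", text).lower())
    has_czech = bool(chars & CZECH_ONLY)
    has_slovak = bool(chars & SLOVAK_ONLY)
    if has_czech and not has_slovak:
        return "Czech"
    if has_slovak and not has_czech:
        return "Slovak"
    # Either no distinguishing letter, or a mixed text with both sets.
    return None
```

Very short texts may contain none of the distinguishing letters, in which case the function returns None and other clues such as suffixes are needed.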



Celtic Languages



Welsh

  • letters ''Ŵ, ŵ'' unique to Welsh

  • words ''y, yr, yn, a, ac, i, o''

  • letter sequences ''wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si''

  • letters not used: ''k, q, v, x, z''

  • letter only used rarely, in loanwords: ''j''

  • commonly accented letters: ''â, ê, î, ô, û, ŵ, ŷ''

  • word endings: ''-ion, -au, -wr, -wyr''

  • ''y'' is the most common letter in the language

  • circumflex accent (''^'') is by far the commonest diacritical mark.



Iranian Languages


Kurdish

  • The word "xwe" (oneself, myself, yourself etc.) is highly specific (xw combination) and frequent.

  • The word "kir" is also frequent.



Finno-Ugric Languages



Finnish

  • distinct letters ''ä'' and ''ö''; but never ''õ'' or ''ü''

  • common words: ''sinä'', ''on''

  • common endings: ''-nen'', ''-ka''/''-kä'', ''-in''

  • common letter combinations: ''yö'', ''ei'', ''äi''

  • unusually high degree of letter duplication, both vowels and consonants



Estonian

  • distinct letters: ''ä'', ''ö'', ''õ'' and ''ü''; but never ''ß'' or ''å''

  • ''f'', ''z'', ''š'' and ''ž'' appear in loanwords and proper names only; the last two are substituted with ''sh'' or ''zh'' in some texts

  • ''c'', ''q'', ''w'', ''x'', ''y'' appear in (typically foreign) proper names only

  • similar to Finnish, except:

  • --- letter ''õ'' is unique to Estonian

  • --- words end in consonants more frequently than in Finnish

  • common words: ''ja'', ''on'', ''ei'', ''ta'', ''see''



Hungarian

  • letters Ő, Ű, ő and ű unique to Hungarian

  • letter combinations: ''sz, gy, cs, leg‐, ‐obb''

  • common words: ''a, az, ez, egy, és, van''
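The letter rules given for Finnish, Estonian and Hungarian can be combined into a small elimination procedure. The sketch below assumes the text is already known to be one of these three languages and returns the candidates that the letters do not rule out. The name `finno_ugric_candidates` is hypothetical.

```python
import unicodedata


def finno_ugric_candidates(text):
    """Return the set of languages still compatible with the letters seen."""
    chars = set(unicodedata.normalize("NFC", text).lower())
    # ő and ű are listed above as unique to Hungarian.
    if chars & set("őű"):
        return {"Hungarian"}
    # Within this group, õ points to Estonian.
    if "õ" in chars:
        return {"Estonian"}
    candidates = {"Finnish", "Estonian", "Hungarian"}
    # Finnish never uses ü according to the list above.
    if "ü" in chars:
        candidates.discard("Finnish")
    # Estonian never uses ß or å according to the list above.
    if chars & set("ßå"):
        candidates.discard("Estonian")
    return candidates
```

When no decisive letter is present the function returns all three languages, which signals that common words and endings have to settle the question.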



Southern Athabaskan Languages


  • vowels with acute accent, ogonek (nasal hook), or both: á, ą, ą́

  • doubled vowels: aa, áá, ąą, ą́ą́

  • slashed ''l'': ł

  • ''n'' with acute accent: ń

  • quotation mark: ' or ’

  • sequences: dl, tł, tł’, dz, ts’, ií, áa, aá

  • may have rather long words



Western Apache

In addition to the above,
  • may use: u or ú

  • may use vowels with macron: ā ą̄

  • does not use ''ų''



Navajo

In addition to the above,
  • does not use ''u'', ''ú'', or ''ų''



Chiricahua or Mescalero

In addition to the above,
  • uses: u, ú, ų

  • does not use ''o'', ''ó'', or ''ǫ''



Japanese in Romaji

  • words: "desu", "masu", "aru", "suru", esp. at end of sentences;

  • letters: nearly 50% vowels (''a e i o u'');

  • letters: no consonants, except "n" and "h", at end of words

  • a macron or circumflex may be used to indicate long (doubled) vowels, e.g. Tōkyō
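Two of the Romaji clues above are easy to measure: the share of vowels among all letters, and whether every word ends in a vowel, "n" or "h". The sketch below computes both after stripping macrons and circumflexes. The function names `strip_marks`, `vowel_fraction` and `words_end_like_romaji` are assumptions made for this example.

```python
import re
import unicodedata


def strip_marks(text):
    """Remove combining marks, so that 'ō' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def vowel_fraction(text):
    """Return the fraction of letters that are a, e, i, o or u."""
    letters = [c for c in strip_marks(text).lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in "aeiou" for c in letters) / len(letters)


def words_end_like_romaji(text):
    """True if every word ends in a vowel, 'n' or 'h'."""
    words = re.findall(r"[a-z]+", strip_marks(text).lower())
    return bool(words) and all(w[-1] in "aeiounh" for w in words)
```

For the sample sentence "watashi wa gakusei desu", `vowel_fraction` gives 0.5 and `words_end_like_romaji` gives True, in line with the rules above. Neither test is conclusive on its own, since other vowel-rich languages can pass them too.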



Vietnamese

  • Roman characters with many diacritical marks on vowels.

  • Almost all written words are quite short (one syllable).

  • Words beginning with "ng"

  • common words: "cái", "không", "có", "ở"



VIQR

  • The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .

  • DD, Dd, or dd

  • The following character before punctuation: \



VNI

  • The digits 1-8 after vowels

  • The digit 9 after a D or d

  • The following character before numbers: \



Telex

  • The following characters after vowels: s f r x j

  • The following vowels, doubled up: a e o

  • The letter "w" after the following characters: a o u

  • DD, Dd, or dd
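The VIQR, VNI and Telex clues above can be approximated with regular expressions. The sketch below deliberately uses only the less ambiguous markers: it skips the VIQR marks ' ? . and the Telex tone letters s f r x j, because those also occur after vowels in ordinary text. Treating "y" as a vowel is an assumption of this sketch, and the names `PATTERNS` and `encoding_hits` are hypothetical.

```python
import re

# Simplified patterns based on the three lists above.
PATTERNS = {
    # Vowel followed by ^ ( + ` or ~, or a doubled d.
    "VIQR": re.compile(r"[aeiouyAEIOUY][\^(+`~]+|[dD][dD]"),
    # Vowel followed by digits 1-8, or d followed by 9.
    "VNI": re.compile(r"[aeiouyAEIOUY][1-8]+|[dD]9"),
    # Doubled a/e/o, w after a/o/u, or a doubled d.
    "Telex": re.compile(r"aa|ee|oo|[aou]w|dd", re.IGNORECASE),
}


def encoding_hits(text):
    """Count the matches of each encoding's pattern in text."""
    return {name: len(pattern.findall(text))
            for name, pattern in PATTERNS.items()}
```

The counts are only hints. A doubled d matches both VIQR and Telex, and English words such as "see" or "now" trigger the Telex pattern, so the result is meaningful only for text already suspected to be Vietnamese.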



Chinese, Romanized


Standard Mandarin

  • In general, Mandarin syllables end only in n, ng, r; never in p, t, k, m

Pinyin

  • Words beginning with x, q, zh

  • Tone marks on vowels, such as ā, á, ǎ, à

  • --- For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4

Wade-Giles

  • Words do not begin with b, d, g

  • Words beginning with hs

  • Many hyphenated words

  • Apostrophes, e.g. t'a, ch'i (note: these apostrophes are often omitted)

Gwoyeu Romatzyh

  • Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.

  • Insertion of r, e.g. arn, erng, etc.

  • Words ending in nn, nq

Standard Cantonese

  • In general, Cantonese syllables can end in p, t, k, m, n, ng; never r

Minnan in Pe̍h-oē-jī

  • Many hyphenated words.

  • Words can end in p, t, k, m, n, ng, h; never r

  • Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.