You can often recognize the language of a text by looking for characters specific to that language. This is often more accurate than language-recognition software that pays little attention to the characters.
- ABCDEFGHIJKLMNOPQRSTUVWXYZ ( Latin Alphabet )
- ---and no other - English Language, Zulu Language, Japanese Language in Romaji (see below), Indonesian Language, Hawaiian Language, Swahili Language, Afrikaans Language
- ---ÆØÅæøå - Danish Language, Norwegian Language
- ---ÅÄÖåäö - Swedish Language
- ---ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö - Icelandic Language
- ---ÄÖäö - Finnish Language (occasionally ŠšŽž in loanwords as well as Åå in names)
- ---ÄÖÕÜäöõü - Estonian Language
- ---àéëï - Dutch Language
- ---ĉĈĝĜĥĤĵĴŝŜŭŬ - Esperanto
- ---àâçéèêîïôœùû - French Language
- ---ÄÖÜäöüß - German Language
- ---àéèìòù - Italian Language
- ---ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) - Portuguese Language
- ---áéíñÑóúü ¡¿ - Spanish Language
- ---ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· - Catalan Language
- ---ÂÊÎÔÛŴŶâêîôûŵŷáéíï - Welsh Language
- ---ÁÉÍÓÖŐÚÜŰáéíóöőúüű - Hungarian Language
- ---ĂÎÂŞŢăîâşţ - Romanian Language
- ---çÇğĞıİöÖşŞüÜ - Turkish Language
- ---ÇçÊêÎÛû - Kurdish Language
- --- ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words)
-- ’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū - Western Apache Language
-- 'ÓǪǪ́ óǫǫ́ - Navajo Language
-- ’ÚŲŲ́ úųų́ - Chiricahua Language / Mescalero Language
- ---ąćęłńóśźż - Polish Language
- ---ČŠŽ
--and no other - Slovenian Language
--ĆĐ - Bosnian Language, Croatian Language
--ÁĎÉĚŇÓŘŤÚŮÝáďéěňóřťúůý - Czech Language
--ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý - Slovak Language
--ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū - Latvian Language
--ĄĘĖĮŲŪąęėįųū - Lithuanian Language
- ---ả ạ ấ ầ ẩ ẫ ậ ắ ằ ẳ ẵ ặ đ ₫ ẻ ẹ ế ề ể ễ ệ ỉ ĩ ị ỏ ọ ổ ỗ ộ ơ ớ ờ ở ỡ ợ ủ ụ ư ứ ừ ử ữ ự ỷ ỹ ỵ – can ''only'' be Vietnamese
- ---é - Sundanese Language
- الصفحة الرئيسية - Arabic Alphabet
- --- Arabic, Persian, Malay (Jawi), Kurdish (Soranî), Panjabi, Pashto, Sindhi, Urdu, others.
- Brahmic Family of scripts
- --- Bengali Script
--অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
- --- Devanāgarī
--अ प आ पा इ पि ई पी उ पु ऊ पू ऋ पृ ॠ पॄ ऌ पॢ ॡ पॣ ऍ पॅ ऎ पॆ ए पे ऐ पै ऑ पॉ ऒ पॊ ओ पो औ पौ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ
--used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Marathi, Kashmiri, Sindhi, Bihari, Bhili, Konkani and Bhojpuri, as well as Nepali from Nepal.
- БДЖИЛПУЦЧШ ( Cyrillic Alphabet )
- ---ЙЩЬЮЯ
--ҐЄІЇ - Ukrainian Language
--Ъ - Bulgarian Language
--ЁЭЫ - Russian Language
--Ў, І instead of И - Belarusian Language
- ---ЉЊЏ (Vuk Karadžić's reform)
--ЋЂ - Serbian Language
--ЃЌЅ - Macedonian Language
- --- ЅЋѸѲѠЩЪЬҌЮЯѦѪѮѰѴ - Old Church Slavonic
- --- In Transnistria, Romanian is written in Cyrillic characters
- ΓΔΘΛΞΠΣΦΨΩαβγδεζηθικλμνξπρςστυφχψω ( Greek Alphabet ) – Greek Language
- אבגדהוזחטיכלמנסעפצקרשת ( Hebrew Alphabet )
- --- and maybe some odd dots and lines above, below, or inside characters - Hebrew Language
- --- פֿ; dots/lines below letters appearing ''only'' with א,י, and ו - Yiddish
- --- Ladino
- 日本語勉強 - East Asian Languages
- ---and no other - Chinese Language
- ---with あいうえお Hiragana and/or アイウエオ Katakana - Japanese Language
- ---with characters like 위키백과에 - Korean Language
- ---Vietnamese uses Latin alphabet – see above
- ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨㄧㄋㄈㄨㄏㄠ ( Zhuyin )
- --- ㄪㄫㄬ -- not Mandarin
- กขคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤฤๅลฦฦๅวศษสหฬอฮ ( Thai Alphabet ) - Thai Language
- Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ ( Armenian Alphabet ) - Armenian Language
- ა ბ გდ ევ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ ( Georgian Alphabet ) - Georgian Language
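The chart above suggests a two-step check: first identify the script, then look for a letter that only one language uses. The following is a minimal Python sketch of that idea, not a complete recognizer. It assumes that the first word of a character's Unicode name is a usable script label, and the small `DISTINCTIVE` table is a hypothetical sample built from a few chart entries (Hungarian ő/ű, Welsh ŵ, Vietnamese ư/ơ, Esperanto ĉ/ĝ).

```python
import unicodedata
from collections import Counter

# Hypothetical sample table: a few letters the chart ties to one language.
DISTINCTIVE = {
    "ő": "Hungarian", "ű": "Hungarian",
    "ŵ": "Welsh",
    "ư": "Vietnamese", "ơ": "Vietnamese",
    "ĉ": "Esperanto", "ĝ": "Esperanto",
}

def dominant_script(text):
    """Return the most frequent script label among the letters of text.

    The label is the first word of the Unicode character name,
    e.g. LATIN, CYRILLIC, GREEK, HEBREW, THAI, CJK, HANGUL.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        try:
            name = unicodedata.name(ch)
        except ValueError:  # character has no name in the database
            continue
        counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

def guess_by_character(text):
    """Return a language if text contains one of the sample letters."""
    lowered = unicodedata.normalize("NFC", text).lower()
    for ch, lang in DISTINCTIVE.items():
        if ch in lowered:
            return lang
    return None
```

For example, `dominant_script("Привет мир")` gives `"CYRILLIC"`, after which the Cyrillic rows of the chart would be the ones to consult. A fuller table would need every row of the chart, and letters shared by several languages (such as ä and ö) cannot decide on their own.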
Lots of Latin roots.
- Common words: ''de'', ''la'', ''le'', ''du'', ''des'', ''il'', ''et'';
- Words ending in ''-x'', especially ''-aux'' or ''-eux'';
- Many apostrophised contractions, e.g. words beginning with ''l''' or ''d'''
- Accented letters: ''à â ç è é ê î ô û'', rarely ''ë ï'', but never ''á í ì ó ò ú'', and ''ù'' only in the word ''où''
- Common words: ''lé'', ''dé'', ''tchi'', ''ès'', ''i''', ''ch'''
- "Tch", "dg", "th" and "în" are common character combinations. "ou" is frequently followed by another vowel.
- Many apostrophised short forms, e.g. words beginning with ''l''', ''d''' or ''r'''. ''é'' frequently alternates with an apostrophe e.g. ''c'mîn''/''quémîn''.
- Characters: ¿ ¡ (inverted question and exclamation marks), ñ
- The acute accent can appear on any vowel (á, é, í, ó, ú); the only other diacritics are the diaeresis on ü and the tilde on ñ
- Some words frequently used: de, el, los, la(s), uno(s), una(s), y
- Spanish does not use apostrophised contractions
- Word endings: -o, -a, -ción, -miento, -dad
- Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue often indicated by means of dashes
- Almost every word ends in a vowel. Exceptions include ''non'', ''il'', ''per'', ''con''.
- Common one-letter word: ''è''
- Common word: ''perché''
- Letter sequences: gli, gn, sci
- Word endings: -o, -a, -zione, -mento, -tà, -aggio
- Grave accent (e.g., on à) almost always occurs in the last letter of words.
- Geminate consonants (''tt'', ''zz'', ''cc'', ''ss'', ''bb'', ''pp'', ''ll'', etc) are frequent.
- Character combination "l·l"
- Word endings: -o, -a, -es, -ció, -tat
- Word beginning: ll-
- Characters: ă â î ş ţ
- Common words: şi, de, la, a, ai, ale, alor, cu
- Word endings: -a, -ă, -u, -ul, -ţie (or -ţiune), -ment, -tate
- Note that Romanian is sometimes written online with no diacritics, making it harder to identify
- Common one-letter words: a, à, e, é, o
- Common two-letter words: ao, as, às, da, de, do, em, os, ou, um
- Common three-letter words: aos, das, dos, ele, ela, não, por, que, uma, uns
- Common endings: -ção, -ções, -dade
- Common digraphs: nh, lh
- Most singular words end in vowels. Other singular words end in l, m, r, z
- Plural words end in -s
- European Portuguese often uses c before ç and t: acção, acto, etc.
- Characters: å, é, è, ê, î, ô, û
- Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou
- Common one-letter words: a, å, e, i, t', l', s', k'
- Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'
- Common three-letter words: dji, nos, vos, les, ses, nén, rén, bén, pol, tel, mel
- Common endings: -aedje, -mint, -xhmint, -ès, -ea, -ou, -owe, -yî, -åcion
- Apostrophes are followed by a space (preferably a non-breaking one), e.g. ''l' ome'' instead of ''l'ome''.
- words: ''an'', ''in'', ''on'', ''the'', ''that'', ''is'', ''are'', ''I'' (''should'' always be a capital)
- letter sequences: ''th'', ''ch'', ''sh'', ''ough'', ''augh''
- word endings: ''-ing'', ''-tion'', ''-ed'', ''-age'', ''-s'', ''-’s'', ''-’ve'', ''-n’t'', ''-’d''
- letter sequences: ''ij'', ''ei'', doubled vowels, ''kw'', ''sch''
- words: het, op, en, een, voor (and compounds of voor).
- word endings: ''-tje'', ''-sje'', ''-ing'', ''-en'', ''-lijk''
- at the start of words: z, v, ''ge-''
- “t/m” common in between two dates, times or numbers (e.g. house numbers), for example “9 t/m 5”
- umlauts (ä, ö, ü), eszett (ß)
- letter sequences: ''sch'', ''tsch'', ''tz'', ''ss''
- common words: ''der'', ''die'', ''das'', ''er'', ''sie'', ''es'', ''ist'', ''und'', ''oder'', ''aber''
- common endings: ''-en'', ''-er'', ''-ern'', ''-st'', ''-ung'', ''-chen''
- rare letters: ''y'' (except in loan words)
- long compound words
- many capitalised words in the middles of sentences
- common words: ''och'', ''i'', ''att'', ''det'', ''en'', ''som'', ''är'', ''av'', ''den'', ''på''
- long compound words
- letter sequences: ''stj'', ''sj'', ''skj'', ''tj''
- uses diacritics: ā, č, ē, ģ, ī, ķ, ļ, ņ, ō, ŗ, š, ū, ž
- does not have letters: Q, W, X, Y
- extremely rare doubling of vowels
- rare doubling of consonants
- a period (.) after ordinal numbers, e.g. 2005. gads
- common words: "ir", "bija", "tika", "es", "viņš"
- unusual consonant clusters "rz", "sz" , "cz", "prz", "trz";
- uses : ą , ę , ć , ś , ł , ó , ż , ź
- words "i", "w";
- word "się".
- visual abundance of letters "ž,š,ů,ě,ř";
- words "je", "v";
- to distinguish from Slovak: does not use ä, ľ, ĺ, ŕ or ô.
- visual abundance of letters "ž, š, č";
- uses : ä, ľ, ĺ, ŕ and ô;
- typical suffixes: ''-cia'', ''-ť'';
- to distinguish from Czech: does not use ě, ř or ů.
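The Czech and Slovak exclusion rules above can be sketched as a small check. This is a minimal illustration only: the function name `czech_or_slovak` is hypothetical, and the check cannot decide when a text contains none of the distinguishing letters, or letters from both sets.

```python
import unicodedata

# Letters taken from the lists above: each set is absent from the other language.
CZECH_ONLY = set(unicodedata.normalize("NFC", "ěřů"))
SLOVAK_ONLY = set(unicodedata.normalize("NFC", "äľĺŕô"))

def czech_or_slovak(text):
    """Return "Czech", "Slovak", or None when the letters give no answer."""
    chars = set(unicodedata.normalize("NFC", text).lower())
    has_czech = bool(chars & CZECH_ONLY)
    has_slovak = bool(chars & SLOVAK_ONLY)
    if has_czech and not has_slovak:
        return "Czech"
    if has_slovak and not has_czech:
        return "Slovak"
    return None  # no distinguishing letter, or conflicting evidence
```

Short texts often contain none of these letters, so a `None` result is common and simply means other cues are needed.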
- letters ''Ŵ, ŵ'' unique to Welsh
- words ''y, yr, yn, a, ac, i, o''
- letter sequences ''wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si''
- letters not used: ''k, q, v, x, z''
- letter only used rarely, in loanwords: ''j''
- commonly accented letters: ''â, ê, î, ô, û, ŵ, ŷ''
- word endings: ''-ion, -au, -wr, -wyr''
- ''y'' is the most common letter in the language
- circumflex accent (''^'') is by far the commonest diacritical mark.
- The word "xwe" (oneself, myself, yourself etc.) is highly specific (xw combination) and frequent.
- kir
- distinct letters ''ä'' and ''ö''; but never ''õ'' or ''ü''
- common words: ''sinä'', ''on''
- common endings: ''-nen'', ''-ka''/''-kä'', ''-in''
- common letter combinations: ''yö'', ''ei'', ''äi''
- unusually high degree of letter duplication, both vowels and consonants
- distinct letters: ''ä'', ''ö'', ''õ'' and ''ü''; but never ''ß'' or ''å''
- ''f'', ''z'', ''š'' and ''ž'' appear in loanwords and proper names only; the last two are substituted with ''sh'' or ''zh'' in some texts
- ''c'', ''q'', ''w'', ''x'', ''y'' appear in (typically foreign) proper names only
- similar to Finnish, except:
- --- letter ''õ'' is unique to Estonian
- --- words end in consonants more frequently than in Finnish
- common words: ''ja'', ''on'', ''ei'', ''ta'', ''see''
- letters Ő, Ű, ő and ű unique to Hungarian
- letter combinations: ''sz, gy, cs, leg‐, ‐obb''
- common words: ''a, az, ez, egy, és, van''
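Several of the lists above give common short words, and counting them is a simple way to rank candidate languages. The sketch below is only an illustration. The `COMMON_WORDS` table copies the word lists given above for seven languages, the function names are hypothetical, and a tie is resolved arbitrarily in favour of the language listed first.

```python
import re

# Word lists copied from the "common words" entries above.
COMMON_WORDS = {
    "French": {"de", "la", "le", "du", "des", "il", "et"},
    "Spanish": {"de", "el", "los", "la", "las", "uno", "unos", "una", "unas", "y"},
    "English": {"an", "in", "on", "the", "that", "is", "are", "i"},
    "Dutch": {"het", "op", "en", "een", "voor"},
    "German": {"der", "die", "das", "er", "sie", "es", "ist", "und", "oder", "aber"},
    "Swedish": {"och", "i", "att", "det", "en", "som", "är", "av", "den", "på"},
    "Hungarian": {"a", "az", "ez", "egy", "és", "van"},
}

def score_languages(text):
    """Count, per language, how many tokens of text are common words."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        lang: sum(token in words for token in tokens)
        for lang, words in COMMON_WORDS.items()
    }

def guess_by_common_words(text):
    """Return the best-scoring language, or None if nothing matched."""
    scores = score_languages(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Because words such as ''de'', ''la'', ''en'' and ''i'' occur in more than one list, this works best on longer passages and alongside the character cues.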
- vowels with acute accent, ogonek (nasal hook), or both: á, ą, ą́
- doubled vowels: aa, áá, ąą, ą́ą́
- slashed ''l'': ł
- ''n'' with acute accent: ń
- quotation mark: ' or ’
- sequences: dl, tł, tł’, dz, ts’, ií, áa, aá
- may have rather long words
In addition to the above,
- may use: u or ú
- may use vowels with macron: ā ą̄
- does not use ''ų''
In addition to the above,
- does not use ''u'', ''ú'', or ''ų''
In addition to the above,
- uses: u, ú, ų
- does not use ''o'', ''ó'', or ''ǫ''
- words: "desu", "masu", "aru", "suru", esp. at end of sentences;
- letters: nearly 50% vowels (''a e i o u'');
- letters: no consonants, except "n" and "h", at end of words
- a macron or circumflex may be used to indicate doubled vowels, e.g. Tōkyō
- Roman characters with many diacritical marks on vowels. See above.
- Almost all written words are quite short (one syllable).
- Words beginning with "ng"
- common words: "cái", "không", "có", "ở"
- The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
- DD, Dd, or dd
- The following character before punctuation: \
- The digits 1-8 after vowels
- The digit 9 after a D or d
- The following character before numbers: \
- The following characters after vowels: s f r x j
- The following vowels, doubled up: a e o
- The letter "w" after the following characters: a o u
- DD, Dd, or dd
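The digit-based cues above (digits 1-8 after vowels, 9 after D or d) can be checked with two regular expressions. This is a rough sketch under stated assumptions: the lists above do not name the convention (it matches what is commonly called VNI), the function names are hypothetical, `y` is treated as a vowel, and the threshold of two hits is an arbitrary choice to limit false positives.

```python
import re

# A vowel followed by one or more of the digits 1-8 (tone or vowel-quality marks).
VOWEL_DIGIT = re.compile(r"[aeiouy][1-8]+", re.IGNORECASE)
# D or d followed by 9, standing for the barred letter đ.
D_NINE = re.compile(r"d9", re.IGNORECASE)

def count_digit_marks(text):
    """Count vowel+digit runs and d9 pairs in text."""
    return len(VOWEL_DIGIT.findall(text)) + len(D_NINE.findall(text))

def looks_like_digit_vietnamese(text, threshold=2):
    """Guess whether text is Vietnamese typed with digits for diacritics."""
    return count_digit_marks(text) >= threshold
```

For instance, `looks_like_digit_vietnamese("Tie61ng Vie65t")` is `True`, while ordinary text containing numbers separated from words by spaces produces no hits.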
- In general, Mandarin syllables end only in n, ng, r; never in p, t, k, m
- Words beginning with x, q, zh
- Tone marks on vowels, such as ā, á, ǎ, à
- --- For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4
- Words do not begin with b, d, g
- Words beginning with hs
- Many hyphenated words
- Apostrophes, e.g. t`a, ch`i (Note: These apostrophes are often omitted)
- Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.
- Insertion of r, e.g. arn, erng, etc.
- Words ending in nn, nq
- In general, Cantonese syllables can end in p, t, k, m, n, ng; never r
- Many hyphenated words.
- Words can end in p, t, k, m, n, ng, h; never r
- Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.