
Persian Today Corpus




The Persian Today Corpus, or '''The Persian One-Million-word Corpus''', is a book written in Persian by Hamid Hassani and published in Tehran, Iran, in 2005. The book is based on a 1,000,000-word corpus of modern Persian containing 80 "main texts" (over 500 subtexts), mostly written between 1994 and 2004. By "main texts" the author means publications referred to as "books", "magazines", and "newspapers"; by "subtexts" he means the chapters and the short and long articles and essays of which those books, magazines, and newspapers are composed. The usefulness of a corpus is judged primarily by its size and the variety of its sources.
The Persian Today Corpus is a corpus, not a concordance dictionary: in a corpus, words appear exactly as they were used in the source texts.

The first important advantage of a corpus is its usefulness in language description (morphological, lexical, orthographic, and phonetic features, to name a few). The second is that it provides accurate statistics for selecting basic vocabulary and compiling textbooks for language teaching.
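As a rough illustration of the second point, the Python sketch below turns raw text into a ranked frequency list with usage percentages, the kind of statistic reported later in this article. It is only a sketch under simplifying assumptions: it uses naive whitespace tokenization and counts each written form separately, whereas the rules Hassani's corpus actually used for segmenting words and grouping variant spellings (such as آمريكا/امريكا) are not described here. The function name `frequency_list` is hypothetical.

```python
from collections import Counter


def frequency_list(text: str) -> list[tuple[int, str, int, float]]:
    """Return (rank, word, count, percent) rows, most frequent first.

    Assumption: tokens are whitespace-separated word forms; spelling
    variants are NOT merged, unlike some entries in the corpus list.
    """
    tokens = text.split()
    total = len(tokens)
    counts = Counter(tokens)
    rows = []
    # most_common() sorts by descending count; ties keep first-seen order.
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, count, 100 * count / total))
    return rows


sample = "و به و از و به"
for rank, word, count, percent in frequency_list(sample):
    print(rank, word, count, f"{percent:.2f}%")
```

On a real corpus the same ranking step would run over about a million tokens; the percentages are simply each count divided by the total number of running words.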

Dictionaries and word indexes come in different types. Language corpora, compiled by specialists at research centers, universities, and academies in several countries (especially developed ones), have existed for decades. The best-known corpora, such as the Brown Corpus, usually contain around 1,000,000 words, though some consist of several hundred million words. The most famous corpora in the world are those prepared for English (American and British), some of which, like the British National Corpus, contain over 100,000,000 words.

A learner’s dictionary of Persian, sponsored by the Iran Language Institute (ILI), is being compiled by another Iranian scholar, Behruz Safarzadeh, in collaboration with Hamid Hassani, and is due to be published in 2006. The dictionary contains over 5,000 entries, and the above-mentioned 1,000,000-word corpus served as the basis for choosing some of the entries and the defining vocabulary. This learner’s dictionary, the first corpus-based Persian dictionary, is expected to be welcomed by lovers of Persian around the world.

The following are some Persian words from Hassani’s corpus, each given with its frequency rank, original orthography, pronunciation (capital letters mark the stressed syllable of each word), English meaning, frequency, and usage percentage:

1. و <''VA/-O''> (a conjunction that means ''and''): 49,758 times out of 1,002,394 (4.96%),

2. به <''BE''> (a preposition that means ''to'', ''at'', ''in'', or ''with''): 32,478 times (3.24%),

3. را <''RAA''> (a particle that marks the direct object): 25,797 times (2.57%),

4. از <''AZ''> (a preposition that means ''from'', ''of'', ''since'', ''than'', ''out of'', or ''belonging to''): 23,717 times (2.37%),

5. كه <''KE''> (a conjunction, a pronoun, a relative, or an interrogative that means ''that'', ''which''; ''who'', ''who?''; also used idiomatically): 22,593 times (2.25%),

6. در <''DAR''> (a preposition that means ''in'', ''at'', ''on'', or ''within''; a noun that means ''door''): 21,671 times (2.16%),

7. اين <''IIN''> (an adjective or a pronoun that means ''this''): 11,762 times (1.17%),

8. با <''BAA''> (a preposition that means ''with'' or ''by''): 11,611 times (1.16%),

9. است/-ست <''AST/-ST''> (a verb that means ''is''): 9,837 times (0.981%),

10. آن <''AAN''> (an adjective or a pronoun that means ''that''; a noun that means ''moment''): 6,999 times (0.698%)...

30. كار <''KAAR''> (a noun that means ''work''): 2,535 times (0.253%)...

50. بيرون <''biiROON''> (an Adverb that means ''out'' or ''outside''): 1,551 times (0.155%)...

70. هيچ <''HIICH''> (an adjective, a noun, or an adverb that means ''any'', ''nothing'', ''ever'', ''at all'', or ''no''): 1,277 times (0.127%)...

100. بابا <''baaBAA''> (a noun that means ''papa'', ''daddy'', ''dad'', or ''father''): 1,005 times (0.1%)...

125. شب <''SHAB''> (a noun or an adverb that means ''night''): 856 times (0.085%)...

137. ايران <''iiRAAN''> (the proper noun ''Iran''): 774 times (0.077%)...

142. كتاب <''keTAAB''> (a noun that means ''book''): 759 times (0.076%)...

150. آنجا/ آنجا <''aan-JAA''> (an adverb or a pronoun that means ''there''): 726 times (0.072%)...

196. شهر <''SHAHR''> (a noun that means ''city'' or ''town''): 594 times (0.059%)...

210. چشم <''CHESHM''> (a noun that means ''eye''): 552 times (0.055%)...

376. امروز <''emROOZ''> (a noun or an adverb that means ''today''): 319 times (0.032%)...

396. كشور <''keshVAR''> (a noun that means ''country''): 297 times (0.03%)...

476. آمريكا/امريكا <''aamriiKAA/emriiKAA''> (the proper noun ''America''): 258 times (0.026%)...

545. ده <''DAH''> (a numeral (adjective/noun) that means ''ten''): 233 times (0.023%)...

838. امام <''eMAAM''> (a noun that means ''imam''): 157 times (0.016%)...

879. انگليسي <''engeliiSII''> (the proper noun ''English'' or ''British''): 149 times (0.015%)...

1000. حسابي <''hesaaBII''> (an adjective that means ''good'' or ''regular''): 133 times (0.013%)...

1150. عسل <''aSAL''> (a noun that means ''honey''): 116 times (0.011%)...

1500. دروني <''darooNII''> (an adjective that means ''internal''): 87 times (0.009%)...

1857. ده <''DEH''> (a noun that means ''village''): 70 times (0.007%)...

2000. ميرساند <''MI-resaanad''> (a verb that means he/she/it reaches/extends/delivers/supplies/carries): 65 times (0.006%)...

2792. جمعه <''jom’E''> (a noun or an adverb that means ''Friday''): 43 times (0.004%)...

3000. كلاسها <''kelaas-HAA''> (a plural noun (noun + suffix) that means ''classes''): 40 times (0.004%)...

3445. شاهزاده <''shaah-zaaDE''> (a noun that means ''prince'' or ''princess''): 34 times (0.003%)...

4418. جوراب <''jooRAAB''> (a noun that means ''socks'' or ''stockings''): 24 times (0.002%)...

5000. بخت <''BAKHT''> (a noun that means ''luck'' or ''fortune''): 20 times (0.002%)...

5552. ميليمتر <''miiliiMETR''> (a noun that means ''millimeter''): 18 times (0.002%)...

8000. سووشون <''soovaSHOON''> (the proper noun ''Suvashun'', the title of a Persian novel written by Simin Daneshvar): 10 times (0.001%)...
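Each usage percentage in the list appears to be the word's frequency divided by the corpus total of 1,002,394 running words (the figure given with و), multiplied by 100. The short Python sketch below recomputes a few rows under that assumption; the names `CORPUS_SIZE` and `usage_percentage` are hypothetical and not taken from Hassani's book.

```python
# Total running words in the corpus, as reported alongside the count for و.
CORPUS_SIZE = 1_002_394


def usage_percentage(count: int, total: int = CORPUS_SIZE) -> float:
    """Return a word's share of all running words, as a percentage."""
    return 100 * count / total


# (word, frequency, published percentage) for a few rows of the list above.
rows = [
    ("و", 49_758, 4.96),
    ("را", 25_797, 2.57),
    ("كار", 2_535, 0.253),
    ("ده (DEH, village)", 70, 0.007),
]

for word, count, published in rows:
    computed = usage_percentage(count)
    print(f"{word}: computed {computed:.4f}%, published {published}%")
```

Rounded to the precision shown, this reproduces the published figures for the rows checked here. For عسل (116 occurrences) the same formula gives about 0.0116%, slightly above the 0.011% listed.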