Perplexity

The perplexity P(x) of a probability distribution x is defined as

:P(x)=2^{H(x)}=2^{-\sum_{i=1}^{n} p(i)\log_2 p(i)}

where H(x) is the entropy of the probability distribution x, p(i) is the probability of the i-th event in the distribution, and n is the number of possible events in the distribution.
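
The definition can be checked numerically. The following is a minimal sketch, not a reference implementation; the function names `entropy_bits` and `perplexity` and the two example distributions are illustrative assumptions, not part of the original text.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(x) in bits of a discrete distribution.

    Events with probability 0 are skipped, following the usual
    convention that 0 * log2(0) is taken to be 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity P(x) = 2 ** H(x)."""
    return 2 ** entropy_bits(probs)

# Hypothetical example distributions over n = 4 events.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

print(perplexity(uniform))  # 4.0: four equally likely events
print(perplexity(skewed))   # lower than 4, since one event dominates
```

For a uniform distribution the perplexity equals the number of events n; any non-uniform distribution over the same events gives a smaller value.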

In other words, the perplexity is the number of equally likely events in a uniform distribution with the same entropy as the actual model has. The lower the perplexity of a language model, the easier it is to predict the next word given the previous words and the model. Domain-specific texts usually have lower perplexity (less variation) than general language. The lowest perplexity that has been published on the Brown Corpus (1 million words of American English) is about 247 per word, which corresponds to an entropy of log_2 247 ≈ 7.95 bits per word, or about 1.75 bits per letter.
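
Because P = 2^H, a reported perplexity can be converted back to an entropy with H = log_2 P. The short sketch below applies this to the Brown Corpus figure of 247; the helper name `entropy_from_perplexity` is a hypothetical label chosen for illustration.

```python
import math

def entropy_from_perplexity(ppl):
    """Invert P = 2 ** H to recover the entropy H in bits."""
    return math.log2(ppl)

# Per-word perplexity quoted in the text for the Brown Corpus.
bits_per_word = entropy_from_perplexity(247)
print(round(bits_per_word, 2))  # 7.95
```

A perplexity of 247 therefore means the model is, on average, as uncertain about the next word as if it were choosing uniformly among 247 words.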