
Dice's Coefficient




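For sets X and Y of keywords used in information retrieval, the coefficient may be defined as (C. J. van Rijsbergen, 1979):

:$s = \frac{2\,|X \cap Y|}{|X| + |Y|}$

When used as a string similarity measure, the coefficient may be calculated for two strings, ''x'' and ''y'', using character bigrams:

:$s = \frac{2 n_{t}}{n_{x} + n_{y}}$

where $n_{t}$ is the number of character bigrams found in both strings, $n_{x}$ is the number of bigrams in string ''x'' and $n_{y}$ is the number of bigrams in string ''y''. For example, to calculate the similarity between: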

:night
:nacht

We would find the set of bigrams in each word:
:{ni,ig,gh,ht}
:{na,ac,ch,ht}
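The bigram sets listed above can be produced mechanically by pairing each character with its successor. The following is a minimal sketch, assuming Python 3.9+; the function name `bigrams` is hypothetical and chosen only for illustration. Because it returns a set, any repeated bigram in a word is counted once.

```python
def bigrams(word: str) -> set[str]:
    """Return the set of adjacent character pairs (bigrams) in `word`.

    Hypothetical helper for illustration; repeated bigrams collapse
    into a single set element.
    """
    # zip pairs each character with the next one: "night" -> ("n","i"), ("i","g"), ...
    return {a + b for a, b in zip(word, word[1:])}


print(sorted(bigrams("night")))  # ['gh', 'ht', 'ig', 'ni']
print(sorted(bigrams("nacht")))  # ['ac', 'ch', 'ht', 'na']
```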

Each set has 4 elements, and the intersection of these two sets has only one element: ht. Substituting these counts into the formula gives $s = \frac{2 \cdot 1}{4 + 4} = 0.25$.
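Both forms of the coefficient, the keyword-set form and the string form based on bigrams, reduce to the same computation on two sets. The sketch below is one possible implementation under stated assumptions. The names `dice` and `bigram_dice` are hypothetical, and so are the keyword sets in the usage line. Returning 1.0 for two empty sets is an assumed convention, since the formula is undefined in that case. Bigrams are treated as sets, as in the worked example. Some formulations instead count repeated bigrams with multiplicity, and this sketch does not do that.

```python
def dice(x: set, y: set) -> float:
    """Dice's coefficient 2|X ∩ Y| / (|X| + |Y|) for two sets (hypothetical helper)."""
    if not x and not y:
        return 1.0  # assumed convention: two empty sets count as identical
    return 2 * len(x & y) / (len(x) + len(y))


def bigram_dice(s: str, t: str) -> float:
    """String similarity 2*n_t / (n_x + n_y) over sets of character bigrams."""
    # Slice every two-character window; duplicates collapse because these are sets.
    bx = {s[i:i + 2] for i in range(len(s) - 1)}
    by = {t[i:i + 2] for i in range(len(t) - 1)}
    return dice(bx, by)


print(bigram_dice("night", "nacht"))  # 0.25
# Hypothetical keyword sets: one shared keyword out of 2 + 2 -> 2*1/4
print(dice({"information", "retrieval"}, {"retrieval", "model"}))  # 0.5
```

The set-based `dice` function carries the formula itself, and `bigram_dice` only decides what goes into the sets. This keeps the keyword and string variants consistent with each other.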



SEE ALSO




NOTES



REFERENCES