Saturday, August 28, 2010

Zipf's Law

In formal terms: The frequency of a word is inversely proportional to its rank in the frequency table.

It seems obvious, I admit, but it's actually a nifty mathematical law. It means that the second-most-common word will appear half as often as the most common, the third a third as often, and so on. Thus, you will see the word 'interpretation' (rank 5000) once for every 5000 times you see 'the' (rank 1). (Except, of course, in a literature class, in which case all bets are off.)
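Spelled out as a formula, for anyone who prefers symbols: the letters f and r below are just my own shorthand (f(r) is the frequency of the word at rank r), and this assumes the plain "inversely proportional" form of the law stated above.

```latex
f(r) \propto \frac{1}{r}
\quad\Longrightarrow\quad
\frac{f(r_1)}{f(r_2)} = \frac{r_2}{r_1},
\qquad\text{e.g.}\quad
\frac{f(1)}{f(2)} = 2,
\quad
\frac{f(1)}{f(5000)} = 5000 .
```

So the predicted ratio between any two words is just the ratio of their ranks, flipped.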

That's in theory. In practice, the actual ratio for those two works out to 4248:1, which is pretty good. Ranks 2 and 10000 ('of' vs. 'calves') work out to 8653:1, against a predicted 5000:1. It's not perfect, but it's a useful guesstimate.
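If you want to check the arithmetic yourself, here is a rough Python sketch. The function name `predicted_ratio` is made up for illustration, and the rank of 5000 for 'interpretation' is inferred from the 5000:1 prediction above rather than looked up. The observed ratios are the two figures quoted in this post. Note that the law predicts 5000:1 for both pairs, since 10000 / 2 is also 5000.

```python
def predicted_ratio(rank_common: int, rank_rare: int) -> float:
    """Zipf's prediction for how many times more often the word at
    rank_common appears than the word at rank_rare (frequency ~ 1/rank)."""
    if rank_common < 1 or rank_rare < 1:
        raise ValueError("ranks start at 1")
    return rank_rare / rank_common


# (rank of the common word, rank of the rare word, observed ratio from the post)
comparisons = [
    (1, 5000, 4248),   # 'the' vs. 'interpretation' (rank 5000 is assumed)
    (2, 10000, 8653),  # 'of' vs. 'calves'
]

for rank_common, rank_rare, observed in comparisons:
    predicted = predicted_ratio(rank_common, rank_rare)
    print(
        f"ranks {rank_common} vs {rank_rare}: "
        f"predicted {predicted:.0f}:1, observed {observed}:1, "
        f"observed/predicted = {observed / predicted:.2f}"
    )
```

An observed/predicted value near 1 means the law is doing well for that pair; the further from 1, the rougher the guesstimate.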

Handy wiki article.

Wiktionary (one of Wikipedia's sister projects) has a handy gathering of word frequency lists here. The list I used is here.
