% python

Starting NLTK and importing a corpus.

>>> import nltk
>>> from nltk.corpus import brown
>>> wds = brown.words()
>>> wds[:10]
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of']

To get word counts:

>>> from collections import Counter
>>> ctr = Counter(wds)
>>> ctr.most_common(10)
[('the', 62713), (',', 58334), ('.', 49346), ('of', 36080), ('and', 27915), ('to', 25732), ('a', 21881), ('in', 19536), ('that', 10237), ('is', 10011)]

To download a particular corpus (including the Brown Corpus) from the web:

>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
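
Note that Counter is case-sensitive, so 'The' and 'the' are counted as different words, and punctuation tokens such as ',' and '.' are counted too. The following is a minimal sketch of one way to fold case and skip punctuation before counting. The short token list and the count_words helper are hypothetical stand-ins for illustration; they are not part of NLTK.

```python
from collections import Counter

# Hypothetical stand-in for brown.words(); any iterable of token strings works.
tokens = ['The', 'jury', 'said', 'the', 'election', ',', 'the', 'first', '.']

def count_words(tokens):
    """Count tokens case-insensitively, skipping tokens with no letters or digits."""
    return Counter(
        t.lower()                         # fold 'The' and 'the' together
        for t in tokens
        if any(c.isalnum() for c in t)    # drop pure-punctuation tokens
    )

ctr = count_words(tokens)
print(ctr.most_common(1))   # [('the', 3)]
```

With NLTK installed and the Brown Corpus downloaded, count_words(brown.words()) should work the same way, since brown.words() yields plain strings.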