% python

Starting NLTK and importing a corpus:

>>> import nltk
>>> from nltk.corpus import brown
>>> wds = brown.words()
>>> wds[:10]
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of']

To get word counts:


>>> from collections import Counter
>>> ctr = Counter(wds)
>>> ctr.most_common(10)
[('the', 62713), (',', 58334), ('.', 49346), ('of', 36080), ('and', 27915), ('to', 25732), ('a', 21881), ('in', 19536), ('that', 10237), ('is', 10011)]
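
Note that Counter is case-sensitive, so 'The' and 'the' are tallied separately, and punctuation tokens are counted like words. A minimal sketch of case-folded counting follows. It uses a short made-up token list (an assumption, standing in for brown.words()) so that it runs without the corpus installed:

```python
from collections import Counter

# Hypothetical stand-in for brown.words(); any iterable of token strings works.
tokens = ['The', 'jury', 'said', 'the', 'election', 'was', 'fair', ',',
          'the', 'jury', 'agreed', '.']

# Fold case so that 'The' and 'the' share one count, and keep only
# alphabetic tokens so that punctuation such as ',' and '.' is dropped.
folded = Counter(t.lower() for t in tokens if t.isalpha())

top = folded.most_common(2)
print(top)  # [('the', 3), ('jury', 2)]
```

The same generator expression should work unchanged with wds in place of tokens.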

To download a particular corpus (including the Brown Corpus) from the web:

>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
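
Calling nltk.download() with no arguments opens the interactive downloader. In a script it is usually more convenient to pass a package identifier. The sketch below assumes the identifier 'brown' for the Brown Corpus; ensure_corpus is a hypothetical helper name, not part of NLTK. It looks for a local copy first and downloads only if none is found:

```python
def ensure_corpus(name='brown'):
    """Download an NLTK corpus only if it is not already installed.

    ensure_corpus is a hypothetical helper, not part of NLTK.
    """
    import nltk  # imported here so the helper can be defined before NLTK is loaded

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find('corpora/' + name)
        return True
    except LookupError:
        # nltk.download returns True on success and False on failure.
        return nltk.download(name)
```

Calling ensure_corpus('brown') once before the first use of brown.words() should be enough; later calls return immediately when the corpus is found locally.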