Ling 581: NLTK assignment

Basic assignment

Go to this web page and read the material there, paying special attention to the coding examples. Then go to the exercises at the bottom of the page and do problems 4 and 8. Hand in everything described below in a single email message. You do not have to hand in paper versions of your answers in class.

Zipf's Law

Problem 23. Zipf's Law. Turn in the two log-log graphs the exercise asks you to create. I suggest you use the Brown corpus to create the graph based on English; Brown is about 1.2 M words. Here is how to get the Brown Corpus. In Python, run:

import nltk
nltk.download()
This brings up a window you can interact with. There are some tabs at the top. Choose the tab labeled Corpora, select Brown, and click the download button at the bottom of the window.

Also turn in a discussion of what you learned from this exercise. Describe in words the graph you get from the random vocabulary experiment, and say a few words explaining it. What kinds of words are the most frequent? In light of this, what does Zipf's Law really tell you about the frequency distribution of words?
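The counting step behind a rank-frequency graph can be sketched roughly as follows. This is only an illustration on a toy word list, not the required solution: the helper names rank_frequency and loglog_points are made up for this example, and lowercasing the words before counting is an assumption you may or may not want to make.

```python
from collections import Counter
import math


def rank_frequency(words):
    """Return (rank, count) pairs, most frequent word first, ranks starting at 1.

    Assumption: words are lowercased before counting, so "The" and "the"
    are treated as the same word.
    """
    counts = Counter(w.lower() for w in words)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def loglog_points(pairs):
    """Convert (rank, count) pairs to (log10 rank, log10 count) pairs."""
    return [(math.log10(rank), math.log10(count)) for rank, count in pairs]


# Toy data only; for the assignment you would pass in a real corpus.
toy_words = "the cat saw the dog and the dog saw the cat run".split()
pairs = rank_frequency(toy_words)
points = loglog_points(pairs)
print(pairs)
```

With the Brown corpus downloaded, the same function could presumably be called as rank_frequency(nltk.corpus.brown.words()), and the resulting ranks and counts plotted on log-log axes, for example with pylab.loglog. If the download window is inconvenient, nltk.download('brown') fetches the corpus without the graphical interface.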

Pylab background
