Exercises, Chapter 5: 5.1, 5.2, 5.3
Next week you will do assignment 5.6.
Preliminary help with 5.6. The session below shows a good data structure for storing word-tag counts, which you need for exercise 5.6.
marks-macbook-pro:nltk gawron$ python
Enthought Canopy Python 2.7.3 | 64-bit | (default, Aug 8 2013, 05:37:06)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk.corpus import brown
>>> TW = brown.tagged_words()
>>> TW[:10]
[('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ('Grand', 'JJ-TL'), ('Jury', 'NN-TL'), ('said', 'VBD'), ('Friday', 'NR'), ('an', 'AT'), ('investigation', 'NN'), ('of', 'IN')]
>>> from collections import Counter
>>> from collections import defaultdict
>>> word_tag_dict = defaultdict(Counter)
>>> word_tag_dict['The'.lower()]['AT'] += 1
>>> word_tag_dict['FULTON'.lower()]['NP'] += 1
>>> word_tag_dict['County'.lower()]['NN'] += 1
>>> word_tag_dict['Grand'.lower()]['JJ'] += 1
>>> word_tag_dict['Jury'.lower()]['NN'] += 1
>>> 'NN-TL'.split('-')
['NN', 'TL']
>>> 'AT'.split('-')
['AT']
>>> TW[0]
('The', 'AT')
>>> word_tag_dict['fulton']['NP']
1

Here is the definition of freq_bigrams we used today in class.
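def freq_bigrams(bigram_list):
    # Count each bigram, lowercasing both words first.
    freqdist = nltk.FreqDist()
    for (w1, w2) in bigram_list:
        # FreqDist.inc is the NLTK 2 call; NLTK 3 removed it,
        # so there use freqdist[(w1.lower(), w2.lower())] += 1 instead.
        freqdist.inc((w1.lower(), w2.lower()))
    return freqdist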
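
Both counting tasks above can also be sketched with only the standard library, which may help when trying the idea out without loading the Brown corpus. This is a rough sketch, not the class code: the names build_word_tag_counts and count_bigrams and the six-word sample are made up for illustration. Falling back to the full tag when nothing precedes the hyphen (as with the tag '--') is likewise an assumption about how you may want to treat such tags.

```python
from collections import Counter, defaultdict

def build_word_tag_counts(tagged_words):
    """Map each lowercased word to a Counter of its simplified tags."""
    word_tag_dict = defaultdict(Counter)
    for word, tag in tagged_words:
        # Keep only the part before the first hyphen ('NN-TL' -> 'NN').
        # Assumption: if that part is empty (e.g. the tag '--'), keep the full tag.
        base_tag = tag.split('-')[0] or tag
        word_tag_dict[word.lower()][base_tag] += 1
    return word_tag_dict

def count_bigrams(bigram_list):
    """Count lowercased (w1, w2) pairs with a plain Counter."""
    counts = Counter()
    for w1, w2 in bigram_list:
        counts[(w1.lower(), w2.lower())] += 1
    return counts

# Hand-copied sample in the same (word, tag) shape as brown.tagged_words().
sample = [('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'),
          ('Grand', 'JJ-TL'), ('Jury', 'NN-TL'), ('said', 'VBD')]

counts = build_word_tag_counts(sample)
print(counts['fulton']['NP'])    # 1

words = [w for w, _ in sample]
bigrams = list(zip(words, words[1:]))
print(count_bigrams(bigrams)[('the', 'fulton')])    # 1
```

Because each value is a Counter, counts['jury'].most_common(1) gives the most frequent tag for a word, which is the kind of lookup a word-tag table is usually built for.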