Python 2.4.1 (#1, Aug 31 2005, 06:49:06)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "copyright", "credits" or "license()" for more information.

    ****************************************************************
    Personal firewall software may warn about the connection IDLE
    makes to its subprocess using this computer's internal loopback
    interface.  This connection is not visible on any external
    interface and no data is sent to or received from the Internet.
    ****************************************************************

IDLE 1.1.1
>>> ================================ RESTART ================================
>>> fsock_train=open('data/really_tiny_train.tag','r',0)
>>> line = fsock_train.readline()
>>> line
'FACTSHEET_NN1 WHAT_DTQ IS_VBZ AIDS_NN1 ?_? \n'
>>> words = line.split()
>>> words
['FACTSHEET_NN1', 'WHAT_DTQ', 'IS_VBZ', 'AIDS_NN1', '?_?']
>>> word_tag_pairs = []
>>> for word in words:
        pair=word.split('_')
        word_tag_pairs[0:0] = [pair]

>>> word_tag_pairs
[['?', '?'], ['AIDS', 'NN1'], ['IS', 'VBZ'], ['WHAT', 'DTQ'], ['FACTSHEET', 'NN1']]
>>> tag_count={}
>>> word_tag_matrix={}
>>> for elem in words:
...     wt_pair = elem.split('_')
...     if len(wt_pair) ==2:
...         (word,tag) = wt_pair
...         word_tag_matrix[word,tag]=word_tag_matrix.get((word,tag),0)+1
...         tag_count[tag]=tag_count.get(tag,0)+1
...
>>>
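
The counting loop at the end of the session handles a single line. As a minimal sketch, the same logic could be wrapped in a function that accepts any iterable of tagged lines. The function name `count_word_tags` and the in-memory `sample` list are hypothetical and introduced only for illustration. The sketch uses Python 3 syntax, whereas the session itself ran on Python 2.4, and it feeds in the one line shown in the transcript rather than reading `data/really_tiny_train.tag`.

```python
def count_word_tags(lines):
    """Count (word, tag) pairs and per-tag totals from lines of WORD_TAG tokens.

    Hypothetical helper: mirrors the loop in the session, but over many lines.
    """
    word_tag_matrix = {}  # maps (word, tag) -> number of occurrences
    tag_count = {}        # maps tag -> number of occurrences
    for line in lines:
        for elem in line.split():
            wt_pair = elem.split('_')
            # As in the session, skip tokens that do not split into
            # exactly one word and one tag.
            if len(wt_pair) == 2:
                word, tag = wt_pair
                key = (word, tag)
                word_tag_matrix[key] = word_tag_matrix.get(key, 0) + 1
                tag_count[tag] = tag_count.get(tag, 0) + 1
    return word_tag_matrix, tag_count


# The single line read in the session, used here as stand-in input.
sample = ['FACTSHEET_NN1 WHAT_DTQ IS_VBZ AIDS_NN1 ?_? \n']
word_tag_matrix, tag_count = count_word_tags(sample)
print(tag_count)
```

Because an open file object iterates over its lines, the same function should also accept the result of `open('data/really_tiny_train.tag')` directly, assuming that file exists and uses the format shown in the session.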