Converting an Earley recognizer to an Earley parser

Currently, after installing small_parsers, when you do

>>>import small_parsers.simple_earley_parser
The parser parses 3 sentences and gives you lots of output. You get
Python 2.4.3 (#1, Apr  7 2006, 10:54:33) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ## working on region in file /tmp/python-300dz1.py...

   0 initialize_chart(gamma, 0, 0, ['s']) []
chart[0]:
   1 predictor:(s, 0, 0, ['np', 'vp']) []
   2 predictor:(s, 0, 0, ['aux', 'np', 'vp']) []
   3 predictor:(np, 0, 0, ['det', 'nom']) []
   4 predictor:(np, 0, 0, ['nom']) []
   5 predictor:(np, 0, 0, ['np', 'pp']) []
   7 predictor:(nom, 0, 0, ['n']) []
chart[1]:
   6 scanner:(det, 0, 1, ['*completed*']) ['the']
   8 completer:(np, 0, 1, ['nom']) ['the']
   9 predictor:(nom, 1, 1, ['n']) []
chart[2]:
   10 scanner:(n, 1, 2, ['*completed*']) ['dogs']
   11 completer:(nom, 1, 2, ['*completed*']) ['dogs']
 Lots more! 
The last actually has multiple parses after building 145 parse edges in the chart. The output concludes.
Parse found!
There were 1 parses.
This is wrong. There are more.

The parser instance for the last sentence has been set to p2. When you try to print the parses for the last sentence you get.

>>> p2.parses[0].pp()
"(s: 'the')"
This is bizarre and incorrect. The reason for this alarming behavior is that the dtr_dict where all parse tree information SHOULD be stored, has nothing in it.
>>> p2.dtr_dict
{}
In fact, as the documentation in small_parser.py says:
        A parse forest should be stored in the parser's
        dtr_dict as a dictionary in the form:
          p1.dtr_dict[edge] =  List of DtrLists
        where each DtrList is the list of dtr edges in one
        local subtree,  For example, suppose "see a man with a
         telescope" has 2 parses with two local subtrees:

	                 vp:e1         |        vp:e1
	              /       \        |      /      \
                 vp:e2       pp:e 3    |    v:e4    np:e5
	         /     \     /     \   |    /     /       \
                /       \   /       \  |   /     /         \
               see   a  man with a tel.| see   a man with a tel.
												       
  Then for a parser p1 that has just parsed this ambiguous sentence,
  p1.dtr_dict[e1] is a list of 2 dtr lists:
                  p1.dtr_dict[e1] = [[e2,e3],[e4,e5]]
But this has not happened. Your job is to fix this.

This assignment can be done by ADDING one line of code.

It may help to work on a simple example with 2 parses. (Using grammar g1)

 the dogs in the park in the dogs walk

File list for small_parsers module:

  1. small_grammar.py Two small grammars for experiement with.
  2. simple_earley_parser.py: The file you will have to change.
  3. simple_cky_parser.py: For a point of comparison. Fully working. Produces the RIGHT dtr lists. Although chart edges have different attributes in cky, the structure of dtr_dict is the same.

Code hints:

  1. The code for updating the dtr dictionary is in update_dtr_records in both parsers.
  2. The code for adding an edge to the chart is in add_to_chart for both parsers.

What to turn in

  1. A description of what you did
  2. printed parse trees (p1.print_nltk_parse_trees()) for i2 and i3 at the bottom of the file
    1. i2="the agency sees widespread use of the codes as a way of handling the rapidly growing mail volume and controlling labor costs"
    2. i3 = "the agency sees use of the codes as a way"
  3. The revised code for simple_earley_parser.py
  4. Tell me many parses i2 and i3 had (len(p1.parses)) in your description.