Linguistics 581

Earley Parsing and Top-Down Parsing


Homework assignment

There are 3 problems below, labeled A, B, and C. Parts A and B are about Earley parsing; Part C is about top-down parsing. Turn in pencil-and-paper answers to all 3 problems. There is also an extra credit problem, labeled D. The problem is short, and a good answer can be short, but it will take some real thought.

Earley Parsing

A. Explain what gave rise to S25, S26, and S27 in the parser chart in Figure 13.14. Note that the answer isn't just "The predictor put S25, S26, and S27 there." Why does the predictor put S25, S26, and S27 in the chart? What happens at previously created edges that leads to the creation of S25, S26, and S27? Don't be afraid to look at the algorithm in Figure 13.13 to get your answer.

B. For this next part, use the following grammar:

S -> NP VP
S -> Aux NP VP
NP -> Det Nom
Nom -> Noun Nom | Noun
NP -> ProperNoun
NP -> NP PP
VP -> Verb
VP -> Verb NP
PP -> Prep NP
Prep -> in | on | at | from | to
Det -> this | that | a
Noun -> book | flight | meal | money
Verb -> book | include | prefer | landed
ProperNoun -> Houston | TWA | Denver

Be an Earley parser. Parse the NP part of:

  • A flight from Denver to Houston landed.
This means you only need to complete the chart up to chart[6] (all the edges ending at index 6, the one right after the word Houston).

Show the Earley parsing chart in the same format used in Figure 13.14. Assign names like S1, S2, ... to all the edges and show them. Organize the edges, as in Figure 13.13, according to the index they end at. You should try to generate the edges in the order in which the Earley algorithm in Figure 13.13 actually creates them, but I won't grade that. I will deduct for missing or incorrect edges. (Incorrect means the algorithm wouldn't actually propose such an edge, not that the edge goes unused in the final parse.)
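If it helps to keep the bookkeeping straight, here is a minimal Python 3 sketch (an illustration, not the textbook's algorithm) of one way to represent the edges: each state is a dotted rule plus the chart indices where it starts and ends, and the chart is a list indexed by end position. The class name State, the helper show_positions, and the "•" formatting are assumptions made for this example. It does not run the Earley algorithm; it only prints the word positions (lowercasing "a" to match the grammar's Det -> a) and shows how a single state might be written out, roughly in the style of the textbook's chart.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    """One Earley edge: a dotted rule spanning words[start:end] (hypothetical helper)."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # how many rhs symbols have been recognized so far
    start: int  # chart index where the edge begins
    end: int    # chart index where the edge ends, i.e. which chart[...] it lives in

    def __str__(self):
        before = " ".join(self.rhs[:self.dot])
        after = " ".join(self.rhs[self.dot:])
        body = " ".join(part for part in (before, "\u2022", after) if part)
        return f"{self.lhs} -> {body} [{self.start},{self.end}]"

def show_positions(words):
    """Interleave chart indices with the words: 0 a 1 flight 2 ..."""
    pieces = []
    for i, w in enumerate(words):
        pieces.append(str(i))
        pieces.append(w)
    pieces.append(str(len(words)))
    return " ".join(pieces)

words = "a flight from Denver to Houston".split()
print(show_positions(words))
# 0 a 1 flight 2 from 3 Denver 4 to 5 Houston 6

# chart[i] holds the states that end at index i, so chart[6] is the last one needed
chart = [[] for _ in range(len(words) + 1)]
first = State("S", ("NP", "VP"), 0, 0, 0)  # nothing recognized yet, spans [0,0]
chart[first.end].append(first)
print(first)
# S -> • NP VP [0,0]
```

Keeping start and end on every state makes it easy to list the edges grouped by chart index, which is how the chart in Figure 13.14 is organized.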

Top-down parsing

C. Use the NLTK recursive descent parser:

    from nltk.parse import rd
    rd.demo()

The command rd.demo() should print a trace of the parser working through an ambiguous sentence. Here is the part of the demo function's code that sets up and calls the rd parser; use it as a model for calling the parser yourself:

    from nltk import parse_cfg
    from nltk.parse import RecursiveDescentParser

    grammar = parse_cfg("""
    S -> NP VP
    NP -> Det N | Det N PP
    VP -> V NP | V NP PP
    PP -> P NP
    NP -> 'I'
    N -> 'man' | 'park' | 'telescope' | 'dog'
    Det -> 'the' | 'a'
    P -> 'in' | 'with'
    V -> 'saw'
    """)

    for prod in grammar.productions():
        print prod

    sent = 'I saw a man in the park'.split()
    parser = RecursiveDescentParser(grammar, trace=2)
    parses = parser.nbest_parse(sent)
    for p in parses:
        print p
As you can see, it begins by defining a grammar. Start your own grammar file, mygrammar.py, and, using the parse_cfg function as in the definition of demo(), define your own grammar and assign it to the variable new_grammar. Don't forget to wrap your terminals in single quotes as in the example above, or your grammar won't parse anything. You can define any grammar you want to experiment with, but you should begin by defining the grammar in Part B of this assignment.
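As a starting point, here is a sketch of what mygrammar.py might contain: the Part B grammar rewritten in parse_cfg format, with every word in single quotes. Aux is left unquoted because it is a category, not a word (the Part B grammar simply gives it no words). The constant name GRAMMAR_B and the guarded import are assumptions made so the file also loads when the NLTK 2 parse_cfg function is not available; with the NLTK version used in this assignment, new_grammar ends up holding the parsed grammar.

```python
# mygrammar.py -- a sketch of the Part B grammar in parse_cfg format.
# Terminals (words) are quoted; categories such as Aux are not.
GRAMMAR_B = """
S -> NP VP
S -> Aux NP VP
NP -> Det Nom
Nom -> Noun Nom | Noun
NP -> ProperNoun
NP -> NP PP
VP -> Verb
VP -> Verb NP
PP -> Prep NP
Prep -> 'in' | 'on' | 'at' | 'from' | 'to'
Det -> 'this' | 'that' | 'a'
Noun -> 'book' | 'flight' | 'meal' | 'money'
Verb -> 'book' | 'include' | 'prefer' | 'landed'
ProperNoun -> 'Houston' | 'TWA' | 'Denver'
"""

try:
    from nltk import parse_cfg  # NLTK 2.x API, as used in the demo code above
except ImportError:             # NLTK missing, or a newer NLTK without parse_cfg
    parse_cfg = None

# The parsed grammar when parse_cfg is available; otherwise None.
new_grammar = parse_cfg(GRAMMAR_B) if parse_cfg is not None else None
```

In NLTK 3 and later, parse_cfg no longer exists; the equivalent call there is nltk.CFG.fromstring(GRAMMAR_B).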

You should now try to parse a sentence with this grammar and the rd parser, using the definition of demo() shown above as a guide to calling the rd parser.

Try to parse the sentence

    That flight landed

Don't put in a period and don't use any uppercase letters. Something goes wrong. Keep in mind that the parser works fine with the demo grammar shown above.

Turn in a description of what happens. Hit Control-C to stop what seems to be an infinite loop. Look carefully at your trace output. What is going on? What rule in the grammar is causing the problem? Can you explain why? Can you explain why the demo grammar does not cause any problems?

Hints and help

  1. NLTK can be of some help in examining grammars.

    Here are some code snippets for using nltk grammars:

        from nltk import nonterminals, Production, parse_cfg
    
        # Create some nonterminals
        S, NP, VP, PP = nonterminals('S, NP, VP, PP')
        N, V, P, Det = nonterminals('N, V, P, Det')
        VP_slash_NP = VP/NP
    
        print 'Some nonterminals:', [S, NP, VP, PP, N, V, P, Det, VP/NP]
        print '    S.symbol() =>', `S.symbol()`
        print
    
        print Production(S, [NP, VP])
    
        # Now do some stuff with new_grammar (the grammar defined in mygrammar.py)
        grammar = new_grammar
        print 'A Grammar:', `grammar`
        print '    grammar.start()       =>', `grammar.start()`
        print '    grammar.productions() =>',
        # Use str.replace(...) to line-wrap the output.
        print `grammar.productions()`.replace(',', ',\n'+' '*25)
        print
        
        print 'Coverage of input words by a grammar:'
        print grammar.covers(['a','flight'])
        print grammar.covers(['a','toy'])
    

    Once a grammar is defined by parsing a string containing all its productions with parse_cfg, as above, you can access parts of the grammar as follows:

    >>> for p in new_grammar.productions():
    ...     print p
    ...
    S -> NP VP
    S -> Aux NP VP
    NP -> Det Nom
    Nom -> Noun Nom | Noun
    NP -> ProperNoun
    NP -> NP PP
    VP -> Verb
    VP -> Verb NP
    PP -> Prep NP
    Prep -> in | on | at | from | to
    Det -> this | that | a
    Noun -> book | flight | meal | money
    Verb -> book | include | prefer | landed
    ProperNoun -> Houston | TWA | Denver
    
    >>> new_grammar.productions()[0]
    S -> NP VP
    >>> new_grammar.productions()[0].lhs()
    S
    >>> new_grammar.productions()[0].rhs()
    (NP, VP)
    >>> new_grammar.productions()[0].rhs()[0]
    NP
    >>> NonTs = set([p.lhs() for p in new_grammar.productions()])
    >>> sorted(str(nt) for nt in NonTs)
    ['Det', 'NP', 'Nom', 'Noun', 'PP', 'Prep', 'ProperNoun', 'S', 'VP', 'Verb']
    
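The NonTs line collects the set of nonterminals in the grammar that have at least one production, i.e. every symbol that appears on the left-hand side of some rule.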

Extra credit

D. Explain why the Earley parser does not have the same problem as the rd parser with the grammar used in Exercise B. The answer will require looking carefully at the algorithm in Figure 13.13. What procedure saves us from the infinite loop? Be as specific as possible. Explain how it saves us: why the procedure is called, and how it avoids the problem you observed in Exercise C.