Linguistics 581

CKY and Earley Parsing


Homework assignment


A. Explain what gave rise to S25, S26, and S27 in the parser chart in Figure 13.14. Note that the answer isn't just "The predictor put S25, S26, and S27 there." Why does the predictor put S25, S26, and S27 in the chart? What happens at previously created edges that leads to the creation of S25, S26, and S27? Don't be afraid to look at the algorithm in Figure 13.13 to get your answer.
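To see why the predictor adds new edges, it can help to run the PREDICTOR step on its own. The sketch below is a study aid, not the textbook's code. It assumes a state is a tuple (lhs, rhs, dot, start, end), that the grammar is a Python dict mapping each nonterminal to its right-hand sides, and, as in Figure 13.13, that part-of-speech categories are left to the scanner. The names State, GRAMMAR, POS, and predictor are hypothetical. The grammar is the toy dog/cat grammar from the CKY hints below, not the Part B grammar.

```python
from typing import NamedTuple


class State(NamedTuple):
    """A dotted rule lhs -> rhs with the dot at `dot`, spanning [start, end]."""
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int

    def next_cat(self):
        """Category right after the dot, or None if the rule is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


# Hypothetical toy grammar (the dog/cat grammar); parts of speech go to the scanner.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
POS = {"Det", "N", "V", "P"}


def predictor(state, chart):
    """For A -> alpha . B beta, [i, j], add B -> . gamma, [j, j] to chart[j]
    for every rule B -> gamma. Assumes B is a nonterminal but not a part of speech."""
    b, j = state.next_cat(), state.end
    for gamma in GRAMMAR[b]:
        new_state = State(b, gamma, 0, j, j)
        if new_state not in chart[j]:  # like ENQUEUE: never add a duplicate state
            chart[j].append(new_state)


# Close chart[0] under prediction, starting from the dummy state gamma -> . S, [0, 0].
chart = [[State("gamma", ("S",), 0, 0, 0)]]
i = 0
while i < len(chart[0]):  # chart[0] grows while we walk through it
    cat = chart[0][i].next_cat()
    if cat is not None and cat not in POS:
        predictor(chart[0][i], chart)
    i += 1

for s in chart[0]:
    print(s.lhs, "->", " ".join(s.rhs[:s.dot] + (".",) + s.rhs[s.dot:]), [s.start, s.end])
```

Running it leaves four states in chart[0]: gamma -> . S, S -> . NP VP, NP -> . Det N, and NP -> . NP PP, all spanning [0, 0]. The NP edges exist only because the earlier edge S -> . NP VP has NP right after its dot. Predicting again from NP -> . NP PP adds nothing, because the duplicate check stops left recursion from looping.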

B. For this next part, use the following grammar:

S -> NP VP
S -> Aux NP VP
NP -> Det Nom
Nom -> Noun Nom | Noun
NP -> ProperNoun
NP -> NP PP
VP -> Verb
VP -> Verb NP
PP -> Prep NP
Prep -> in | on | at
Det -> this | that | a
Noun -> book | flight | meal | money
Verb -> book | include | prefer | landed
ProperNoun -> Houston | TWA | Denver
for each category.

Be an Earley parser. Parse the NP part of:

This means you only need to complete the chart up to chart[6] (all the edges ending at index 6, the one right after the word Houston).

Show the Earley parsing chart in the same format as is used in Figure 13.14. Assign names like S1, S2, ... to all the edges and show them. Organize the edges, as in Figure 13.13, according to the index they end at. You should try to generate the edges in the actual order in which the Earley algorithm in Figure 13.13 creates them, but I won't mark that. I will deduct for missing or incorrect edges. (Incorrect means the algorithm wouldn't actually propose such an edge, not that the edge goes unused in the final parse.)
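To check a hand-built chart, a compact Earley recognizer with the PREDICTOR / SCANNER / COMPLETER structure of Figure 13.13 might look like the sketch below. It is written under assumptions and is not the textbook's code: states are (lhs, rhs, dot, start, end) tuples, the grammar is a dict, a lexicon maps each word to its parts of speech, and the names State, earley_recognize, GRAMMAR, LEXICON, and POS are hypothetical. It keeps no backpointers, so it recognizes rather than parses, and it uses the dog/cat grammar from the CKY hints below instead of the Part B grammar.

```python
from typing import NamedTuple


class State(NamedTuple):
    """A dotted rule lhs -> rhs with the dot at `dot`, spanning [start, end]."""
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int

    def next_cat(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


# Hypothetical toy grammar and lexicon (the dog/cat grammar from the CKY hints).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {"a": {"Det"}, "the": {"Det"}, "dog": {"N"}, "cat": {"N"},
           "chased": {"V"}, "sat": {"V"}, "on": {"P"}, "in": {"P"}}
POS = {"Det", "N", "V", "P"}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, pos=POS, start="S"):
    """Return (accepted, chart); chart[k] lists the states ending at index k."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state, k):
        if state not in chart[k]:  # never add a duplicate state
            chart[k].append(state)

    enqueue(State("gamma", (start,), 0, 0, 0), 0)  # dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow as we go
            state = chart[k][i]
            cat = state.next_cat()
            if cat is None:  # COMPLETER: advance states waiting for state.lhs
                for old in list(chart[state.start]):
                    if old.next_cat() == state.lhs:
                        enqueue(old._replace(dot=old.dot + 1, end=k), k)
            elif cat in pos:  # SCANNER: does the next word have this part of speech?
                if k < n and cat in lexicon.get(words[k], set()):
                    enqueue(State(cat, (words[k],), 1, k, k + 1), k + 1)
            else:  # PREDICTOR: expand a category that is not a part of speech
                for gamma in grammar[cat]:
                    enqueue(State(cat, gamma, 0, k, k), k)
            i += 1
    accepted = State("gamma", (start,), 1, 0, n) in chart[n]
    return accepted, chart


accepted, chart = earley_recognize("a dog chased a cat".split())
print(accepted)
```

Printing the states in each chart[k] gives a listing organized by end index, in the spirit of Figure 13.14, though the order within a column need not match the global creation order that the S1, S2, ... names record.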

C. Now be an Earley parser and

D. End of chapter 13: Exercises 13.1 and 13.2. You do not have to implement a parser, just a recognizer.

Hints and help on creating a CKY recognizer:

  1. Your procedure cky_parse should take a list of strings (the words of the sentence) as its argument and return a Boolean. For example,
       >>> cky_parse(['a','dog','chased','a','cat'])

    should return True for the grammar given below.
  2. You should implement the table called table in the pseudocode in Figure 13.10 as a list of lists of sets:
      >>> NP in table[3][5]
      
    should return True after parsing if there is an NP extending from 3 to 5. For example, it should return True after parsing
    ['a','dog','chased','a','cat']

    with respect to the grammar below, because 'a cat' is covered as an NP by that grammar.

    Python sets are created by passing a list with the desired elements to the set constructor function:

     >>> set([NP, VP, S])
      set([NP,VP,S])
    
    Consider the cell [0,5] in Figure 13.9. This would be accessed as follows after parsing:
     >>> table[0][5]
      set([S,VP])
      
  3. NLTK can help with implementing recognizers, but using it is entirely optional. If you do use it, I suggest relying on NLTK for the grammar data structures.

    Here are some code snippets for using NLTK grammars:

    def cfg_demo():
        """
        A demonstration showing how context-free grammars (nltk.CFG) can be created and used.
        Written for NLTK 3, where nltk.parse_cfg was replaced by CFG.fromstring.
        """

        from nltk import CFG, Production, nonterminals

        # Create some nonterminals
        S, NP, VP, PP = nonterminals('S, NP, VP, PP')
        N, V, P, Det = nonterminals('N, V, P, Det')
        VP_slash_NP = VP/NP

        print('Some nonterminals:', [S, NP, VP, PP, N, V, P, Det, VP_slash_NP])
        print('    S.symbol() =>', repr(S.symbol()))
        print()

        print(Production(S, [NP]))

        # Create a grammar from a string of productions
        grammar = CFG.fromstring("""
          S -> NP VP
          PP -> P NP
          NP -> Det N | NP PP
          VP -> V NP | VP PP
          Det -> 'a' | 'the'
          N -> 'dog' | 'cat'
          V -> 'chased' | 'sat'
          P -> 'on' | 'in'
        """)

        print('A Grammar:', repr(grammar))
        print('    grammar.start()       =>', repr(grammar.start()))
        print('    grammar.productions() =>', end=' ')
        # Use str.replace(...) to line-wrap the output.
        print(repr(grammar.productions()).replace(',', ',\n' + ' ' * 25))
        print()

        print('Coverage of input words by a grammar:')
        for words in (['a', 'dog'], ['a', 'toy']):
            try:
                # check_coverage raises ValueError if some word has no lexical rule
                grammar.check_coverage(words)
                print(words, True)
            except ValueError:
                print(words, False)

    Once a grammar is defined by parsing a string containing all its productions with CFG.fromstring (called parse_cfg in NLTK 2), as above, you can access parts of the grammar as follows:
    >>> for p in grammar.productions():
    ...     print(p)
    ...
    S -> NP VP
    PP -> P NP
    NP -> Det N
    NP -> NP PP
    VP -> V NP
    VP -> VP PP
    Det -> 'a'
    Det -> 'the'
    N -> 'dog'
    N -> 'cat'
    V -> 'chased'
    V -> 'sat'
    P -> 'on'
    P -> 'in'
    >>> grammar.productions()[0]
    S -> NP VP
    >>> grammar.productions()[0].lhs()
    S
    >>> grammar.productions()[0].rhs()
    (NP, VP)
    >>> grammar.productions()[0].rhs()[0]
    NP
    >>> NonTs = set([p.lhs() for p in grammar.productions()])
    >>> sorted(NonTs, key=str)
    [Det, N, NP, P, PP, S, V, VP]

    The last two lines build the set of nonterminals in the grammar and display it in sorted order.
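Putting hints 1 and 2 together, a CKY recognizer for the dog/cat grammar (which is already in Chomsky normal form) might look like the following sketch. It follows the loop structure of the CKY pseudocode in Figure 13.10, but it uses plain strings for categories instead of NLTK Nonterminal objects, and the helper names BINARY_RULES, LEXICAL_RULES, RULES_BY_RHS, and cky_table are hypothetical. Only cky_parse and the table[i][j] layout come from the hints above.

```python
from collections import defaultdict

# The dog/cat grammar in CNF as plain Python data (hypothetical representation).
BINARY_RULES = [  # (A, B, C) stands for A -> B C
    ("S", "NP", "VP"), ("PP", "P", "NP"), ("NP", "Det", "N"),
    ("NP", "NP", "PP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
]
LEXICAL_RULES = {  # word -> set of A such that A -> word
    "a": {"Det"}, "the": {"Det"}, "dog": {"N"}, "cat": {"N"},
    "chased": {"V"}, "sat": {"V"}, "on": {"P"}, "in": {"P"},
}

# Index binary rules by right-hand side so each (B, C) pair is a single lookup.
RULES_BY_RHS = defaultdict(set)
for a, b, c in BINARY_RULES:
    RULES_BY_RHS[(b, c)].add(a)


def cky_table(words):
    """Fill table[i][j] with every category that spans words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        # Categories for the single word between indices j-1 and j
        table[j - 1][j] |= LEXICAL_RULES.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):  # i from j-2 down to 0
            for k in range(i + 1, j):  # split point with i < k < j
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= RULES_BY_RHS.get((b, c), set())
    return table


def cky_parse(words, start="S"):
    """Return True if the grammar derives the whole word list from `start`."""
    n = len(words)
    return n > 0 and start in cky_table(words)[0][n]


table = cky_table(['a', 'dog', 'chased', 'a', 'cat'])
print(cky_parse(['a', 'dog', 'chased', 'a', 'cat']))  # True
print('NP' in table[3][5])                            # True
```

Indexing the binary rules by their right-hand side turns the innermost step into a dictionary lookup instead of a scan over every production. The sketch assumes the grammar is already in CNF; a grammar with unit productions or right-hand sides longer than two would have to be converted first.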