There are serious issues with the top-down recursive descent parser. Load up the following grammar:
from nltk import grammar as cfg
# Nonterminal symbols used by the productions below
S, NP, VP, PP, Det, N, V, P = cfg.nonterminals('S, NP, VP, PP, Det, N, V, P')
productions = (
# Syntactic Productions
cfg.Production(S, [NP, 'saw', NP]),
cfg.Production(S, [NP, VP]),
cfg.Production(NP, [Det, N]),
cfg.Production(NP, [NP, PP]),
cfg.Production(VP, [V, NP, PP]),
cfg.Production(NP, [Det, N, PP]),
cfg.Production(PP, [P, NP]),
# Lexical Productions
cfg.Production(NP, ['I']), cfg.Production(Det, ['the']),
cfg.Production(Det, ['a']), cfg.Production(N, ['man']),
cfg.Production(V, ['saw']), cfg.Production(P, ['in']),
cfg.Production(P, ['with']), cfg.Production(N, ['park']),
cfg.Production(N, ['dog']), cfg.Production(N, ['telescope'])
)
# Current NLTK names the grammar class CFG; the original text used cfg.Grammar
grammar = cfg.CFG(S, productions)
Result: infinite loop. Here is the trace:
Parsing 'I saw a man in the park'
[ * [S] ]
E [ * [NP] 'saw' [NP] ]
E [ * [Det] [N] 'saw' [NP] ]
E [ * 'the' [N] 'saw' [NP] ]
E [ * 'a' [N] 'saw' [NP] ]
E [ * [NP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [NP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * [Det] [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'the' [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
E [ * 'a' [N] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] [PP] 'saw' [NP] ]
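The problem, then, is the following rule:
np --> np pp.
The parser starts a search for an np using this rule, which opens up another search for the left daughter np using this very same rule, which opens up yet another search, and so on. Because np is the leftmost daughter, each new search starts at the same position in the input, so no word is ever consumed and the recursion never bottoms out.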
Under certain rather common circumstances, a pure top-down, depth-first parser will rediscover the same constituents over and over. This is because distinct parse paths can require searching for the same constituent over the same stretch of input.
Example
Consider the following mini-grammar, omitting lexical rules:
rule(s,[np,vp]).
rule(vp,[v,np,pp]).
rule(vp,[v,np]).
rule(pp,[p,np]).
rule(np,[det,n]).
Now consider the following input: John bought the book.
We take up the story at the point where the parser begins looking for a VP. After parsing "John" as an NP, the parser will be in this state (goals on the left, remaining input on the right):
[vp] [bought,the,book]
It will try the first vp rule first:
[v,np,pp] [bought,the,book]
Since bought is indeed a verb, this leads to:
[np,pp] [the,book]
Analyzing "the book" as an np succeeds, which leads to:
[pp] []
The problem now is that we can't parse the empty string as a pp, so this parse path (rightly) fails for this grammar. Next we backtrack and try the next vp rule:
[v,np] [bought,the,book]
Once again we recognize bought as a verb, going through exactly the same sequence of match operations. This leads once again to:
[np] [the,book]
Note that we have been in exactly this task before: building an np out of "the book". We don't remember that we already found a v and an np on the previous parse path. This time we succeed, which leads to:
[] []
Success. Problem: we're not remembering the pieces we've built up along the way. Moral: remember your successes.
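The walk-through above can be reproduced with a small sketch of a memoryless, top-down, depth-first recognizer that works on a goal list and the remaining input, just like the states shown. The lexicon is a hypothetical stand-in for the omitted lexical rules, and the `attempts` counter is added purely to make the repeated work visible.

```python
from collections import Counter

# The mini-grammar above; the lexicon is a hypothetical stand-in for the
# omitted lexical rules.
RULES = [
    ("s", ["np", "vp"]),
    ("vp", ["v", "np", "pp"]),
    ("vp", ["v", "np"]),
    ("pp", ["p", "np"]),
    ("np", ["det", "n"]),
]
LEXICON = {"john": "np", "bought": "v", "the": "det", "book": "n"}

# (category, start position) -> how many times the parser set out to find it
attempts = Counter()

def recognize(goals, pos, words):
    """Depth-first, top-down recognition over a goal list, with no memory."""
    if not goals:
        return pos == len(words)          # success only if all input is used
    first, rest = goals[0], goals[1:]
    attempts[(first, pos)] += 1
    # Match the next word directly if the lexicon gives it category `first`.
    if pos < len(words) and LEXICON.get(words[pos]) == first:
        if recognize(rest, pos + 1, words):
            return True
    # Otherwise (or on failure) expand `first` with each rule, in order.
    for lhs, rhs in RULES:
        if lhs == first and recognize(rhs + rest, pos, words):
            return True
    return False

words = "john bought the book".split()
print(recognize(["s"], 0, words))   # True
print(attempts[("np", 2)])          # 2: "the book" was built twice
```

The np starting at position 2 ("the book") and the v starting at position 1 ("bought") are each searched for twice, once on the failed [v,np,pp] path and once on the successful [v,np] path.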
The Problem
The problem was caused by these two rules:
rule(vp,[v,np,pp]).
rule(vp,[v,np]).
We picked the wrong one first (an unavoidable event, at least some of the time), which took us down a false parse path. The problem is that some of the computing we did on that parse path, namely finding the NP "the book", turned out to be useful on another parse path. Different parts of the search share subproblems. This is a common feature of many search problems: different parts of the search space have identical subparts, so the same subproblem may come up many times. The solution: remember the solutions to subproblems when you find them, and provide an easy look-up mechanism. A data structure that enables a parser to remember its successes is the chart. Coming soon.
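Before the chart itself, the idea of "remember the solutions to subproblems" can be sketched with plain memoization. This is a minimal illustration, not the chart parser: `ends(cat, start)` computes every position where a constituent of category `cat` starting at `start` can end, and Python's `functools.lru_cache` serves as the look-up mechanism. The dictionary encoding of the grammar and the lexicon are assumptions made for the example.

```python
from functools import lru_cache

# Same mini-grammar, keyed by category; the lexicon is again hypothetical.
RULES = {
    "s": [["np", "vp"]],
    "vp": [["v", "np", "pp"], ["v", "np"]],
    "pp": [["p", "np"]],
    "np": [["det", "n"]],
}
LEXICON = {"john": "np", "bought": "v", "the": "det", "book": "n"}

def make_recognizer(words):
    computed = []  # records each (category, start) actually worked out

    @lru_cache(maxsize=None)
    def ends(cat, start):
        """End positions j such that words[start:j] can be a `cat`."""
        computed.append((cat, start))
        result = set()
        if start < len(words) and LEXICON.get(words[start]) == cat:
            result.add(start + 1)
        for rhs in RULES.get(cat, []):
            positions = {start}
            for sym in rhs:  # thread every possible end point through the daughters
                positions = {j for i in positions for j in ends(sym, i)}
            result |= positions
        return frozenset(result)

    return ends, computed

words = "john bought the book".split()
ends, computed = make_recognizer(words)
print(len(words) in ends("s", 0))     # True
print(computed.count(("np", 2)))      # 1: "the book" is built once, then looked up
```

When the second vp rule asks for a v at 1 and an np at 2, both answers come straight from the cache. Note that this sketch would still recurse without end on a left-recursive rule such as np --> np pp, because `ends("np", i)` would call itself before any result is cached.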
Problem: our formulation of the CYK algorithm required binary-branching rules.
The algorithm is designed to put together only two constituents at a time.
Obviously, grammars with three-daughter rules exist.
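To see where the restriction comes from, here is a minimal CYK recognizer sketch. The function name, the (a, b, c) triple encoding of a rule a -> b c, and the word-to-categories lexicon are assumptions for illustration. Each cell is filled by splitting a span at a point k into exactly two adjacent smaller spans, so a rule such as vp -> v np pp has no (a, b, c) encoding and simply cannot be used.

```python
def cyk(words, binary_rules, lexicon):
    """Return the CYK table: table[i][j] holds every category spanning words[i:j].

    binary_rules: (a, b, c) triples standing for a -> b c
    lexicon: word -> set of categories
    """
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] |= lexicon.get(word, set())
    for length in range(2, n + 1):        # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):     # split words[i:j] into exactly two parts
                for a, b, c in binary_rules:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    return table

RULES = [("s", "np", "vp"), ("vp", "v", "np"), ("np", "det", "n")]
LEXICON = {"john": {"np"}, "bought": {"v"}, "the": {"det"}, "book": {"n"}}
words = "john bought the book".split()
print("s" in cyk(words, RULES, LEXICON)[0][len(words)])  # True
```

Because each span is built once and stored in the table, this bottom-up method also avoids the repeated work of the top-down example.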
Solution: recompile the grammar, so that
vp -> v np pp
becomes
vp -> v np_pp
np_pp -> np pp
This is possible but not ideal: it multiplies the size of the grammar and increases the complexity of the syntax/semantics interface.
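The recompilation step can be sketched as a small right-factoring function. This is an illustration under stated assumptions: rules are hypothetical (lhs, rhs) pairs of strings, and each new helper nonterminal is named by joining the daughters it covers with "_" (as in np_pp), which assumes no existing symbol already uses such a name. It handles only rules with more than two daughters; unary and lexical rules pass through unchanged.

```python
def binarize(rules):
    """Right-factor rules with more than two daughters into binary rules.

    Each helper nonterminal is named by joining the daughters it covers
    with '_', as in np_pp; assumes such names don't clash with existing ones.
    """
    out = []
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            rest = rhs[1:]
            helper = "_".join(rest)
            out.append((lhs, [rhs[0], helper]))
            lhs, rhs = helper, rest       # keep splitting what remains
        out.append((lhs, rhs))
    # Different rules can introduce the same helper rule; keep one copy.
    unique = []
    for rule in out:
        if rule not in unique:
            unique.append(rule)
    return unique

for lhs, rhs in binarize([("vp", ["v", "np", "pp"]), ("vp", ["v", "np"])]):
    print(lhs, "->", " ".join(rhs))
# vp -> v np_pp
# np_pp -> np pp
# vp -> v np
```

A rule with n daughters turns into n - 1 binary rules, which is where the growth in grammar size comes from. Each helper category (np_pp) also has no meaning of its own, which is what complicates the syntax/semantics interface.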
We list this problem with the top-down problems above because it, too, can be solved with the chart.