Department of Linguistics and Oriental Languages

Contents

Goals

Required Text

Course outline

Prerequisites

Grading

Place and Time

Course home

Contact Info


Sites

Textbook site

NLTK Help

Python Wiki

More Python Help

Stat Textbook

Statistical NLP Site

ACL Wiki

General NLP Site

Lang Codes

Readings

Unix Tools

Computational Linguistics

Computational Syntax and Semantics Syllabus

Linguistics 582


Goals

We will begin with a review of some classic all-paths parsing algorithms, top-down and bottom-up, introducing chart parsing as a framework in which various approaches to parsing can be understood and implemented. The Natural Language Toolkit (NLTK) will provide us with a rich set of reference implementations. Python, the language NLTK is written in, will be the programming language of choice.

We will also introduce unification, a general framework for introducing arbitrary constraints into the parser, and eyeball some of the practical consequences of introducing this computationally expensive mechanism into the parsing process.

Finally, we will look at an NLTK implementation of chunk parsing, a form of shallow parsing, and apply it to a simple information extraction task.

Required Text

The text for the class will be Jurafsky and Martin, Speech and Language Processing.

Course outline

WEEK ONE, Aug 29

Lecture

Read: Jurafsky and Martin: Chapter 10.

Basic recursive descent recognizer and parser.

Introduction to NLTK.

Assignment
[Due Tuesday]

Recursive Descent Assignment.

WEEK TWO, Sept 5

Lecture

Pseudocode for top-down recognizer.

A simplified Python top-down recognizer.
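As a rough illustration of what such a recognizer looks like (this is not the lecture code), here is a minimal backtracking top-down recognizer in plain Python. The toy grammar and the names `GRAMMAR` and `recognize` are assumptions made up for this sketch; the grammar is deliberately free of left recursion, which this strategy cannot handle.

```python
# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
# Symbols that are not keys of GRAMMAR are treated as words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["man"], ["dog"], ["telescope"]],
    "V": [["saw"], ["slept"]],
    "P": [["with"]],
}


def recognize(symbols, words):
    """Return True if the symbol list can be expanded to exactly `words`."""
    if not symbols:
        # Nothing left to expand: succeed only if the input is used up too.
        return not words
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: try each expansion in turn; any() gives us backtracking.
        return any(recognize(rhs + rest, words) for rhs in GRAMMAR[first])
    # Terminal: it must match the next input word.
    return bool(words) and words[0] == first and recognize(rest, words[1:])
```

Calling `recognize(["S"], "the man saw a dog".split())` starts the search from the start symbol. A recognizer only answers yes or no; turning it into a parser means recording which expansions succeeded.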

Jurafsky and Martin, Chapter 2, focusing on the recognizer algorithms for deterministic and non-deterministic FSAs. Lecture.
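The non-deterministic case can be sketched as an agenda-based search over (state, tape position) pairs. The transition table below is an assumed encoding of a small "sheep language" automaton (b, then two or more a's, then !), and `nd_recognize` is a name chosen for this sketch rather than anything from the textbook or NLTK.

```python
# Hypothetical NFA: (state, symbol) -> set of possible next states.
TRANSITIONS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},  # the non-deterministic choice point
    (3, "!"): {4},
}
ACCEPT = {4}


def nd_recognize(tape, start=0):
    """Return True if some path through the NFA accepts the whole tape."""
    agenda = [(start, 0)]  # unexplored search states
    while agenda:
        state, pos = agenda.pop()  # popping from the end makes this depth-first
        if pos == len(tape):
            if state in ACCEPT:
                return True
            continue  # dead end: fall back to another agenda item
        for nxt in TRANSITIONS.get((state, tape[pos]), ()):
            agenda.append((nxt, pos + 1))
    return False
```

Replacing `agenda.pop()` with `agenda.pop(0)` would turn the same loop into a breadth-first search. A deterministic FSA is the special case in which every set of next states has at most one member, so the agenda never holds more than one item.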

Assignment
[Due Tuesday]

Top-down look-ahead assignment.

WEEK THREE, Sept 12

Lecture

Bottom-up parsing. A shift-reduce parser. Backtracking again. Deterministic parsers versus all-paths parsers. Lecture.
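To make the shift and reduce moves concrete, here is a small backtracking shift-reduce recognizer. It is a sketch under assumptions: the rule list `RULES` is a made-up toy grammar with no empty productions and no unary cycles, and the function is not the NLTK shift-reduce parser.

```python
# Hypothetical toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("N", ("man",)),
    ("N", ("dog",)),
    ("V", ("saw",)),
]


def shift_reduce(stack, words, goal="S"):
    """Return True if some sequence of shifts and reduces leaves [goal]."""
    if not words and stack == [goal]:
        return True
    # Reduce: try every rule whose right-hand side sits on top of the stack.
    for lhs, rhs in RULES:
        n = len(rhs)
        if tuple(stack[-n:]) == rhs:
            if shift_reduce(stack[:-n] + [lhs], words, goal):
                return True
    # Shift: if no reduce led to success, move the next word onto the stack.
    if words:
        return shift_reduce(stack + [words[0]], words[1:], goal)
    return False
```

A call such as `shift_reduce([], "the man saw a dog".split())` explores every shift/reduce choice, so it behaves like an all-paths search. A deterministic parser would commit to one move at each step and never revisit it.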

Assignment

Answer to week two homework assignment (hw_depth_first_rd_parser). The revised version of cfg.py that hw_depth_first_rd_parser.py requires. It is imported with:

from new import cfg

Shift reduce assignment.

WEEK FOUR, Sept 19

Lecture

Problems with top-down parsing: left recursion and rediscovered constituents. The case of "The man with a telescope saw a dog." in nltk_lite.parse.depth_first_rd_parser.

Top-down parsing with the chart. Solving the problems. Lecture.

Assignment

Questions for discussion.

WEEK FIVE, Sept 26

Lecture

Jurafsky and Martin, Chapter 10. The Earley Algorithm. The chart. Dynamic programming. Lecture. Where did the backtracking go? Breadth-first search + Chart = incrementality.

The chart as a parsing formalism. Top-down, bottom-up, and Earley parsing using the chart. Rule-based chart parsing. Chart parsing summary.
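The predictor, scanner, and completer steps of Earley's algorithm can be sketched in a few lines of plain Python. This is an illustrative recognizer only, not the NLTK chart parser: the grammar, the lexicon, the `GAMMA` dummy start symbol, and the function name are all assumptions of the sketch, and it assumes a grammar without empty productions.

```python
# Hypothetical toy grammar (note the left-recursive NP rule) and lexicon.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "man": {"N"}, "dog": {"N"}, "telescope": {"N"},
    "saw": {"V"}, "with": {"P"},
}


def earley_recognize(grammar, lexicon, words, start="S"):
    """Return True if `words` is derivable from `start`."""
    # A state is (lhs, rhs, dot, origin); chart[i] holds states ending at i.
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, i):
        if state not in chart[i]:  # the chart never stores a state twice
            chart[i].append(state)

    add(("GAMMA", (start,), 0, 0), 0)
    for i in range(len(words) + 1):
        for lhs, rhs, dot, origin in chart[i]:  # chart[i] grows as we go
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:  # predictor
                    for expansion in grammar[nxt]:
                        add((nxt, expansion, 0, i), i)
                elif i < len(words) and nxt in lexicon.get(words[i], ()):
                    add((nxt, (words[i],), 1, i), i + 1)  # scanner
            else:  # completer: advance every state that was waiting for lhs
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2), i)
    return ("GAMMA", (start,), 1, 0) in chart[len(words)]
```

Because `add` refuses duplicates, the left-recursive rule NP -> NP PP is predicted once per position instead of looping forever, and a constituent found once is never rebuilt. That is where the backtracking went.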

Assignment

Solution to Earley parsing assignment.

Caution: nltk.parser.chart has two versions of Earley's algorithm, as explained in lecture. Use the Earley parser class (rather than the general chart parser class) for efficiency.

WEEK SIX, Oct 3

Lecture

Probabilistic CFGs. Jurafsky and Martin, Chapter 12.

Lecture.
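The core idea of a probabilistic CFG is that the probability of a parse tree is the product of the probabilities of the rules used to build it. The sketch below assumes trees encoded as nested tuples and a hand-written table of rule probabilities; all of the numbers and names are invented for illustration.

```python
# Hypothetical rule probabilities; for each left-hand side they sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("VP", ("V",)): 0.4,
    ("VP", ("V", "NP")): 0.6,
    ("Det", ("the",)): 0.5,
    ("Det", ("a",)): 0.5,
    ("N", ("dog",)): 0.5,
    ("N", ("man",)): 0.5,
    ("V", ("barks",)): 1.0,
}

# A tree is (label, child, child, ...); a leaf is a plain string.
EXAMPLE_TREE = ("S",
                ("NP", ("Det", "the"), ("N", "dog")),
                ("VP", ("V", "barks")))


def tree_prob(tree):
    """Multiply the probabilities of all rules used in the tree."""
    if isinstance(tree, str):
        return 1.0  # words contribute nothing beyond the rule that introduced them
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p
```

For `EXAMPLE_TREE` the rules used are S -> NP VP, NP -> Det N, Det -> the, N -> dog, VP -> V, and V -> barks, so the product is 1.0 x 0.7 x 0.5 x 0.5 x 0.4 x 1.0 = 0.07. In practice log probabilities are usually summed instead, to avoid underflow on long sentences.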

Assignment

Assignment.

 

WEEK SEVEN, Oct 10

Lecture

??

Assignment

TBA.

WEEK EIGHT, Oct 17

Lecture

TBA

Assignment

Midterm.

WEEK NINE, Oct 24

Lecture

Parseval. Lecture.
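Parseval-style evaluation compares the labeled brackets of a proposed parse against those of a gold-standard parse. The function below is a simplified sketch: it treats each parse as a set of (label, start, end) triples and ignores the usual refinements (punctuation handling, duplicate brackets, crossing-bracket counts). The name `parseval` and the data layout are assumptions of the sketch.

```python
def parseval(gold, predicted):
    """Return (precision, recall, f1) over labeled brackets.

    Each argument is a collection of (label, start, end) triples.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # brackets with matching label and span
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if correct == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical brackets for a five-word sentence.
GOLD = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
PREDICTED = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)}
```

Here three of the five predicted brackets are correct (precision 3/5) and three of the four gold brackets are found (recall 3/4).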

Assignment

??

WEEK TEN, Oct 31

Lecture

[chunk parsing moved to later] Unification!

Unification. Lecture. Unification lattice. Lecture.
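As a first approximation, unification merges two feature structures and fails when they assign conflicting values to the same feature. The sketch below represents feature structures as nested Python dicts with string atoms. It is a simplification that assumes no reentrancy (shared substructures), which real unification grammars need, and it is not the NLTK feature structure module.

```python
def unify(fs1, fs2):
    """Return the unification of two feature structures, or None on failure."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # start from a copy of the first structure
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)  # both define feat: recurse
                if sub is None:
                    return None  # a clash anywhere makes the whole thing fail
                result[feat] = sub
            else:
                result[feat] = val  # only fs2 defines feat: just add it
        return result
    # Atomic values (or an atom against a dict) unify only if they are equal.
    return fs1 if fs1 == fs2 else None
```

For example, `unify({"agr": {"num": "sg"}}, {"agr": {"per": "3"}, "cat": "NP"})` combines the two agreement features, while unifying `{"agr": {"num": "sg"}}` with `{"agr": {"num": "pl"}}` returns None. The result carries at least as much information as either input, which is the sense in which unification moves down the lattice.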

Assignment

Earley assignment.

WEEK ELEVEN, Nov 7

Lecture

Jurafsky and Martin, Chapter 11.

Unification. Lecture. Grammar I. Agreement and subcategorization.

Assignment

Earley and CKY parsers for this assignment: gzipped tarfile.

Earley and CKY parsers for this assignment: README.

Assignment.

WEEK TWELVE, Nov 14

Lecture

Feature structures in NLTK.

Implementing Unification. nltk.featurestructure.

Diagrams

Assignment

Experiment with the NLTK feature structure module.

NLTK tutorial notes (from NLTK site). The file feat0.cfg. Notes on NLTK notes. NLTK lite featurestructure module.

Unification parsing

Summary of Earley efficiency and termination issues. Shieber's restriction solution. Unification parsing and restriction.

Earley with a large unification grammar. Efficiency issues.

WEEK THIRTEEN, Nov 21

Lecture

Chunk directory
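Chunk parsing groups tagged words into flat, non-overlapping phrases without building a full tree. The sketch below finds noun-phrase chunks matching an optional determiner, any number of adjectives, and one or more nouns. It is plain Python rather than the NLTK chunker, and the function name and the Penn-Treebank-style tags are assumptions of the sketch.

```python
def np_chunk(tagged):
    """Group (word, tag) pairs into NP chunks of the shape DT? JJ* NN+.

    Returns a list whose items are either ("NP", [pairs...]) or an
    unchunked (word, tag) pair.
    """
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":  # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun was found: emit a chunk
            chunks.append(("NP", tagged[i:k]))
            i = k
        else:  # no chunk starts here: pass the token through
            chunks.append(tagged[i])
            i += 1
    return chunks


# Hypothetical tagged sentence.
SENTENCE = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
            ("at", "IN"), ("the", "DT"), ("cat", "NN")]
```

On `SENTENCE` this yields two NP chunks ("the little dog" and "the cat") with "barked" and "at" left outside any chunk. For a simple information extraction task, the chunks can then be scanned for patterns such as NP, verb, NP.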

Assignment

Chunk parsing assignment

WEEK FOURTEEN, Nov 28

Lecture

Left-corner parsers. The Gemini algorithm. Dowding Moore 1991

Assignment

TBA.

WEEK FIFTEEN, Dec 5

Lecture

Final.

Assignment

Your final [that's right! This is it! No kidding!]:

WEEK EIGHTEEN, Dec 28

Lecture

Review. Summary. Whence NLP?

Assignment

Take NLP a little further than you found it.

Prerequisites and Grading

Prerequisite: Some computer science or some linguistics; preferably Ling 581.

Grading will be based on exercises/projects, a take-home midterm, and a final.

    Midterm 30%
    Final 30%
    Homework 40%

Place and Time

Tu Th 11:00-12:30
SH-238

Website

http://www-rohan.sdsu.edu/~gawron/parsing

Contact Info

Mailing address:
Jean Mark Gawron
Department of Linguistics and Oriental Languages
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-7727
Telephone: (619) 594-0252
Office Hours: Tu Th 16:00-17:30, BAM 321
