Linguistics 681
Statistical Methods in Computational Linguistics
Required Texts
Manning, C. and Schuetze, H. 2000. Foundations of Statistical Natural Language Processing.
Charniak, E. 1998. Statistical Language Learning. MIT Press.
Reading packet.
Course Description
This course is a survey of statistical methods in computational linguistics. It explores some of the motivations for, and alternatives to, the statistical techniques covered in Introduction to Computational Linguistics I and II. Topics covered include Markov chains and Hidden Markov Models, statistical estimators for n-gram models, finding collocations and subcategorization frames, collecting selectional preferences, part-of-speech tagging, word sense disambiguation, and probabilistic context-free grammars.
Grading
Assignments (40%)
Midterm (20%)
Final (40%)
Week 1:
Review of probability theory.
Week 2:
Introduction to Information Theory.
Week 3:
Review of n-gram models and data sparseness. Maximum likelihood estimation for n-gram models.
Week 4:
Smoothing methods. Linear interpolation. Backoff.
Week 5:
Markov chains and Hidden Markov Models (HMMs).
Week 6:
Application of HMMs to trigrams and part-of-speech tagging. Viterbi search.
Week 7:
HMM training. Forward-backward algorithm.
Weeks 8, 9, and 10:
Probabilistic context-free grammars and lexicalized probabilistic grammars.
Week 11:
Statistical alignment of bilingual corpora.
Week 12:
Authorship attribution. The case of the Federalist papers.
Week 13:
Word clustering.
Weeks 14 and 15:
Information retrieval. Vector space model. Latent semantic indexing.