Linguistics 582 Syllabus

Course Outline

Columns: Day | Reading | Assignment | Lecture | Background | Code

Tue Sep 02 Chapter 24 of Jurafsky and Martin (J&M) 2ed, Sections 24.1 and 24.2 Assignment I Classical systems. Philipp Koehn's slides, my slides
Thu Sep 04     Statistical Machine Translation (SMT). The noisy channel model.    
Tue Sep 09 J&M 2ed, Section 24.3 Kevin Knight's Workbook (from the Johns Hopkins NSF Workshop MT Tutorial)   Statistical Machine Translation (ctd). Word, "phrase", and sentence alignment. Parallel corpora. A simple EM approach.    
Thu Sep 11   Alignment editor "Understanding the decoder" (slide 15 of 21) Web-based translation assignment (slide 9 of 37) Alignment editor and Moses meeting in the Comp Ling lab A toy translation model (Moses website)
Tue Sep 16 Philipp Koehn's intro Koehn intro (ctd) Assignment II (slide 10 of 37) Statistical Machine Translation (ctd) Lab session. Data. Alignments. Some links ACL 2007 Workshop shared task
Thu Sep 18   Assignment III (slide 12 of 37) Sentence alignment Hunalign sen_align-0.0: The sentence aligner and tokenizer Philipp Koehn distributes with Europarl
Tue Sep 23 Kevin Knight's Workbook   Word alignment. Issues. Philipp's slides: 2, my slides Egypt: a statistical machine translation toolkit GIZA++ home page aligner-1.0: An alignment editor for alignments in Moses format
Thu Sep 25 J&M 2ed, Sections 24.5 and 24.6.1 The IBM paper   IBM Models 1 and 2. Maximum likelihood estimation (MLE). EM algorithm.
Tue Sep 30 Och and Ney (2003)   IBM Models 3, 4, and 5. Fertility revisited. Model 3
Thu Oct 02 J&M 2ed, Section 24.5.2 Och and Ney (2003)   Model 3: Details Model 3: Implementation Model 3 pseudocode
Tue Oct 07 Knight workbook Assignment: Assignment 4 HMM-based alignment models First-order models
Thu Oct 09 Och (1999), Brown et al. (1993)   Comparing models (bilingual classes) Franz Och's mkcls web page
Tue Oct 14 Och and Ney (2003), Klein and Manning (2002)   Comparing models (various) Combining models
Thu Oct 16 Moses tutorial Assignment: Implement IBM Model 2

Non-programming assignment: Compute the IBM Model 2 probabilities for a dataset handed out in class the week before.

Practicum: A full system ACL 2007 Workshop on statistical MT Moses statistical machine translation system
Tue Oct 21 J&M 2ed, Section 24.8, Germann et al. (2001), Germann (2003), Pharaoh manual, Beam search, Ch. 3 (ps, pdf).   The decoder
Thu Oct 23 WMT 2007 baseline system directions   Practicum: Datasets and preparing data   aligner-0.0: The sentence aligner and tokenizer Philipp Koehn distributes with Europarl
Tue Oct 28 J&M 2ed, Section 24.7 Koehn book extract: merging alignments Edinburgh MT system description (Sec. 2.3) Alignment merging assignment Lectures: Bidirectional alignments    
Thu Oct 30 Chapter 24 of Jurafsky and Martin (J&M) 2ed, Section 24.4 Pharaoh manual, History, Ch. 2 (ps, pdf). Kevin Knight's Workbook Edinburgh MT system description (Sec. 2.3) Assignment: Phrase assignment Phrase-based alignment, Helpful phrases.py
Tue Nov 04 Och and Ney (2004), Koehn et al. (2003), Comparison Assignment: System building assignment Phrase-based alignment (ctd)
Thu Nov 06 J&M 2ed, Section 24.8, Evaluating the MT Workshop 2007 shared task results, Papineni et al. BLEU, kappa statistic METEOR FEMTI HTER Assignment: Do some hand evaluation of data handed out in class. Evaluation. Human and automatic methods. Word error rate. Reference matching (BLEU, METEOR). Translation edit rate. NIST MT scoring
Tue Nov 11 Holiday
Thu Nov 13 Och (2002), Och (2003) Assignment: Evaluate two competing systems, using automatic tools. Ratnaparkhi's gentle intro to maximum entropy modeling The 4 components of the Moses model Minimum error rate training. A toy phrase-based model (Moses website)
Tue Nov 18     Syntax-based models    
Thu Nov 20     Syntax-based models (ctd)    
Tue Nov 25 Kilgarriff (1997), Jurafsky & Martin, Chapters 19 and 20 (2nd ed.), Chapters 15, 16, and 17 (1st ed.) Maximum entropy word sense disambiguation assignment Word sense disambiguation: introduction. Maximum entropy models
Thu Nov 27 Holiday
Tue Dec 02 Similarity. Lin (1998a) Assorted related readings Intro to LSA. Landauer, Foltz, and Laham (1998) Pantel's dissertation on committee-based clustering Assignment: Use the kappa statistic on the results of hand annotation Word meaning similarity measures. Dekang Lin's similarity measure and LSI. Clustering.
Thu Dec 04     Clustering (ctd).
Tue Dec 09   Assignment FrameNet's theory of word senses: an introduction.
Thu Dec 11   Assignment: Use the annotation tool to annotate some English verbs according to the protocol given. Annotating FrameNet data. Last class day   Code: The annotator (Python source)
Tue Dec 16