Linguistics 582 Syllabus

Course Outline

| Day | Reading | Assignment | Lecture | Background | Code |
| Tue Sep 02 | Chapter 24 of Jurafsky and Martin (J&M) 2ed, Sections 24.1 and 24.2 | Assignment I (new revised link!) | Classical Systems. Philipp Koehn's slides, my slides | ||
| Thu Sep 04 | Statistical Machine Translation (SMT). The noisy channel model. | ||||
| Tue Sep 09 | J&M 2ed, Section 24.3 Kevin Knight's Workbook (from the Johns Hopkins NSF Workshop MT Tutorial) | Statistical Machine Translation (ctd). Word, "phrase", and sentence alignment. Parallel corpora. A simple EM approach. | |||
| Thu Sep 11 | Alignment Editor "Understanding the decoder" (Slide 15/21) Web-based translation assignment (Slide 9 out of 37) | Alignment editor and Moses meeting in Comp Ling lab | A toy translation model (Moses website) | ||
| Tue Sep 16 | Philipp Koehn's intro, Koehn intro (ctd) | Assignment II (Slide 10 out of 37) | Statistical Machine Translation (ctd) Lab session. Data. Alignments. | Some links ACL 07 Workshop shared task | |
| Thu Sep 18 | Assignment III (Slide 12 out of 37) | Sentence alignment | Hunalign | sen_align-0.0: The sentence aligner and tokenizer Philipp Koehn distributes with Europarl | |
| Tue Sep 23 | Kevin Knight's Workbook | Word Alignment. Issues. Philipp's slides: 2, my slides | Egypt: a statistical machine translation toolkit GIZA++ home page | aligner-1.0: An alignment editor for alignments in Moses format | |
| Thu Sep 25 | J&M 2ed, Section 24.5, 24.6.1 The IBM paper | IBM Models 1 and 2. Maximum likelihood estimation (MLE). EM algorithm. | |||
| Tue Sep 30 | Och Ney (2003) | IBM Models 3, 4, and 5. Fertility revisited. Model 3 | |||
| Thu Oct 02 | J&M 2ed, Section 24.5.2 Och Ney (2003) | Model 3: Details | Model 3: Implementation Model 3 pseudocode | ||
| Tue Oct 07 | Kevin Knight's Workbook | Assignment: Assignment 4 | HMM based alignment models First-order models | ||
| Thu Oct 09 | Och (1999), Brown et al. (1993) | Comparing models (bilingual classes) | Franz Och's mkcls web page | ||
| Tue Oct 14 | Och Ney (2003) Klein Manning (2002) | Comparing models (various) Combining models | |||
| Thu Oct 16 | Moses tutorial | Assignment: Implement IBM Model 2. Non-programming assignment: Compute the Model 2 probabilities for a dataset given out in class (the week before) | Practicum: A full system | ACL 2007 Workshop on statistical MT Moses statistical Machine Translation System | |
| Tue Oct 21 | J&M 2ed, Section 24.8, Germann et al (2001). Germann (2003). Pharaoh Manual, Beam-search, Ch. 3 (ps, pdf). | The decoder | |||
| Thu Oct 23 | WMT 2007 Baseline system directions | Practicum: Datasets and preparing data | aligner-0.0: The sentence aligner and tokenizer Philipp Koehn distributes with Europarl | ||
| Tue Oct 28 | J&M 2ed, Section 24.7 Koehn book extract: merging alignments Edinburgh MT system description (Sec. 2.3) | Alignment merging assignment | Lectures: Bidirectional alignments | ||
| Thu Oct 30 | Chapter 24 of Jurafsky and Martin (J&M) 2ed, Section 24.4 Pharaoh Manual, History, Ch. 2 (ps, pdf). Kevin Knight's Workbook Edinburgh MT system description (Sec. 2.3) | Assignment: Phrase assignment | Phrase-based alignment, Helpful phrases.py | ||
| Tue Nov 04 | Och Ney 2004, Koehn et al. 2003, Comparison | Assignment: System building assignment | Phrase-based alignment (ctd) | ||
| Thu Nov 06 | J&M 2ed, Section 24.8, Evaluating the MT Workshop 2007 shared task results, Papineni et al. BLEU, Kappa statistic METEOR FEMTI HTER | Assignment: Do some hand evaluation of data passed out in class. | Evaluation. Human and automatic methods. Word error rate. Reference matching (BLEU, METEOR). Translation edit rate. | NIST MT Scoring | |
| Tue Nov 11 | Holiday | Holiday | Holiday | Holiday | Holiday |
| Thu Nov 13 | Och (2002) Och (2003) | Assignment: Evaluate two competing systems, using automatic tools. | Ratnaparkhi's gentle intro to max entropy modeling The 4 components of the Moses model Minimum error rate training. | A toy phrase-based model (Moses website) | |
| Tue Nov 18 | Syntax-based models | ||||
| Thu Nov 20 | Syntax-based models (ctd) | ||||
| Tue Nov 25 | Kilgarriff (97) Jurafsky & Martin, Chapters 19, 20 (2nd ed.), Chapters 15, 16, 17 (1st ed.) | Max ent word sense disambiguation assignment | Word sense disambiguation. Introduction. Maximum entropy models | ||
| Thu Nov 27 | Holiday | Holiday | Holiday | Holiday | Holiday |
| Tue Dec 02 | Similarity. Lin (98a) Assorted related readings Intro to LSA. Landauer, Foltz, Laham (98) Pantel's committee-based clustering diss | Assignment: Use kappa statistic on results of hand annotation | Word meaning similarity measures. Dekang Lin's sim measure and LSI. Clustering. | ||
| Thu Dec 04 | Clustering, ctd. | ||||
| Tue Dec 09 | Assignment | FrameNet's theory of word senses: an introduction. | |||
| Thu Dec 11 | Assignment: Use the annotation tool to annotate some English verbs according to the protocol given. | Annotating FrameNet data. Last class day | Code: The annotator (Python source) | ||
| Tue Dec 16 |