What is Machine Translation?

 

Systems

* Egypt: a statistical machine translation toolkit
Website for the 1999 NSF Statistical MT Workshop at Johns Hopkins University. Includes software distributions for most of the tools built at the workshop, such as GIZA, which trains word and phrase alignment models.
* GIZA++
Franz Josef Och's augmented version of GIZA, a practical word aligner with IBM Models 1-5, word-class-based alignment, and HMM alignment.
* ReWrite Decoder
Daniel Marcu and Ulrich Germann's greedy decoder
* Moses: a statistical machine translation toolkit
Includes GIZA++ and a decoder, as well as excellent tutorials.
* Thot: builds phrase tables from GIZA alignments
* Pharaoh
Philipp Koehn's thesis system, a phrase-based decoder that builds phrase tables up from GIZA word alignments.
* Genpar: Toolkit for research on statistical machine translation by parsing
* Hunalign: Sentence aligner
* SRI Language Modeling system
Language modeling part of the base system in WMT 2007
* YASMET: Maximum entropy modeler in C++ ("Yet Another Maximum Entropy Toolkit", Franz Josef Och); companion feature selector in Java (Deepak Ravichandran)
Implementation of "A Maximum Entropy Approach to Natural Language Processing" [Berger, Della Pietra & Della Pietra 1996].
* Open NLP's maxent (Java)
* Rob Malouf's Toolkit for Advanced Discriminative Modeling
* Zhang Le's Maxent tool (with Python binding)

Resources

* Philipp Koehn's Statistical MT Page
* John Hutchins's MT archive
* Franz Josef Och's Old Home Page, the New Home Page at Google
Links to software and papers by a leading figure in the field, known, among other things, as the creator of GIZA++.
* HLT/NAACL 2003 workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond.
* Rada Mihalcea's home page
Links to sense-tagged datasets, sense-tagging tools
* Zhang Le's Maxent page
* Ratnaparkhi's home page
Links to maxent papers
* Berger's Maximum Entropy tutorial
* Ed T. Jaynes
* ACL Statistical Semantics page http://wiki.delph-in.net/moin/FrontPage

Parallel Corpora

* ISI Hansard Data
* Europarl corpus
* WMT 2007 Shared Task Data (396 MB parallel, 276 MB lm)
Europarl data again, plus news commentary.
* Acquis Communautaire
Possibly the largest existing parallel corpus. European Union law in 22 languages.
* Czech English Parallel Corpus
* Hungarian-English corpus (Hunglish)

Word Sense Links

* Wikipedia article
* Senseval Page
* Word sense disambiguation: Algorithms and Applications
Agirre, Eneko and Edmonds, Philip (eds.). Springer.
* Computer Recognition of English Word Senses
Kelly, Edward F., and Stone, Philip J. (1975), Amsterdam: North-Holland. ISBN 0-444-10831-9

Word Sense Software

* Ted Pedersen's WSD shell
Supervised word sense learning system
* Sensetools
Required for WSD shell
* WEKA Data Mining Suite
Required for WSD shell
* Senserelate
* Clustering By Committee (CBC)
* Ted Pedersen's SenseClusters (Perl)

Similarity and LSA

* Latent Semantic Analysis @ CU Boulder
Useful site with bibs, demos, links (no software)
* WordNet word similarity measures (Perl, Python, Java)
Package implementing a variety of word semantic similarity measures
* Semantic Vectors Package
Uses Random Projection (RP) rather than Singular Value Decomposition (SVD)
* SVD pack (ANSI Fortran-77, also ANSI-C)
* University of Tennessee LSA site
* SenseClusters (Perl)
Includes LSA methods as well as SenseClusters

Semantically annotated data

* Senseval 3
* Senseval-3 tasks
* Senseval-3 Data
* Semcor
Word-sense-tagged English data (various versions of WordNet format)
* Ted Pedersen's Sense-Tagged text page
Links to a bunch of data sets in various formats; format translation scripts as well.

Computational Linguistics departments and programs

* SDSU Computational Linguistics Program
* Cornell Dept of Modern Languages and Linguistics
* Edinburgh Linguistics Dept and Cognitive Science Dept
* Groningen BCN Linguistics
* University of Leuven Center for Computational Linguistics
* University of Melbourne
* MIT Linguistics Dept (including MIT Working Papers in Linguistics publications information)
* New York University Linguistics Department
* SOAS (School of Oriental and African Studies), University of London (including their working papers)
* Stanford Linguistics Dept
* University of Sydney Department of Linguistics
* University of Stuttgart Institut für Maschinelle Sprachverarbeitung (Institute of Natural Language Processing)
* Yale Linguistics Dept

Idiosyncratic listing of Computational Linguistics Companies

* Google
* Microsoft NLP
* Nuance "Workflow" system
Turn speech into words.
* Lucent Bell Labs Text to Speech system demo
Turn words into speech (in English)! Or into French.
* ELF (English Language Frontend) for MS Access/VB
* Translation Experts Machine Translation on the web
* Text Analysis Intl: Text Analysis framework (IE-style)

What is linguistics?

* iLoveLanguages
Links to all the world's languages. Successor to the famous "The Human-Languages Page" (http://www.june29.com/HLP/), by the same author, Tyler Chambers.
* Yamada Language Guides
Like the page above, a compendium of information on different languages. Lots of fonts.
* Yahoo's Human Languages and Linguistics page
* YourDictionary.com/Robert Beard
* Kevin Russell's Phonetics intro site
* X-ray film of the vocal tract
* Ligações de Línguas e Linguística
Language and linguistics links in Portuguese, Galician, and English.
* Linguistics and Natural Language Processing (Rick Wojcik's).
Good on research centers and Russian.
* British Library samples of English dialects
* English accents and reactions thereto
* A list of some software for linguistics from Johanna Rubba. A more complete and organized list would be even better...
* Listings of software for creating and managing linguistic annotations and for linguistic exploration by Steven Bird.
* The Syntax Student's Companion
Java application for editing and checking phrase structure trees.
* Internet Grammar of English
* The Interactive Introduction to Linguistics
* Annotated list of resources on statistical and corpus-based computational linguistics
* The Language and Gender Page
* The Virtual CALL (Computer-Assisted Language Learning) Library
* Interactive online computational linguistics demos listing
* Ethnologue: Languages of the World, 13th Edition, 1996
A great overview guide to the world's languages. There's also a search interface.