What is Machine Translation?
Systems
Egypt:
a statistical machine translation toolkit
-
Website for The 1999 NSF Statistical MT
Workshop at Johns Hopkins University. Includes software
distributions for most of the tools built at the workshop,
such as GIZA, which trains word and phrase alignment models.
GIZA++
-
Franz Josef Och's augmented version of GIZA, a practical word
aligner with IBM Models 1-5, word-class-based alignment, and
HMM alignment.
-
ReWrite Decoder
-
Daniel Marcu and Ulrich Germann's greedy decoder
Moses
a statistical machine translation toolkit
-
Includes GIZA++ and a decoder, as well as excellent tutorials.
Thot
Builds phrase tables from GIZA alignments
Pharaoh
-
Philipp Koehn's thesis system, a phrase-based decoder
that builds phrase tables up from GIZA word alignments
-
Genpar: Toolkit
for research on statistical machine translation
by parsing
-
Hunalign: Sentence aligner
-
SRI Language Modeling system
-
Language modeling part of the base system in WMT 2007
-
YASMET: Maximum entropy
modeler in C++ ("Yet Another Maximum Entropy Toolkit", Franz Josef Och). Companion feature selector in Java (Deepak Ravichandran)
-
Implementation of "A Maximum Entropy Approach to Natural Language Processing" [Berger, Della Pietra & Della Pietra 1996].
-
Open NLP's maxent (Java)
-
Rob Malouf's Toolkit for Advanced Discriminative Modeling
-
Zhang Le's Maxent tool (with Python binding)
Resources
-
Philipp Koehn's Statistical
MT Page
-
John Hutchins's
MT archive
-
Franz Josef Och's Old Home Page,
the New Home Page at Google
-
Links to software and papers by a leading figure in the field,
known, among other things, as the creator of GIZA++.
-
HLT/NAACL 2003 workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond.
-
Rada Mihalcea's home page
-
Links to sense-tagged datasets, sense-tagging tools
-
Zhang Le's Max ent page
-
Ratnaparkhi's home page
-
Links to Max ent papers
-
Berger's Max Entropy tutorial
-
Ed T. Jaynes
ACL Statistical Semantics page
http://wiki.delph-in.net/moin/FrontPage
-
ISI Hansard Data
-
Europarl corpus
-
WMT 2007 Shared Task Data (396 MB parallel, 276 MB lm)
-
Europarl data again, plus news commentary.
-
Acquis Communautaire
-
Possibly the largest existing parallel corpus: European Union law
in 22 languages.
-
Czech English Parallel Corpus
-
Hungarian-English corpus (Hunglish)
Word Sense Links
-
Wikipedia article
-
Senseval Page
-
Word sense disambiguation: Algorithms
and Applications
-
Agirre, Eneko and Edmonds, Philip (eds.). Springer.
-
Computer Recognition of English Word Senses
-
Kelly, Edward F., and Stone, Philip J. (1975), Amsterdam:
North-Holland. ISBN 0-444-10831-9
Word Sense Software
-
Ted Pedersen's WSD shell
-
Supervised Word sense learning system
Sensetools
-
Required for WSD shell
WEKA Data Mining Suite
-
Required for WSD shell
-
Senserelate
-
Clustering By Committee (CBC)
-
Ted Pedersen's SenseClusters (Perl)
Similarity and LSA
-
Latent Semantic Analysis @ CU Boulder
- Useful site with bibs, demos, links (no software)
-
WordNet word similarity measures (Perl, Python, Java)
- Package implementing a variety of word semantic similarity measures
-
Semantic Vectors
Package
-
Uses Random Projection (RP) rather than Singular Value Decomposition (SVD)
-
SVDPACK
(ANSI Fortran-77, also ANSI-C)
-
University of Tennessee LSA site
-
SenseClusters (Perl)
- Includes LSA methods as well as sense clustering
Semantically annotated data
-
Senseval 3
-
Senseval-3 tasks
-
Senseval-3 Data
-
Semcor
-
Word-sense tagged English data (Various versions of WordNet format)
Ted Pedersen's Sense-Tagged text page
Links to a bunch of data sets in various formats; format translation
scripts as well.
Computational Linguistics departments and programs
-
SDSU Computational Linguistics Program
- Cornell Dept of Modern Languages
and Linguistics
- Edinburgh Linguistics Dept and
Cognitive Science Dept
- Groningen BCN
Linguistics
- University of
Leuven Center for Computational Linguistics
- University of
Melbourne
- MIT Linguistics
Dept (including MIT
Working Papers in Linguistics publications information)
- New York University
Linguistics Department
- SOAS (School of Oriental and
African Studies), University of London (including their working papers)
- Stanford Linguistics Dept
- University of Sydney
Department of Linguistics
- University of Stuttgart
Institut für Maschinelle Sprachverarbeitung
(Institute of Natural Language Processing)
- Yale Linguistics Dept
Idiosyncratic listing of Computational Linguistics Companies
Google
Microsoft NLP
Nuance "Workflow" system
- Turn speech into words.
Lucent Bell Labs Text to
Speech system demo
- Turn words into speech (in English)! Or
into French.
ELF (English Language
Frontend) for MS Access/VB
Translation Experts Machine
Translation on the web
Text Analysis Intl: Text
Analysis framework (IE-style)
What is linguistics?
iLoveLanguages
- Links to all the world's languages.
Successor to the famous "The Human-Languages
Page" (né
http://www.june29.com/HLP/), by
the same author, Tyler Chambers.
Yamada Language
Guides
- Like the page above, a compendium of information
on different languages. Lots of fonts.
Yahoo's Human Languages and Linguistics page
YourDictionary.com/Robert Beard
Kevin
Russell's Phonetics intro site
X-ray film of
the vocal tract
Ligações
de Línguas e Linguística
- Language and linguistics links in Portuguese, Galician, and English.
Linguistics and
Natural Language Processing (Rick Wojcik's).
- Good on research centers and Russian.
British
Library samples of English dialects
English
accents and reactions thereto
A
list of some software for linguistics from Johanna Rubba. A more
complete and organized list would be even better....
Listings of software
for creating
and managing linguistic annotations and for
linguistic
exploration by Steven Bird.
The Syntax
Student's Companion
- Java application for editing and checking phrase structure trees.
Internet Grammar of
English
The
Interactive Introduction to Linguistics
Annotated list of resources on statistical and
corpus-based computational linguistics
The
Language and Gender Page
The Virtual CALL
(Computer-Assisted Language Learning) Library
Interactive
online computational linguistics demos listing
Ethnologue: Languages of the
World, 13th Edition, 1996
- A great overview guide to the world's languages. There's also a
search interface.