The basic model

The probability cost assigned to a translation is a product of the probability costs of four models: the phrase translation table, the language model, the distortion (reordering) model, and the word penalty.

Each of these models contributes information about one aspect of a good translation.

Each of the components can be given a weight that sets its importance. Mathematically, the cost of translation is:

p(e|f) = phi(f|e)^weight_phi * LM(e)^weight_lm * D(e,f)^weight_d * W(e)^weight_w

The probability p(e|f) of the English translation e given the foreign input f is broken up into four models: phrase translation phi(f|e), language model LM(e), distortion model D(e,f), and word penalty W(e) = exp(length(e)). Each of the four models is raised to the power of its weight.
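The weighted product is often easier to work with in log space, where it becomes a weighted sum. The following is a minimal Python sketch of the formula above, not code from the decoder: the function name translation_score and the component scores passed to it are made up for illustration.

```python
import math

def translation_score(phi, lm, d, length_e,
                      weight_t=1.0, weight_l=1.0, weight_d=1.0, weight_w=0.0):
    """Weighted product of the four model scores, computed in log space.

    phi, lm and d are probabilities in (0, 1]; length_e is the number of
    output words, so the word penalty is W(e) = exp(length_e).
    The default weights mirror the defaults 1, 1, 1, 0 given in the text.
    """
    log_score = (weight_t * math.log(phi)
                 + weight_l * math.log(lm)
                 + weight_d * math.log(d)
                 + weight_w * length_e)  # log W(e) = length(e)
    return math.exp(log_score)

# Hypothetical component scores for one candidate translation.
default_score = translation_score(phi=0.8, lm=0.01, d=0.5, length_e=4)
# Same candidate with the distortion model switched off (weight 0).
no_distortion = translation_score(phi=0.8, lm=0.01, d=0.5, length_e=4, weight_d=0.0)
print(default_score, no_distortion)
```

With a weight of 0 a model drops out of the product entirely, which is why the word penalty has no effect under the default weights.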

The weights are passed to the decoder with the four parameters weight-t, weight-l, weight-d, and weight-w. The default settings for these weights are 1, 1, 1, and 0. These are also the values in the configuration file moses.ini.

The key to good translation performance is a good phrase translation table, but some tuning can also be done in the decoder. The most important is the tuning of the model weights: setting them to the right values can improve translation quality.

The tutorial gives one example. When translating the German sentence ein haus ist das, the distortion weight was set to 0:

 % echo 'ein haus ist das' | moses -f moses.ini -d 0
 this is a house

With the default weights, the translation comes out wrong:

 % echo 'ein haus ist das' | moses -f moses.ini
 a house is the

The right weight setting depends on the corpus and the language pair. Usually, a held-out development set is used to optimize the parameter settings. The simplest method is to try out a large number of possible settings and pick the one that works best. Good values for the weights of the phrase translation table (weight-t, short tm), language model (weight-l, short lm), and reordering model (weight-d, short d) are in the range 0.1 to 1; good values for the word penalty (weight-w, short w) are in the range -3 to 3. Negative values for the word penalty favor longer output, positive values favor shorter output.
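The "try many settings and keep the best" approach can be sketched as a plain grid search. This Python sketch is an illustration only: grid_search and toy_evaluate are hypothetical names, and toy_evaluate stands in for the real step of decoding the development set with the given weights and scoring the output.

```python
import itertools

def grid_search(evaluate, tm_values, lm_values, d_values, w_values):
    """Try every weight combination and return (best_score, best_weights).

    `evaluate` is supplied by the caller and maps a weight dict to a
    quality score on the development set (higher is better).
    """
    best_score, best_weights = None, None
    for tm, lm, d, w in itertools.product(tm_values, lm_values, d_values, w_values):
        weights = {"tm": tm, "lm": lm, "d": d, "w": w}
        score = evaluate(weights)
        if best_score is None or score > best_score:
            best_score, best_weights = score, weights
    return best_score, best_weights

def toy_evaluate(weights):
    # Placeholder for "decode the dev set and measure quality".
    # It simply rewards closeness to an arbitrary, made-up optimum.
    target = {"tm": 0.5, "lm": 1.0, "d": 0.1, "w": -1}
    return -sum((weights[k] - target[k]) ** 2 for k in target)

model_grid = [0.1, 0.5, 1.0]       # within the suggested 0.1 to 1 range
penalty_grid = [-3, -1, 0, 1, 3]   # within the suggested -3 to 3 range
best_score, best_weights = grid_search(
    toy_evaluate, model_grid, model_grid, model_grid, penalty_grid)
print(best_weights)
```

The number of combinations grows multiplicatively with each grid (here 3 * 3 * 3 * 5 = 135 evaluations), so in practice each evaluation being a full decoding run limits how fine the grid can be.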

Phrase translation table

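Here is a toy phrase model from the Moses tutorial. Each line holds a foreign phrase, its English translation, and a translation probability, separated by |||: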

der ||| the ||| 0.3
das ||| the ||| 0.4
das ||| it ||| 0.1
das ||| this ||| 0.1
die ||| the ||| 0.3
ist ||| is ||| 1.0
ist ||| 's ||| 1.0
das ist ||| it is ||| 0.2
das ist ||| this is ||| 0.8
es ist ||| it is ||| 0.8
es ist ||| this is ||| 0.2
ein ||| a ||| 1.0
ein ||| an ||| 1.0
klein ||| small ||| 0.8
klein ||| little ||| 0.8
kleines ||| small ||| 0.2
kleines ||| little ||| 0.2
haus ||| house ||| 1.0
alt ||| old ||| 0.8
altes ||| old ||| 0.2
gibt ||| gives ||| 1.0
es gibt ||| there is ||| 1.0
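To make the layout concrete, here is a small Python sketch that reads lines in the three-field format shown above into a lookup table. It assumes exactly three fields separated by |||; the function name parse_phrase_table is made up for this example, and real phrase tables may carry additional fields that this sketch does not handle.

```python
from collections import defaultdict

def parse_phrase_table(lines):
    """Map each foreign phrase to a list of (english phrase, probability)."""
    table = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target, prob = (field.strip() for field in line.split("|||"))
        table[source].append((target, float(prob)))
    return table

# A few entries copied from the toy table above.
sample = """\
das ||| the ||| 0.4
das ||| it ||| 0.1
das ||| this ||| 0.1
das ist ||| it is ||| 0.2
das ist ||| this is ||| 0.8
""".splitlines()

table = parse_phrase_table(sample)
# Pick the most probable translation option for one foreign phrase.
best_target, best_prob = max(table["das ist"], key=lambda option: option[1])
print(best_target, best_prob)  # this is 0.8
```

Keying the table by the foreign phrase mirrors how the entries are used: for each span of the input, all English options and their probabilities are looked up together.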