Parallel Corpus Refinement for Statistical Machine Translation

One of the central tasks in natural language processing is translating from one natural language to another. Several lines of research have developed around this task. One way to improve translation quality is to improve word alignment.

Word alignment identifies the corresponding words in a parallel text. Alongside word alignment, multiword units can also be aligned. This process is called phrase-level alignment, and it includes word-level alignment.

Aligning multiword units can improve translation quality, because it allows expressions on each side to be translated in the most appropriate way.

To improve alignment, a set of features is defined and used in a log-linear model, and a bootstrapping approach adds phrase pairs with high alignment probability. The resulting phrase table achieves a better BLEU score than Giza’s phrase table. In other words, it improves the translation quality of the SMT system.
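The combination of a log-linear model with a bootstrapping step can be illustrated with a minimal Python sketch. It is not the project's implementation: the feature names (lexical_score, length_match), the weights, the threshold, and the toy phrase pairs are all hypothetical, and the sketch runs only a single bootstrapping round. Each candidate target phrase for a source phrase is scored as exp(sum of weight * feature), the scores are normalized over the candidates of that source phrase, and pairs whose probability reaches the threshold are added to the phrase table.

```python
import math


def phrase_pair_probs(targets, weights):
    """Log-linear probability of each candidate target for one source phrase.

    targets maps a target phrase to its feature dict; the score of a
    candidate is exp(sum_i weights[i] * feature_i), normalized over all
    candidates of the same source phrase.
    """
    scores = {
        target: math.exp(sum(weights[name] * value for name, value in feats.items()))
        for target, feats in targets.items()
    }
    total = sum(scores.values())
    return {target: score / total for target, score in scores.items()}


def bootstrap_round(phrase_table, candidates, weights, threshold):
    """Return a copy of phrase_table extended with high-probability pairs."""
    grown = set(phrase_table)
    for source, targets in candidates.items():
        for target, prob in phrase_pair_probs(targets, weights).items():
            if prob >= threshold:
                grown.add((source, target))
    return grown


# Hypothetical feature weights and toy candidates, for illustration only.
weights = {"lexical_score": 1.0, "length_match": 0.5}
candidates = {
    "in spite of": {
        "despite": {"lexical_score": 2.0, "length_match": 0.0},
        "spite": {"lexical_score": 0.5, "length_match": 1.0},
    },
}
seed_table = {("book", "livre")}

probs = phrase_pair_probs(candidates["in spite of"], weights)
table = bootstrap_round(seed_table, candidates, weights, threshold=0.7)
```

In a fuller bootstrapping loop, the features would presumably be re-estimated from the grown table before the next round; that step is omitted here because the source does not describe it.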

Project Provider

Leila Tavakoli
Email: leila.tavakoli@ut.ac.ir
Former Member

Supervisor Professor

Heshaam Faili
Associate Professor
Tel Number: 61119717
Email: hfaili [AT] ut.ac.ir