Alireza Mahmoodi


master

 

Translation from or into a morphologically rich language, where a word stem appears in many completely different surface forms, poses many challenges. The problem is especially severe for verbs: when a sentence component such as the subject changes, the verb form changes accordingly. Morphological analysis is therefore an essential step in translation from or into such languages. This phenomenon can be partially handled in rule-based machine translation (RBMT) by using a large number of rules and states. In general, however, statistical machine translation (SMT) based on a word-based or phrase-based approach cannot translate a given source-side word into the correct target form across different contexts. In this research, we use a statistical and a rule-based English-Persian machine translation system as two baselines for our approach. First, we analyze the output of the RBMT system to categorize its errors. The error analysis shows that verbs are a highly inflecting class of words and an important part of morphological processing in Persian, so we focus on the verb to improve English-Persian MT. The morphological features that we predict and use to generate the inflected verb form are voice, mood, number, tense, negation, and person. We use a novel approach to the statistical generation of rich morphology for Persian verbs and compare this technique with our two baselines. We predict the morphological features of verbs using a statistical model that incorporates a decision tree classifier (DTC). To train the DTC, we use an English-Persian parallel corpus that contains linguistic information such as syntactic parse trees and dependency relations on the English side, as well as the linguistic information needed to generate an appropriately inflected Persian verb on the Persian side. The results of our experiments show that our model outperforms the baselines in terms of prediction accuracy.
Applying our proposed approach to the baselines yields improvements of up to 2.6 in BLEU score.

Keywords: Machine Translation, Rule-based MT, SMT, Morphology, Decision Tree, Natural Language Processing, Parallel Corpus.
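
To make the idea of generating an inflected verb from predicted features more concrete, the following is a minimal sketch and not the thesis's actual generator. It assumes romanized (transliterated) Persian, covers only four of the six features named above (tense, person, number, negation; voice and mood are omitted), and handles only simple past and present indicative. The names `VerbFeatures` and `inflect` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass

# Transliterated personal endings keyed by (person, number).
# Assumption: simple past and present indicative only.
PAST_ENDINGS = {
    (1, "sg"): "am", (2, "sg"): "i", (3, "sg"): "",
    (1, "pl"): "im", (2, "pl"): "id", (3, "pl"): "and",
}
# The present endings differ from the past only in the third person singular.
PRESENT_ENDINGS = {**PAST_ENDINGS, (3, "sg"): "ad"}


@dataclass(frozen=True)
class VerbFeatures:
    """Hypothetical container for a subset of the predicted features."""
    tense: str            # "past" or "present"
    person: int           # 1, 2, or 3
    number: str           # "sg" or "pl"
    negated: bool = False


def inflect(past_stem: str, present_stem: str, feats: VerbFeatures) -> str:
    """Build a romanized surface form from the verb stems and the features."""
    key = (feats.person, feats.number)
    if feats.tense == "past":
        form = past_stem + PAST_ENDINGS[key]
        # Negation of the simple past uses the prefix "na-".
        return "na" + form if feats.negated else form
    if feats.tense == "present":
        # The present indicative takes the prefix "mi-".
        form = "mi" + present_stem + PRESENT_ENDINGS[key]
        # Before "mi-", the negative prefix surfaces as "ne-".
        return "ne" + form if feats.negated else form
    raise ValueError(f"unsupported tense: {feats.tense!r}")


# Example with the verb "raftan" (to go): past stem "raft", present stem "rav".
print(inflect("raft", "rav", VerbFeatures("past", 1, "sg")))
print(inflect("raft", "rav", VerbFeatures("present", 3, "pl")))
print(inflect("raft", "rav", VerbFeatures("present", 1, "sg", negated=True)))
```

In a pipeline of the kind the abstract describes, the feature values would come from the trained classifier rather than being written by hand, and a single English verb could map to any of these surface forms depending on its subject and context.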