Masoud Jalili Sabet

Master's thesis

Proposing a Translation Model using Spectral Embedding of Lexicon

Recent research on word embeddings has shown improvements in capturing the semantic features of words. These vector representations can be used to measure similarity between words within one language or across languages, and such comparisons are useful in many Natural Language Processing tasks. Statistical machine translation systems usually break the translation task into two or more subtasks, an important one being finding word alignments over a sentence-aligned bilingual corpus. A common approach to word alignment uses a generative translation model that produces a sentence in one language given a sentence in another. Since the word alignment task requires statistics or comparisons between words from the source and target languages, a good translation model can yield better alignments. Building a good translation model, however, requires a large amount of sentence-aligned parallel data, which is not available for most language pairs. In this project we investigate the use of word embeddings in creating translation models. We approach the task by using monolingual and bilingual vector representations of words to create new translation models. The translation model built from bilingual word embeddings is used to score translations in the translation memory (TM) cleaning task. In the next step, using monolingual word embeddings, we propose a new alignment model based on IBM Model 1. This model addresses the problem of word alignment in low-resource settings and the quality of alignments for rare words. Experimental results indicate the effectiveness of the proposed model in the TM cleaning task. The new alignment model performs better on small datasets, and it also aligns rare words better than the baseline.

Keywords: Translation model, Machine Translation, Word Embedding, Rare words, Word Alignment, Translation Memory
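For context, the baseline the abstract refers to is the classical IBM Model 1, whose lexical translation probabilities t(f | e) are estimated with expectation-maximization over a sentence-aligned corpus. The sketch below shows that standard EM procedure only; it does not reproduce the thesis's embedding-based extension, and the function name and data layout are illustrative assumptions.

```python
from collections import defaultdict

def train_ibm_model1(parallel_corpus, iterations=5):
    """EM training of IBM Model 1 lexical translation probabilities t(f | e).

    parallel_corpus: list of (source_tokens, target_tokens) pairs; a NULL
    token is assumed to be prepended to each source sentence by the caller.
    """
    # Uniform initialization over co-occurring word pairs
    target_vocab = set()
    pairs = set()
    for e_sent, f_sent in parallel_corpus:
        for f in f_sent:
            target_vocab.add(f)
            for e in e_sent:
                pairs.add((f, e))
    t = {pair: 1.0 / len(target_vocab) for pair in pairs}

    for _ in range(iterations):
        count = defaultdict(float)  # expected pair counts c(f, e)
        total = defaultdict(float)  # expected source-word counts c(e)
        # E-step: distribute each target word's probability mass
        # over the source words it could align to
        for e_sent, f_sent in parallel_corpus:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-normalize expected counts into probabilities
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t
```

On a tiny corpus such as ("das Haus", "the house") / ("das Buch", "the book") / ("ein Buch", "a book"), a few EM iterations concentrate probability on the correct pairs, e.g. t(house | Haus) rises well above t(house | das). Because these estimates rely purely on co-occurrence counts, they degrade for rare words and small corpora, which is the gap the abstract's embedding-based model targets.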