|
|
 |
Mohammad Taher Pilevar
MS Student of Artificial Intelligence
E-mail: t.pilevar (at) ece (dot) ut.ac.ir
Project: Statistical Machine Translation |
Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. MT is supposed to do more than simply substitute words in one natural language for words in another. Using corpus techniques, more complex translations are attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies. Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. Where such corpora are available, impressive results can be achieved when translating texts of a similar kind, but such corpora are still very rare. Therefore, a substantial phase in building an English-to-Persian SMT system is creating a multi-million-token bilingual corpus.
|
|
In the field of computational linguistics, word sense disambiguation (WSD) is the process of identifying which sense of a word is used in any given sentence, when the word has a number of distinct senses.
In my project I want to choose the best translation (sense) of each word in a sentence based on statistical information from the corpus, such as mutual information. The method uses graph theory to compare the relations between words. Each word in a sentence may have several meanings (translations) in the target language. We extract these meanings from a bilingual dictionary and calculate a weight between each pair of them. Finally, we select the meaning with the largest weight. Here the source language is English and the target language is Farsi.
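The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the function names `pmi` and `choose_translations` are hypothetical, the edge weight is taken to be pointwise mutual information computed from target-language co-occurrence counts, each candidate is scored by the sum of its edge weights to the candidates of the other words, and the transliterated Farsi tokens and all counts are invented toy data.

```python
import math
from itertools import combinations


def pmi(pair_count, count_a, count_b, total):
    """Pointwise mutual information of two target-language words.

    Returns 0.0 when the pair (or either word) was never observed,
    which is an assumption made to keep the sketch simple.
    """
    if pair_count == 0 or count_a == 0 or count_b == 0:
        return 0.0
    return math.log2((pair_count * total) / (count_a * count_b))


def choose_translations(candidates, unigram, cooc, total):
    """Pick one translation per source word.

    candidates: one list of candidate translations per source word.
    unigram:    target-word -> corpus frequency.
    cooc:       frozenset({word_a, word_b}) -> co-occurrence frequency.
    total:      corpus size used to normalise the counts.
    """
    # One score table per source word; every candidate starts at zero.
    scores = [{c: 0.0 for c in cands} for cands in candidates]
    # Add the weight of every edge between candidates of different words.
    for i, j in combinations(range(len(candidates)), 2):
        for a in candidates[i]:
            for b in candidates[j]:
                n_ab = cooc.get(frozenset((a, b)), 0)
                w = pmi(n_ab, unigram.get(a, 0), unigram.get(b, 0), total)
                scores[i][a] += w
                scores[j][b] += w
    # Keep the candidate with the largest accumulated weight for each word.
    return [max(s, key=s.get) for s in scores]


# Toy data (invented): English "bank" and "money" with transliterated
# Farsi candidates taken from a hypothetical bilingual dictionary.
candidates = [["bank", "sahel"], ["pul"]]
unigram = {"bank": 50, "sahel": 40, "pul": 60}
cooc = {frozenset(("bank", "pul")): 20, frozenset(("sahel", "pul")): 1}
total = 10000

print(choose_translations(candidates, unigram, cooc, total))
```

Summing edge weights is only one possible way to read "the meaning with the largest weight"; other graph-based scores could be substituted without changing the overall structure.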
 |
 |
Morteza Montazery
MS Student of Software Engineering
E-mail: mortaza.gh (at) gmail (dot) com
Project: Automatic Persian WordNet Construction |
In recent years, there has been increasing interest in the semantic processing of natural languages. Semantic lexicons and ontologies are among the essential resources that make this kind of processing possible. WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. Although WordNets have been developed for a number of languages, no attempt to construct a Persian WordNet is known to exist. This project tries to construct a Persian WordNet automatically.
 |
 |
Nava Ehsan
MS Student of Software Engineering
E-mail: nava.ehsan (at) gmail (dot) com
Project: Persian spell and grammar checker |
The aim of this project is to develop an automatic spell and grammar checker for the Persian (Farsi) language. The spell and grammar checker described in this thesis takes a text and returns a list of possible errors. The errors fall into three categories: spelling errors, grammatical errors, and real-word errors. The most popular method of detecting spelling errors in a text is simply to look up every word in a dictionary; any word that is not there is taken to be an error. Correcting the errors requires designing a similarity algorithm based on Persian error probabilities and on n-gram rules. Grammatical errors are wrong relations between words, such as subject-verb disagreement, or wrong sequences of words, such as an adjective placed before a noun. To detect grammatical errors, each word of the text is assigned its part-of-speech tag. The text is then matched against the checker’s pre-defined error rules. The rules describe errors as patterns of words, part-of-speech tags, and in some cases chunks. Real-word errors are misspellings that happen to produce another valid word of the language. This project will also use a statistical method to detect real-word errors that are not caught by the grammatical rules.
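The detection steps described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the tag labels `ADJ`, `NOUN`, and `VERB`, the rule name, and the transliterated toy lexicon are hypothetical, and plain Levenshtein edit distance stands in for the similarity algorithm based on Persian error probabilities, which the text does not specify.

```python
def find_unknown_words(tokens, lexicon):
    """Dictionary lookup: return (index, token) for tokens not in the lexicon."""
    return [(i, t) for i, t in enumerate(tokens) if t not in lexicon]


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Rank lexicon entries by edit distance to the misspelled word."""
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


def match_tag_rules(tags, rules):
    """Return (start, rule_name) wherever a rule's tag pattern occurs."""
    hits = []
    for name, pattern in rules.items():
        n = len(pattern)
        for start in range(len(tags) - n + 1):
            if tuple(tags[start:start + n]) == pattern:
                hits.append((start, name))
    return sorted(hits)


# Toy data (invented): transliterated Persian tokens and hypothetical tags.
lexicon = {"ketab", "khub", "ast"}
tokens = ["khub", "ketb", "ast"]
tags = ["ADJ", "NOUN", "VERB"]
rules = {"adjective_before_noun": ("ADJ", "NOUN")}

print(find_unknown_words(tokens, lexicon))
print(suggest("ketb", lexicon))
print(match_tag_rules(tags, rules))
```

A real checker would also need rules over words and chunks, and a statistical component for real-word errors; neither is covered by this sketch.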
|
|
|