Dr. Heshaam Faili
Assistant Professor, Department of Electrical and Computer Engineering, University of Tehran
E-mail: hfaili (at) ut.ac.ir
 |
Morteza Montazery
MS Student of Software Engineering
mortaza.gh (at) gmail (dot) com
Project: Automatic Persian WordNet Construction |
In recent years, there has been increasing interest in the semantic processing of natural languages. Semantic lexicons and ontologies are among the essential resources that make this kind of processing possible. WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. Although WordNets have been developed for a number of languages, no attempt to construct a Persian WordNet is known to exist. This project aims to construct a Persian WordNet automatically.
 |
 |
Nava Ehsan
MS Student of Software Engineering
nava.ehsan (at) gmail (dot) com
Project: Persian spell and grammar checker |
The aim of this project is to develop an automatic spelling and grammar checker for the Persian (Farsi) language. The spelling and grammar checker described in this thesis takes a text and returns a list of possible errors. Errors fall into three categories: spelling errors, grammatical errors, and real-word errors. The most popular method of detecting spelling errors in a text is simply to look up every word in a dictionary; any word that is not found is taken to be an error. Correcting the errors requires similarity algorithms based on Persian error probabilities, together with n-gram rules. Grammatical errors are either wrong relations between words, such as subject-verb disagreement, or wrong word order, such as placing an adjective before its noun. To detect grammatical errors, each word of the text is assigned its part-of-speech tag, and the text is then matched against the checker’s predefined error rules. The rules describe errors as patterns of words, part-of-speech tags and, in some cases, chunks. Real-word errors are misspellings that happen to produce valid words of the language. This project will also use statistical methods to detect real-word errors that the grammatical rules miss.
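The dictionary-lookup step described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the toy English lexicon, the function names and the `max_distance` threshold are all assumptions made for illustration, and plain Levenshtein distance stands in for the Persian error probabilities and n-gram rules mentioned in the description.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def check_spelling(text, lexicon, max_distance=1):
    """Return (position, word, suggestions) for every word missing from the lexicon."""
    errors = []
    for position, word in enumerate(text.split()):
        if word in lexicon:
            continue  # known word: not flagged by dictionary lookup
        # Candidate corrections: lexicon entries within max_distance edits,
        # closest first (ties broken alphabetically).
        suggestions = sorted(
            (w for w in lexicon if edit_distance(word, w) <= max_distance),
            key=lambda w: (edit_distance(word, w), w),
        )
        errors.append((position, word, suggestions))
    return errors


# Hypothetical toy lexicon; a real checker would load a Persian word list.
LEXICON = {"the", "cat", "sat", "on", "mat"}
result = check_spelling("the cat szt on the mat", LEXICON)
print(result)
```

Note that a lookup of this kind cannot flag real-word errors, since a misspelling that yields another dictionary word passes the check; this is why the description adds grammatical rules and statistical methods on top of it.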
 |
 |
Amin Mansouri
MS Student of Software Engineering
a.mansouri (at) ece.ut.ac.ir
Project: Statistical Machine Translation |
Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. MT must do more than simply substitute words in one natural language for words in another. Using corpus techniques, more complex translations are attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies. Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. Where such corpora are available, impressive results can be achieved when translating texts of a similar kind, but such corpora are still very rare. A substantial phase in building an English-to-Persian SMT system is therefore the creation of a multi-million-token bilingual corpus.
 |
 |
Parisa Saeedi
MS Student of Software Engineering
p.saeedy (at) ece.ut.ac.ir
Project: Unsupervised semantic role labeling in Persian |
Semantic role labeling, the computational identification and labeling of arguments in text, has become a leading task in computational linguistics. Although the issues involved have been studied for decades, the availability of large resources and the development of statistical machine learning methods have increased the effort devoted to this field. Supervised methods, which depend on manually created resources, now face two problems: degraded performance on out-of-domain data, and the lack of such resources for some languages, such as Persian. Approaches that depend less on these resources are therefore needed. We aim to extend such an approach so that, without any annotated data, it learns to extract the predicate-argument structure of Persian sentences and to assign semantic roles to the arguments. This task has many applications in question answering systems, information extraction, machine translation, and other applications that require text understanding.
 |
 |
Ali Basirat
MS Student of Artificial Intelligence
ali_basirat_ep (at) yahoo.com
Project: A statistical approach to produce parse tree and dependency tree using partial parsers |
Grammars, as mathematical systems, are used to model the constituent structures of natural languages, and parsing is a search strategy that looks for all linguistic descriptions available in a grammar for a given sentence. Depending on the coverage of the grammar, parsing can be complete or partial. Traditional parsers assume that their grammar is complete, search the entire space of parses defined by the grammar, and look for the best parse. However, because of the noisy nature of unrestricted text and the huge size of the search space, they often fail to identify a good parse for the sentence. Partial parsers, by contrast, respond to these difficulties by aiming to recover syntactic information efficiently from unrestricted text, sacrificing completeness and depth of analysis.
Of all grammatical formalisms, lexicalized grammars such as HPSG, CCG and LTAG seem to have the most potential for partial parsing. The task of a parser for these formalisms can be divided into two main steps: first, assigning an appropriate description to each lexical item of the input sentence, and then combining these descriptions into a description of the entire sentence.
The aim of this project is to develop an efficient and reliable partial parser for English based on the XTAG grammar, an LTAG implementation for English. More specifically, we propose a way to build an XTAG-based partial parser when lexical descriptions of the input sentence are already available, possibly produced by another, non-XTAG-based LTAG parser. The problem is modeled as a sequence tagging problem in which each source lexical description (supertag) of the input sentence is tagged with a target lexical description (supertag) of the XTAG grammar. The sequence tagging problem itself is formulated as an HMM. The performance of the solution is, of course, strictly bounded by the quality of the non-XTAG-based parser. As that parser we use MICA, a statistical dependency parser trained on an LTAG automatically extracted from the Penn Treebank using Chen’s approach.
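As a rough illustration of the HMM formulation, the sketch below decodes a sequence of source supertags into the most probable sequence of target supertags with the standard Viterbi algorithm. The hidden states play the role of XTAG supertags and the observations play the role of supertags from the source parser. All supertag names and probabilities here are hypothetical toy values; the project's actual model, tag sets and estimation procedure are not given in this description.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    floor = 1e-12  # avoids log(0) for unseen events (a crude assumption)

    def lp(p):
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lp(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical target (XTAG-side) supertags and source-parser supertags.
states = ["alpha_NP", "alpha_V"]
start_p = {"alpha_NP": 0.8, "alpha_V": 0.2}
trans_p = {"alpha_NP": {"alpha_NP": 0.3, "alpha_V": 0.7},
           "alpha_V": {"alpha_NP": 0.8, "alpha_V": 0.2}}
emit_p = {"alpha_NP": {"t_noun": 0.9, "t_verb": 0.1},
          "alpha_V": {"t_noun": 0.2, "t_verb": 0.8}}

path = viterbi(["t_noun", "t_verb", "t_noun"], states, start_p, trans_p, emit_p)
print(path)
```

In this framing the transition probabilities model which target supertags tend to follow one another, while the emission probabilities model how likely a given source supertag is for each target supertag, so errors made by the source parser propagate directly into the decoded sequence.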
 |