Zeinab Meftah


email z.meftah [at] 

Machine Learning-based Grammatical Error Detection and Correction Method in Persian Language

Large amounts of text are produced every day, and because humans write them, errors are inevitable. Automatic error-correction systems improve the quality of writing by removing these errors. This research focuses on grammatical errors. A grammatical error correction system checks sentences against grammar rules and gives learners feedback on their grammatical errors. For example, the sentence "I saw her go to school." should be corrected as "I saw her going to school." Grammatical errors come in different types. To obtain statistics on the types and frequency of common grammatical mistakes, we collected about 700 student essays from learners of Persian and manually annotated them with error tags and corrections. Preposition errors and subject-verb agreement errors turned out to be the most frequent. This research aims to correct subject-verb agreement errors in Persian text. Subject-verb agreement is a computation that is often difficult to execute perfectly in a first language and even more difficult to produce skillfully in a second language. Persian verbs agree with their subjects in grammatical person and number, and we correct the two independently: a rule-based method is proposed for person, and machine learning methods are used for number. To correct the grammatical number of a verb, we train a maximum entropy classifier and a decision tree on error-free Persian text to label each verb as singular or plural. We propose language-specific features that are extracted without looking at the verb of the sentence. The main features are the subject of the verb, the subject's number and POS tag, and whether the subject is animate. Our system achieves an F0.5 score of 47%. We compare our results with several baselines and with approaches based on statistical machine translation and neural machine translation. Our results show that the proposed method outperforms all of them.

Keywords: grammatical error correction and detection, machine learning methods, correcting verb errors, correcting subject-verb agreement errors in Persian, decision tree, maximum entropy classifier
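
The abstract names two things that a short sketch can make concrete: features for the number classifier that are built from the subject alone, and the F0.5 metric used to report results. The Python below is a minimal illustration under stated assumptions, not the thesis implementation. The `Subject` class, the feature names, the tag values, and the example counts are all hypothetical; only the list of features (subject, its number, its POS tag, animacy) and the choice of F0.5 come from the abstract. The metric uses the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 0.5 weights precision more heavily than recall.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical record of what a parser might report about a subject."""
    lemma: str      # head word of the subject (transliterated here)
    number: str     # "sing" or "plur" (assumed tag values)
    pos: str        # POS tag of the head word (assumed tag set)
    animate: bool   # whether the subject refers to an animate entity


def number_features(subject: Subject) -> dict:
    """Build classifier features from the subject only; the verb is never read."""
    return {
        "subj_lemma": subject.lemma,
        "subj_number": subject.number,
        "subj_pos": subject.pos,
        "subj_animate": subject.animate,
    }


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta computed from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up example: a plural, inanimate noun subject.
    print(number_features(Subject("ketab", "plur", "N", False)))
    # Made-up counts: 6 correct edits, 2 spurious edits, 6 missed errors.
    print(round(f_beta(tp=6, fp=2, fn=6), 3))
```

A feature dictionary of this shape could be fed to any classifier that accepts categorical features, such as a maximum entropy model or a decision tree. Because the verb is excluded from the features, the predicted label can be compared with the number of the verb actually written in order to flag a possible agreement error.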