Natural Language and Text Processing Laboratory
University of Tehran

UTSpellChecker: spelling, grammar and semantic error checker for the Persian language

  • Useful resources for spelling, grammar and semantic error checking in Persian


  • Spell checker resources


  • Preprocessing rules: Persian preprocessing rules

  • Persian probabilities for computing Damerau–Levenshtein distance:

  • Addition probability: addition probabilities for Persian letters

  • Deletion probability: deletion probabilities for Persian letters

  • Substitution probability: substitution probabilities for Persian letters

  • Transposition probability: transposition probabilities for Persian letters
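As an illustration of how the four probability tables above could feed an edit distance, here is a minimal sketch. It is not the UTSpellChecker implementation. It assumes the restricted Damerau–Levenshtein variant (optimal string alignment), assumes each edit costs -log(p), and assumes the tables have been loaded into plain dictionaries; the function name, the dictionary layout and the fallback probability `default_p` are all hypothetical, since the format of the actual files is not described here.

```python
import math

def weighted_osa_distance(source, target, add_p=None, del_p=None,
                          sub_p=None, trans_p=None, default_p=0.01):
    """Restricted Damerau-Levenshtein distance where each edit costs -log(p).

    Assumed (hypothetical) table layout:
      add_p[c], del_p[c]  -> probability of adding / deleting letter c
      sub_p[(a, b)]       -> probability of typing b instead of a
      trans_p[(a, b)]     -> probability of swapping the adjacent pair "ab"
    Edits missing from a table fall back to default_p.
    """
    add_p, del_p = add_p or {}, del_p or {}
    sub_p, trans_p = sub_p or {}, trans_p or {}

    def cost(table, key):
        return -math.log(table.get(key, default_p))

    n, m = len(source), len(target)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete every letter of source
        d[i][0] = d[i - 1][0] + cost(del_p, source[i - 1])
    for j in range(1, m + 1):          # add every letter of target
        d[0][j] = d[0][j - 1] + cost(add_p, target[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, t = source[i - 1], target[j - 1]
            best = min(
                d[i - 1][j] + cost(del_p, s),                       # deletion
                d[i][j - 1] + cost(add_p, t),                       # addition
                d[i - 1][j - 1] + (0.0 if s == t else cost(sub_p, (s, t))),
            )
            # transposition of two adjacent, different letters
            if i > 1 and j > 1 and s != t \
                    and s == target[j - 2] and source[i - 2] == t:
                best = min(best,
                           d[i - 2][j - 2] + cost(trans_p, (source[i - 2], s)))
            d[i][j] = best
    return d[n][m]
```

With this cost choice, a likely edit (high probability) yields a small distance, so candidate corrections can be ranked by ascending distance. The strings can be any Unicode text, including Persian words.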


  • Grammar checker resources


  • Grammar rules: Persian grammar rules in XML format

  • Plural nouns: imported Arabic plural nouns

  • Most frequent POS tag: most frequent POS tag of some Persian words (trained on the Bijankhan corpus)


  • Semantic checker resources


  • The following file contains a language model built from a collection of news from the Islamic Republic of Iran News Agency (IRNA)

  • Language model: language model for Persian


  • Mutual information: the complete database including mutual information


  • Confusion set: a set of words that are considered confusable with the headword of the set because of typing mistakes or phonetic similarity, but that are not necessarily confusable with each other
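The headword-centred definition above can be sketched as a mapping from each headword to its own set, which makes the relation explicitly non-symmetric. This is only an illustrative sketch: the English toy entries, the function names and the `score` callback (standing in for a language-model or mutual-information score) are hypothetical and are not taken from the downloadable file.

```python
# Toy data, for illustration only: each headword maps to its own confusion set.
confusion_sets = {
    "piece": {"peace", "price"},
    "peace": {"piece"},
}

def candidates(word, confusion_sets):
    """Return the word itself plus the members of its confusion set."""
    return {word} | confusion_sets.get(word, set())

def best_candidate(tokens, index, confusion_sets, score):
    """Pick the candidate for tokens[index] that maximises score(sentence).

    `score` is a hypothetical callback taking a list of tokens and
    returning a number (higher means more plausible).
    """
    options = sorted(candidates(tokens[index], confusion_sets))
    return max(
        options,
        key=lambda w: score(tokens[:index] + [w] + tokens[index + 1:]),
    )
```

Here "price" is a candidate for "piece", but not for "peace", which matches the remark that members of a set need not be confusable with each other.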


  • Test Set


  • To evaluate the accuracy of the different parts of the system, a fairly large artificial test set was prepared. It consists of a collection of Persian news texts into which we manually injected, at random positions, about 400 spelling errors, 70 grammar errors and 60 real-word errors. [Test set]
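One possible way to score a checker against such a test set is to compare the positions of the injected errors with the positions the checker flags. The sketch below uses precision and recall, which is an assumption: the page only speaks of "accuracy" and does not say which measure was used. Representing errors as token positions is likewise hypothetical.

```python
def detection_scores(injected, flagged):
    """Precision and recall of flagged positions against injected errors.

    Both arguments are iterables of token positions (a hypothetical
    representation of the test set's annotations).
    """
    injected, flagged = set(injected), set(flagged)
    true_positives = len(injected & flagged)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    return precision, recall
```

The same function can be applied separately to the spelling, grammar and real-word error subsets to score each part of the system on its own.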