Useful resources for spelling, grammar, and semantic error checking for the Persian language
Spell checker resources
Preprocessing rules: Persian preprocessing rules
Persian letter probabilities for computing the Damerau–Levenshtein distance:
Addition probability: addition probabilities for Persian letters
Deletion probability: deletion probabilities for Persian letters
Substitution probability: substitution probabilities for Persian letters
Transposition probability: transposition probabilities for Persian letters
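
As a rough illustration of how per-operation probabilities like these could feed an edit distance, here is a minimal Python sketch of a weighted Damerau–Levenshtein distance (the restricted "optimal string alignment" variant). The function names, the callback signatures, and the choice of cost = -log(probability) are assumptions made for this example; they are not taken from the resource files, whose actual format is not described here.

```python
import math
from typing import Callable


def cost_from_probability(p: float) -> float:
    """Assumed mapping: likelier edits get a lower cost (-log p)."""
    return -math.log(p)


def weighted_osa_distance(
    source: str,
    target: str,
    ins_cost: Callable[[str], float] = lambda c: 1.0,
    del_cost: Callable[[str], float] = lambda c: 1.0,
    sub_cost: Callable[[str, str], float] = lambda a, b: 1.0,
    swap_cost: Callable[[str, str], float] = lambda a, b: 1.0,
) -> float:
    """Weighted optimal-string-alignment distance between two strings.

    The four callbacks are hypothetical hooks; in practice they would look
    up values derived from the addition, deletion, substitution, and
    transposition tables. With the defaults, every edit costs 1.
    """
    n, m = len(source), len(target)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First column: delete every character of the source prefix.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(source[i - 1])
    # First row: insert every character of the target prefix.
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(target[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, t = source[i - 1], target[j - 1]
            best = min(
                d[i - 1][j] + del_cost(s),                       # deletion
                d[i][j - 1] + ins_cost(t),                       # addition
                d[i - 1][j - 1] + (0.0 if s == t else sub_cost(s, t)),
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and s == target[j - 2] and source[i - 2] == t:
                best = min(best, d[i - 2][j - 2] + swap_cost(source[i - 2], s))
            d[i][j] = best
    return d[n][m]
```

With the default unit costs, `weighted_osa_distance("سلام", "سالم")` counts the swap of the two middle letters as a single transposition. Passing callbacks built on `cost_from_probability` would make frequent typing mistakes cheaper than rare ones.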
Grammar checker resources
Grammar rules: Persian grammar rules in XML format
Plural nouns: Arabic plural nouns imported into Persian
Most frequent POS tag: the most frequent POS tag of some Persian words (trained on the Bijankhan corpus)
Semantic checker resources
The following file contains a language model built from a collection of news from the Islamic Republic News Agency (IRNA)
Language model: language model for Persian
Mutual information: the complete database, including mutual information
Confusion set: a confusion set is a set of words that are considered confusable with the headword of the set because of typing mistakes or phonetic similarity. The members are not necessarily confusable with each other.
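
The headword-centred definition can be pictured as a mapping from each headword to its confusable words. The Python sketch below is only an illustration: the entries are invented English toy data (the actual resource is Persian), and `candidates`, `best_in_context`, and the `score` callback are hypothetical names. The `score` function stands in for whatever context model is used, such as a language model; its form is an assumption.

```python
from typing import Callable, Dict, List, Set

# Toy entries invented for illustration only.
# Each key is a headword; its value lists words confusable with that headword.
CONFUSION_SETS: Dict[str, Set[str]] = {
    "form": {"from", "farm"},
    "from": {"form"},
}


def candidates(word: str) -> Set[str]:
    """Return the word itself plus the members of its own confusion set."""
    return {word} | CONFUSION_SETS.get(word, set())


def best_in_context(
    words: List[str],
    index: int,
    score: Callable[[List[str]], float],
) -> str:
    """Pick the candidate at `index` that gives the highest sentence score.

    `score` is a hypothetical scoring function (higher means more plausible).
    The original word is kept unless a candidate scores strictly higher.
    """
    best_word, best_score = words[index], score(words)
    for cand in sorted(CONFUSION_SETS.get(words[index], set())):
        trial = words[:index] + [cand] + words[index + 1:]
        trial_score = score(trial)
        if trial_score > best_score:
            best_word, best_score = cand, trial_score
    return best_word
```

In this toy data, "from" and "farm" are both confusable with the headword "form", but "farm" has no entry of its own, so `candidates("farm")` returns only the word itself. That reflects the point that set members need not be confusable with each other.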
Test Set
To evaluate the accuracy of the different parts of the system, we prepared a fairly large artificial test set. It consists of a collection of Persian news texts into which we manually injected, at random positions, about 400 spelling errors, 70 grammar errors, and 60 real-word errors.
[Test set]