
Hamshahri Collection
Publications
Please cite paper [1] if you use the collection or the tools on this web site.
| [1] |
Abolfazl AleAhmad, Hadi Amiri, Ehsan Darrudi, Masoud Rahgozar, Farhad Oroumchian, Hamshahri: A standard Persian text collection, Journal of Knowledge-Based Systems, Vol. 22, No. 5, p. 382-387, Elsevier, July 2009. |
|
 |
|
|
 |
This paper describes the Hamshahri collection and its characteristics. |
| [2] |
Ehsan Darrudi,
Mohammad Reza Hejazi, Farhad Oroumchian, Assessment of a Modern Farsi Corpus, In Proceedings of
the 2nd Workshop on Information Technology & its Disciplines
(WITID'04), ITRC, Kish Island, Iran, 2004. |
|
 |
|
|
 |
This paper describes how we constructed a well-structured 345 MB tagged corpus of news, and presents useful statistics on this corpus based on the characteristics of the Farsi language (fit of the frequency distribution, Zipf-Mandelbrot’s law, etc.). |
| [3] |
Hadi Amiri, Abolfazl AleAhmad, Farhad
Oroumchian, Caro Lucas, Masoud Rahgozar, Using OWA Fuzzy
Operator to Merge Retrieval System Results, The Second
Workshop on Computational Approaches to Arabic Script-based
Languages, LSA 2007 Linguistic Institute, Stanford
University, USA, 2007. |
|
 |
|
|
 |
In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and Lnc.btc weighting schemes, using a quantifier-based OWA operator. To download the C# implementation of the vector space model used in this paper, click here; for the language modeling implementation, click here.
|
| [4] |
Abolfazl Aleahmad, Parsia Hakimian, Farzad
Mahdikhani, Farhad Oroumchian, N-Gram and Local Context Analysis For Persian Text
Retrieval, International Symposium on Signal Processing
and Its Applications, Sharjah, U.A.E., 2007. |
|
 |
|
|
 |
In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, Local Context Analysis, using different weighting schemes on the Hamshahri corpus. |
| [5] |
Alireza Mokhtaripour, Saber Jahanpour, Introduction to a new Farsi stemmer, Proceedings of the 15th ACM international conference on Information and Knowledge Management,
p. 826-827, ISBN: 1-59593-433-2, 2006. |
|
 |
|
|
 |
This paper uses the Hamshahri collection to create a Persian stemmer. |
| [6] |
Farhad Oroumchian, Ehsan Darrudi, Fattane
Taghiyareh, Neeyaz Angoshtari, Experiments with Persian text compression for web,
13th International World Wide Web conference, New York, NY,
USA, 2004. |
|
 |
|
|
 |
The approach presented in this paper aims to reduce the storage size and transmission time of Persian text files in web-based applications and on the Internet. A genetic algorithm is used to select the most appropriate n-grams. In the best case, we achieved a 52.26% reduction in file size. |
| [7] |
Abolfazl AleAhmad, Ehsan Kamalloo, Arash Zareh, Masoud Rahgozar, Farhad Oroumchian, Cross Language Experiments at Persian@CLEF 2008, Cross Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, 2008. |
|
 |
|
|
 |
This paper uses the Hamshahri collection to evaluate different cross-language algorithms at CLEF 2008. |
| [8] |
Amir Nayyeri,
Farhad Oroumchian, FuFaIR: a
Fuzzy Persian Information Retrieval System, IEEE
International Conference on Computer Systems and
Applications, p. 1126-1130, 2006. |
|
 |
|
|
 |
This paper discusses the design, implementation and testing of a Fuzzy retrieval system for Persian called FuFaIR. This system also supports Fuzzy quantifiers in its query language. Tests have been conducted using the Hamshahri collection. |
| [9] |
Bahareh Bina, Masoud Rahgozar, Azin Dehmoubed, Automatic Classification of Persian Texts (in Persian), 13th National Conference of the Computer Society of Iran, Kish Island, Persian Gulf, Iran, Esfand 1386 (February/March 2008). |
|
 |
|
|
 |
The authors used the collection for Persian text classification and reported their results. |
| [10] |
Morteza Mohaqeqi,
Reza Soltanpoor, Azadeh Shakery, Improving the Classification of Unknown Documents by Concept
Graph, CSICC2009, Tehran, Iran, 2009. |
|
 |
|
|
 |
The authors used the collection for Persian text classification based on concept graphs and reported their results. |
|
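Paper [3] merges the results of several retrieval systems with a quantifier-based OWA (ordered weighted averaging) operator. The sketch below illustrates the general technique only and is not the implementation used in that paper. It assumes Yager's construction of OWA weights from a quantifier Q(r) = r^alpha, and it assumes each system's scores are already normalized to [0, 1]; the function names, the `alpha` parameter, and the convention of scoring missing documents as 0 are all hypothetical choices made for illustration.

```python
def owa_weights(n, alpha=2.0):
    """OWA weights w_i = Q(i/n) - Q((i-1)/n) for the assumed quantifier Q(r) = r**alpha."""
    return [(i / n) ** alpha - ((i - 1) / n) ** alpha for i in range(1, n + 1)]


def owa(scores, alpha=2.0):
    """Ordered weighted average: sort scores in descending order, then take the weighted sum."""
    ordered = sorted(scores, reverse=True)
    weights = owa_weights(len(ordered), alpha)
    return sum(w * s for w, s in zip(weights, ordered))


def merge_runs(runs, alpha=2.0):
    """Fuse several result lists into one ranking.

    runs: list of dicts mapping doc_id -> normalized score in [0, 1].
    A document missing from a run is given score 0.0 for that run (an assumption).
    Returns doc_ids sorted by fused score, best first.
    """
    doc_ids = set().union(*runs)
    fused = {d: owa([run.get(d, 0.0) for run in runs], alpha) for d in doc_ids}
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Two hypothetical systems scoring three documents.
    runs = [{"a": 0.9, "b": 0.1}, {"a": 0.8, "c": 0.7}]
    print(merge_runs(runs, alpha=2.0))
```

With `alpha = 1` the weights are uniform and the operator reduces to a plain average of the systems' scores. With `alpha > 1` more weight falls on the lower-ranked scores, so a document must score well in most systems to rank highly; with `alpha < 1` the highest scores dominate.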