
 

Hamshahri Collection

Publications

Please cite paper [1] if you use the collection or the tools on this website.

[1] Abolfazl AleAhmad, Hadi Amiri, Ehsan Darrudi, Masoud Rahgozar, Farhad Oroumchian, Hamshahri: A standard Persian text collection, Journal of Knowledge-Based Systems, Vol. 22, No. 5, pp. 382-387, Elsevier, July 2009.
Download
Description
This paper describes the Hamshahri collection and its characteristics.

 

[2] Ehsan Darrudi, Mohammad Reza Hejazi, Farhad Oroumchian, Assessment of a Modern Farsi Corpus, In Proceedings of the 2nd Workshop on Information Technology & its Disciplines (WITID'04), ITRC, Kish Island, Iran, 2004.
Download
Description
This paper describes how we constructed a well-structured, tagged 345 MB corpus of news and presents useful statistics of this corpus based on the characteristics of the Farsi language (frequency fit, Zipf-Mandelbrot's law, etc.).

 

[3] Hadi Amiri, Abolfazl AleAhmad, Farhad Oroumchian, Caro Lucas, Masoud Rahgozar, Using OWA Fuzzy Operator to Merge Retrieval System Results, The Second Workshop on Computational Approaches to Arabic Script-based Languages, LSA 2007 Linguistic Institute, Stanford University, USA, 2007.
Download
Description

In this study, we investigated the performance of Persian retrieval when the results of four different language modeling methods and two vector space models, with Lnu.ltu and Lnc.btc weighting schemes, are merged using a quantifier-based OWA operator. The C# implementation of the vector space model used in this paper and the one for language modeling are available for download.
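As a rough illustration of what a quantifier-based OWA (ordered weighted averaging) merge can look like, here is a minimal Python sketch. It is not the paper's C# implementation. It assumes the common construction in which the weights come from a monotone quantifier Q with Q(0) = 0 and Q(1) = 1, namely w_i = Q(i/n) - Q((i-1)/n), and that the per-system scores have already been normalized to a comparable range. The function names, the choice of quantifier, and the rule that a missing document scores 0.0 are all assumptions made for this example.

```python
from typing import Callable, Sequence


def quantifier_weights(n: int, quantifier: Callable[[float], float]) -> list[float]:
    """Derive n OWA weights from a quantifier Q: w_i = Q(i/n) - Q((i-1)/n)."""
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to the scores sorted in descending order."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def merge_runs(
    runs: Sequence[dict[str, float]],
    quantifier: Callable[[float], float],
) -> list[tuple[str, float]]:
    """Fuse per-system document scores into a single ranking.

    Assumption for this sketch: a document missing from a run scores 0.0 there.
    """
    weights = quantifier_weights(len(runs), quantifier)
    doc_ids = set().union(*(run.keys() for run in runs))
    fused = {
        doc_id: owa([run.get(doc_id, 0.0) for run in runs], weights)
        for doc_id in doc_ids
    }
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

With the identity quantifier Q(r) = r, every weight equals 1/n and the merge reduces to a plain average of the systems' scores. A quantifier such as Q(r) = r ** 2 shifts weight toward the lower-ranked scores, so a document needs support from more systems to rank highly.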

 

[4] Abolfazl AleAhmad, Parsia Hakimian, Farzad Mahdikhani, Farhad Oroumchian, N-Gram and Local Context Analysis For Persian Text Retrieval, International Symposium on Signal Processing and Its Applications, Sharjah, U.A.E., 2007.
Download
Description
In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, Local Context Analysis, using different weighting schemes on the Hamshahri corpus.

 

[5] Alireza Mokhtaripour, Saber Jahanpour, Introduction to a new Farsi stemmer, Proceedings of the 15th ACM international conference on Information and Knowledge Management, pp. 826-827, ISBN: 1-59593-433-2, 2006.
Download
Description
This paper uses the Hamshahri collection to create a Persian stemmer.

 

[6] Farhad Oroumchian, Ehsan Darrudi, Fattane Taghiyareh, Neeyaz Angoshtari, Experiments with Persian text compression for web, 13th International World Wide Web conference, New York, NY, USA, 2004.
Download
Description
The approach presented in this paper aims to reduce the storage and transmission time of Persian text files in web-based applications and on the Internet. A genetic algorithm is used to select the most appropriate n-grams. In the best case, we achieved a 52.26% reduction in file size.

 

[7] Abolfazl AleAhmad, Ehsan Kamalloo, Arash Zareh, Masoud Rahgozar, Farhad Oroumchian, Cross Language Experiments at Persian@CLEF 2008, Cross Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, 2008.
Download
Description
This paper uses the Hamshahri collection to evaluate different cross-language algorithms at CLEF 2008.

 

[8] Amir Nayyeri, Farhad Oroumchian, FuFaIR: a Fuzzy Persian Information Retrieval System, IEEE International Conference on Computer Systems and Applications, pp. 1126-1130, 2006.
Download
Description
This paper discusses the design, implementation and testing of a Fuzzy retrieval system for Persian called FuFaIR. This system also supports Fuzzy quantifiers in its query language. Tests have been conducted using the Hamshahri collection.

 

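[9] Bahareh Bina, Masoud Rahgozar, Azin Dahmoubed (بهاره بینا، مسعود رهگذر، آذین ده موبد), Automatic Classification of Persian Texts (in Persian), 13th National Conference of the Computer Society of Iran, Kish Island, Persian Gulf, Iran, Esfand 1386 (February-March 2008).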
Download
Description
The authors used the collection for Persian classification and reported their results.

 

[10] Morteza Mohaqeqi, Reza Soltanpoor, Azadeh Shakery, Improving the Classification of Unknown Documents by Concept Graph, CSICC2009, Tehran, Iran, 2009.
Download
Description
The authors used the collection for Persian classification based on concept graphs and reported their results.

 

 

© Copyright 2009 University of Tehran, Database Research Group. All Rights Reserved.
Designed by Farzad Mahdikhani - Last update: 2010 Aug. 17