DataBase Research Group - University of Tehran


Last Updated on: 21/07/07 
Data Mining
Title: Mining for conceptual associations in unstructured and semi-structured text for reasoning with Human Plausible Reasoning.

Work by: Alireza Vazifeh Doust
Supervisor: Dr. Masoud Rahgozar
Advisor: Dr. Farhad Oroumchian
Team Members: None
Problem Definition, Goal and Importance:

Human Plausible Reasoning (HPR) is a common approach in question answering (QA) and information retrieval (IR). Systems based on it reason over relations initially extracted from the documents in order to find the most relevant documents or answers. This reasoning resembles the way the human brain works when given the same initial relations. Experiments with such systems have shown good results, but the primary problem is that the initial relations and the knowledge contained in unstructured documents cannot easily be extracted automatically, while the effectiveness of the method depends on the amount and correctness of this initially acquired information.

In this project we try to extract the relations between the contexts of documents automatically, using data mining, NLP, and newer ideas such as topological relations. At this level only information is extracted. Using HPR to extract knowledge is a further goal. The collection of extracted information and knowledge will then be the input to HPR-based QA and IR systems, and the change in the quality of the knowledge retrieved by each of these systems will be analyzed.
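As one illustration of what automatic relation extraction from text can look like, the sketch below pulls is-a relations out of sentences with a single lexical pattern ("X such as A, B and C"). The pattern, the function name extract_is_a and the triple format are assumptions made for this example; they are not the method developed in the project.

```python
import re

# Hypothetical separator list for enumerations: ", and ", ", or ", ", ", " and ", " or "
SEP = r"(?:, and |, or |, | and | or )"
# Hypothetical pattern: "<concept> such as <item><sep><item>..."
PATTERN = re.compile(rf"(\w+) such as (\w+(?:{SEP}\w+)*)")


def extract_is_a(text):
    """Return (item, 'is-a', concept) triples found by the 'such as' pattern."""
    triples = []
    for match in PATTERN.finditer(text):
        concept = match.group(1)
        # Split the enumeration into its individual items.
        items = re.split(SEP, match.group(2))
        triples.extend((item, "is-a", concept) for item in items)
    return triples


print(extract_is_a("The corpus covers languages such as Persian, Arabic and English."))
```

A single surface pattern like this only handles one-word concepts and misses most relations in real text; it is meant to show the shape of the output (concept pairs with a relation label), not a usable extractor.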

This conceptual view of information, which goes beyond the boundaries of document retrieval, is highly important in a world overloaded with information.
Machines clearly need the capability of understanding context in order to help people find what they really need and set aside what they do not.

Approach:

This project has three primary goals:
 1. Introducing methods for extracting meaningful relations from text and analyzing them.
 2. Using the extracted relations in IR and QA systems and comparing the performance of these systems with and without them.
 3. Introducing methods for estimating the reliability of the extracted relations.
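
For the third goal, one very simple way to attach a reliability estimate to a relation is to count how many distinct documents it was extracted from. The sketch below does that; the scoring formula n / (n + smoothing), the function name reliability_scores and the input format are all assumptions chosen for illustration, not the measure proposed by the project.

```python
from collections import defaultdict


def reliability_scores(extractions, smoothing=1.0):
    """Score each relation by the number of distinct documents supporting it.

    extractions: iterable of (doc_id, relation) pairs, relation being hashable.
    Returns {relation: score}, where score = n / (n + smoothing) and n is the
    number of distinct supporting documents (hypothetical heuristic).
    """
    supporting_docs = defaultdict(set)
    for doc_id, relation in extractions:
        supporting_docs[relation].add(doc_id)
    return {
        relation: len(docs) / (len(docs) + smoothing)
        for relation, docs in supporting_docs.items()
    }


extractions = [
    ("d1", ("Persian", "is-a", "language")),
    ("d2", ("Persian", "is-a", "language")),
    ("d3", ("Tehran", "is-a", "language")),  # a likely extraction error
]
for relation, score in reliability_scores(extractions).items():
    print(relation, round(score, 2))
```

The score stays below 1 and grows with independent support, so a relation seen once is ranked below one seen in several documents.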

The first step in this project is to study different kinds of text mining methods, so studying NLP and other knowledge discovery methods is mandatory. Studying HPR is also needed, to become familiar with ways of representing the extracted knowledge. The next step is to use the experience gained to develop a prototype that extracts relations from texts, and to run experiments on IR and QA systems to check the change in output quality. Methods for verifying the quality of the relations are also needed.
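
The with/without comparison described above can be sketched on a toy retrieval setup: run the same query once as given and once expanded with extracted relations, then compare a quality measure. Everything here (term-overlap retrieval, the expand helper, recall as the measure, the sample data) is a hypothetical stand-in for the actual HPR-based systems.

```python
def retrieve(query_terms, documents):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(query_terms)
    return {
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    }


def expand(query_terms, relations):
    """Add every term linked to a query term by an extracted relation."""
    base = set(query_terms)
    expanded = set(base)
    for a, b in relations:
        if a in base:
            expanded.add(b)
        if b in base:
            expanded.add(a)
    return expanded


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


documents = {
    "d1": "persian text mining",
    "d2": "farsi corpus",
    "d3": "image processing",
}
relations = [("persian", "farsi")]  # assumed output of the extraction step
relevant = {"d1", "d2"}

without = recall(retrieve(["persian"], documents), relevant)
with_relations = recall(retrieve(expand(["persian"], relations), documents), relevant)
print(without, with_relations)  # 0.5 1.0
```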

Research Prerequisites:

Extracting meaningful relations is mostly related to NLP and text mining. Data mining is one of the most important methods for knowledge discovery in structured data. The problem is that most human knowledge is kept not in databases but in unstructured or, at best, semi-structured texts. The question is how methods for structured data can be applied to unstructured data. This is where text mining originates. The methods are usually divided into two groups: statistical and conceptual. The first group tries to extend data mining concepts to ordinary texts, while the second tries to use text understanding for knowledge discovery, which has its roots in NLP and machine learning.
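
To make the statistical group concrete, the sketch below carries a standard data mining idea over to text: each document is treated as a transaction of terms, and term pairs are scored with the usual support and confidence measures of association rule mining. Whitespace tokenization, the function name term_associations and the threshold value are assumptions for this example.

```python
from collections import Counter
from itertools import combinations


def term_associations(documents, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent term pairs."""
    # Each document becomes a set of terms, like a market-basket transaction.
    transactions = [set(doc.lower().split()) for doc in documents]
    n = len(transactions)
    term_count = Counter(term for tx in transactions for term in tx)
    pair_count = Counter(
        pair for tx in transactions for pair in combinations(sorted(tx), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of documents containing both terms
        if support >= min_support:
            # confidence(a -> b) = P(b in document | a in document)
            rules.append((a, b, support, count / term_count[a]))
            rules.append((b, a, support, count / term_count[b]))
    return rules


docs = ["text mining", "text mining nlp", "data mining"]
for rule in term_associations(docs):
    print(rule)
```

On this toy input only the pair (mining, text) passes the support threshold; "text" implies "mining" with confidence 1.0, while the reverse rule is weaker because "mining" also appears without "text".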

[1] M. A. Hearst. Untangling text data mining. In Proceedings of the ACL’99: the 37th Annual Meeting of the Association for Computational Linguistics. University of Maryland, June 20-26 1999

[2] Claire Grover, Harry Halpin, Ewan Klein, Jochen L. Leidner, Stephen Potter, Sebastian Riedel, Sally Scrutchin, and Richard Tobin. A framework for text mining services. In Proceedings of the Third UK e-Science Programme All Hands Meeting (AHM 2004), 2004.

[3] Sugato Basu. Evaluating the Novelty of Text-Mined Rules Using Lexical Knowledge. KDD.

[4] Witten, I. H., Don, K. J., Dewsnip, M. and Tablan, V. (2004) “Text mining in a digital library.” International Journal on Digital Libraries 4(1), 56-59

[5] H. Karanikas and B. Theodoulidis, ‘Knowledge discovery in text and text mining software’, Technical report, UMIST - CRIM, Manchester, 2002

[6] Kodratoff Y., “Knowledge Discovery in Texts: A Definition, and Applications,” in Foundation of Intelligent Systems, Ras & Skowron (Eds.), LNAI 1609, Springer, 1999

[7] M. Rajman. Text Mining, knowledge extraction from unstructured textual data. Proc. of EUROSTAT Conference, Frankfurt (Germany), May 1997

[8] Un Yong Nahm. Text Mining with Information Extraction. PhD Proposal, The University of Texas at Austin, 2001

[9] Marie-Laure Reinberger. Unsupervised Text Mining for Ontology Learning. In Proceedings of Machine Learning for the Semantic Web, 2005

[10] Ah-Hwee Tan. Text Mining: The state of the art and the challenges. In Proceedings, PAKDD'99 Workshop on Knowledge discovery from Advanced Databases (KDAD'99), Beijing, pp. 71-76, April 1999

[11] K. McCurley and A. Tomkins. Mining and knowledge discovery from the Web. In 7th International Symposium on Parallel Architectures, Algorithms and Networks, Hong Kong, 2004

[12] Oracle Text, a white paper from Oracle.

[13] Sehgal, A.K. Text Mining: The Search for Novelty in Text. Ph.D. Comprehensive Examination Report, Dept. of Computer Science, The University of Iowa, April 2004

[14] Haralampos Karanikas, et al. An Approach to Text Mining using Information Extraction

[15] Rajman, M. and Besançon, R. 1997. Text Mining: Natural Language Techniques and Text Mining Applications. In Proceedings of the Seventh IFIP 2.6 Working Conference on Database Semantics

[16] H. Zhuge, et al. An Automatic Semantic Relationships Discovery Approach. The 13th International World Wide Web Conference (WWW2004), New York, USA, May 2004.

[17] F. Oroumchian, R.N. Oddy, “An Application of Plausible Reasoning to Information Retrieval,” SIGIR 1996: 244-252.

[18] A. Collins, R. Michalski, “The logic of Plausible Reasoning A core theory,” Cognitive Science, vol. 13, pp. 1-49, 1989.

Reports and Seminars:

Implementation:

Related Links:

Publications:


Copyright 2007 DBRG-UT. All rights reserved. Designed by Aresh Dadlani, Mohamad Hasan Ahmadi