DataBase Research Group - University of Tehran


Last Updated on: 21/07/07 
Data Mining
Title: XML Data Mining

Work by: Rahman Alimohammadzadeh
Supervisor: Dr. Masoud Rahgozar
Advisor: Dr. Caro Lucas
Team Members: None
Problem Definition, Goal and Importance:

Nowadays, data mining techniques are used in many businesses, and there is a growing need to apply data mining across different businesses to achieve business intelligence, which is an important competitive advantage. Researchers therefore have a clear responsibility to develop and extend effective, high-speed methods that support decision making over large databases.

Apriori is one of the most widely used and best-known algorithms for extracting association rules. However, Apriori is not well suited to large databases, for two reasons: first, it reads the database many times (one full pass for each candidate itemset length), and second, it generates many useless candidate itemsets.
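To make these two costs concrete, here is a minimal, textbook-style sketch of Apriori in Python. It assumes transactions are given as sets of hashable items and that `min_support` is an absolute count; the function name `apriori` and the sample data are illustrative and not the project's implementation. The join step here combines frequent itemsets by set union, a simpler variant of the usual prefix-based join that yields the same frequent itemsets. Each iteration of the `while` loop makes one full pass over the transactions, and the join step can produce candidates that are later pruned or found infrequent, which are the two weaknesses described above.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items (one full pass over the data).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    # Prune step: every (k-1)-subset must itself be frequent.
                    if all(frozenset(sub) in frequent
                           for sub in combinations(union, k - 1)):
                        candidates.add(union)
        # Counting step: another full pass over the data for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, `apriori([{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}], 2)` reports `{bread, milk}` and `{bread, butter}` as frequent pairs, while the candidate `{bread, milk, butter}` is pruned because its subset `{milk, butter}` is infrequent.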

Besides, the growing use of the Internet, especially for commercial purposes, has made XML files more widespread than before, particularly for exchanging information. As a result, XML data mining has emerged as a new field of research, and many researchers are developing new methods for mining XML files. The dynamic nature, heterogeneous structure, and large size of XML data are just a few of the problems in mining it. Our goal is to improve Apriori-based data mining algorithms and to modify them so that they can be used for XML data mining.
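One simple way to reuse Apriori-style algorithms on XML, shown here only as an illustrative assumption rather than the project's method, is to flatten each record element into a transaction of `path=value` items. The sketch below uses Python's standard `xml.etree.ElementTree` module; the `order`/`item` document layout and the `xml_to_transactions` name are hypothetical. Because records may nest their values at different depths, each leaf value is tagged with its tag path relative to the record, which partly accommodates heterogeneous structure.

```python
import xml.etree.ElementTree as ET

def xml_to_transactions(xml_text, record_tag):
    """Hypothetical helper: turn each <record_tag> element into a set of
    'path=value' items built from its non-empty leaf elements."""
    root = ET.fromstring(xml_text)
    transactions = []
    for record in root.iter(record_tag):
        items = set()
        # Walk the record's subtree, tracking the tag path from the record down.
        stack = [(child, child.tag) for child in record]
        while stack:
            elem, path = stack.pop()
            children = list(elem)
            if children:
                stack.extend((c, path + "/" + c.tag) for c in children)
            elif elem.text and elem.text.strip():
                items.add(path + "=" + elem.text.strip())
        transactions.append(items)
    return transactions
```

The resulting list of sets can be passed to any conventional frequent-itemset miner. This flattening discards sibling order and much of the tree shape, which is one reason tree-aware approaches such as those in the prerequisites list below are studied.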

Approach:

  1. Studying association rule (AR) extraction algorithms and approaches.
  2. Implementing the chosen algorithms.
  3. Analyzing these algorithms and modifying them by removing ineffective procedures, redundant work, and bottlenecks.
  4. Developing a theoretical framework for using AR methods in XML data mining.
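Association rule extraction usually has two phases: finding frequent itemsets and then deriving rules from them. For the second phase, each frequent itemset is split into an antecedent X and a consequent Y, and a split is kept when its confidence, support(X ∪ Y) / support(X), meets a threshold. The sketch below assumes a dictionary of support counts covering all frequent itemsets (so every antecedent is present by downward closure); `generate_rules` and the sample data are hypothetical.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """supports: {frozenset(itemset): support_count} for ALL frequent itemsets.
    Returns a list of (antecedent, consequent, confidence) tuples."""
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Try every non-empty proper subset of the itemset as the antecedent.
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is a key.
                confidence = count / supports[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

With supports `{bread}: 3, {milk}: 3, {butter}: 2, {bread, milk}: 2, {bread, butter}: 2` and a threshold of 0.7, only `butter → bread` (confidence 1.0) survives, since the other splits have confidence 2/3.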

Research Prerequisites:

[1] H. Tan, T. S. Dillon, L. Feng, E. Chang. Tree Model Guided Candidate Generation Approach for XML Data Mining. 2005.

[2] Tan, H., et al. X3-Miner: mining patterns from XML Database. In Data Mining, Skiathos, Greece, 2005.

[3] Nayak, R. “Discovering Knowledge from XML Documents.” Queensland University of Technology, Australia, 2005.

[4] Sheng Zhang, Ji Zhang, Han Liu, Wei Wang: XAR-miner: efficient association rules mining for XML data. WWW (Special interest tracks and posters) 2005: 894-895

[5] L. Feng and T. Dillon. Mining XML-Enabled Association Rule with Templates. In Proceedings of KDID 2004.

[6] J. Zhang, T. W. Ling, R. M. Bruckner, A. M. Tjoa, H. Liu. On Efficient and Effective Association Rule Mining from XML Data. In Proceedings of DEXA 2004, LNCS 3180, pp. 497-507, Zaragoza, Spain, 2004.

[7] L. Feng, T. S. Dillon, H. Weigand, E. Chang. An XML-Enabled Association Rule Framework. In Proceedings of DEXA 2003, pp. 88-97, Prague, Czech Republic, 2003.

[8] Zaki, M. J. and K. Gouda (2003). Fast Vertical Mining using Diffsets. SIGKDD 2003, Washington DC, USA.

[9] Braga, D., A. Campi, et al. "A Tool for Extracting XML Association Rules." 14th IEEE International Conf. on Tools with Artificial Intelligence (ICTAI'2002).

[10] A. Termier, M.-C. Rousset, and M. Sebag. Treefinder: a first step towards XML data mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), pages 450–457, 2002.

[11] Zaki, M. J. Efficient Mining of Trees in the Forest. SIGKDD 2002, Edmonton, Alberta, Canada, ACM.

[12] Büchner, A.G., Baumgarten, M., Mulvenna, M.D., Böhm, R., and Anand, S.S. Data Mining and XML: Current and Future Issues. Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00), pp. 127-131, Hong Kong, 2000.

[13] Zaki, M. J. and M. Ogihara (1998). Theoretical Foundations of Association Rules. DMKD, Seattle, WA, USA.

Related Conferences:

International Conference on Very Large Data Bases (VLDB)

IEEE International Conference on Data Mining (ICDM)

Knowledge Discovery and Data Mining (KDD)

Related Links:

1. Education in Data Mining and Knowledge Discovery

2. XML Mining

Publications:

Incremental Mining of Frequent XML Query Patterns

Efficiently Mining Frequent Trees in a Forest

Fast mining of frequent tree structures by hashing and indexing

Frequent Free Tree Discovery in Graph Data

Mining XML-Enabled Association Rules with templates

X3-Miner: mining patterns from XML database

Template guided association rule mining from XML documents

W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees

Complete Discovery of Weighted Frequent Subtrees in Tree-Structured Datasets


Copyright © 2007 DBRG-UT. All rights reserved. Designed by Aresh Dadlani, Mohamad Hasan Ahmadi