DataBase Research Group - University of Tehran


Last Updated on: 21/07/07 
Data Management
Title: An Efficient Framework for XML Data Management

Work by: Mehdi Emadi
Supervisor: Dr. Masoud Rahgozar
Advisor: Dr. Marjan Sirjani
Team Members: 1. Adel Ardalan
2. Alireza Kazerani
3. Mohammad Sina Ariyan
4. Reza Shahidi
Problem Definition, Goal and Importance:

XML (Extensible Markup Language) is the standard language for exchanging data between programs over the Internet. XML is also used as a document markup language, as a format for object serialization, and as a format for exchanging messages over the network (message passing). Its main use, however, is still on the Internet.

Many systems use this standard to exchange information. Most of them rely on relational databases as the core for storing and managing their data, so a large portion of their code is devoted to converting data to XML and back. This conversion code is rewritten in every application and is usually not as efficient as it should be.
The W3C has introduced a standard query language for XML named XQuery. Supporting this language requires a complete system for storing and retrieving XML data, and there are two ways to build one:

1- Implementing native XML databases (e.g., eXist, Timber)
2- Using conventional relational databases (e.g., Oracle, DB2)

However, relational databases are designed and tuned for structured data, and this tuning usually does not carry over to XML. In DTD-based approaches, the structure of the XML data is described by a DTD (Document Type Definition), and this definition must be transformed into a relational schema, i.e., a set of tables and the relationships between them. XML queries (XQuery) must likewise be translated into relational (SQL) queries. Several prototype systems have been developed in recent years, such as SilkRoute (2003), LegoDB (2002), ROX (2004), and Almaden. Presenting a complete and efficient system that meets users' needs is currently an active research topic in this area. Several ideas have been proposed in the literature, but each addresses only special cases. Our goal is to analyze and compare existing methods, and finally to propose a way of improving them and test its feasibility.
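
To make the DTD-to-relational step more concrete, here is a heavily simplified Python sketch in the spirit of the inlining techniques of Shanmugasundaram et al. (VLDB 1999). It is not the full Shared Inlining algorithm and not the group's implementation. Assumptions: the DTD has already been parsed into a dictionary mapping each element to its (child, occurrence) pairs, the DTD is non-recursive, and each repeated element type has a single parent; the example DTD and the names inline and map_dtd are hypothetical. Single-valued children are inlined as columns of the parent's table, while children marked * or + get their own table linked back through a parentId column.

```python
# Hypothetical, pre-parsed DTD: element -> list of (child, occurrence), where
# occurrence is "" (exactly once), "?" (optional), "*" or "+" (repeated).
# Elements with an empty list are #PCDATA leaves.
DTD = {
    "book": [("title", ""), ("publisher", "?"), ("author", "*")],
    "publisher": [("name", ""), ("city", "")],
    "author": [("name", ""), ("email", "?")],
    "title": [], "name": [], "city": [], "email": [],
}


def inline(element, dtd, prefix=""):
    """Split the children of `element` into inlined columns and repeated children."""
    columns, repeated = [], []
    for child, occ in dtd[element]:
        if occ in ("*", "+"):
            repeated.append(child)            # repeated child: separate table
        elif not dtd[child]:
            columns.append(prefix + child)    # single-valued leaf: one column
        else:                                 # single-valued subtree: inline recursively
            sub_cols, sub_rep = inline(child, dtd, prefix + child + "_")
            columns += sub_cols
            repeated += sub_rep
    return columns, repeated


def map_dtd(root, dtd):
    """Return CREATE TABLE statements for `root` and every repeated element below it."""
    statements, queue = [], [(root, None)]
    while queue:
        element, parent = queue.pop(0)
        columns, repeated = inline(element, dtd)
        if not dtd[element]:
            columns = ["value"]               # a repeated #PCDATA element keeps its text
        defs = ["id INTEGER PRIMARY KEY"]
        if parent is not None:
            defs.append(f"parentId INTEGER REFERENCES {parent}(id)")
        defs += [f"{c} TEXT" for c in columns]
        statements.append(f"CREATE TABLE {element} ({', '.join(defs)});")
        queue += [(child, element) for child in repeated]
    return statements


for stmt in map_dtd("book", DTD):
    print(stmt)
```

For this DTD the sketch yields a book table with the columns title, publisher_name and publisher_city, plus an author table whose parentId references book(id). Element types shared by several parents and recursive DTDs are what the full Shared and Hybrid Inlining schemes address.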

Approach:

The first step is to study relevant papers from well-known international venues and publishers such as VLDB, Elsevier, Springer, ACM, and IEEE. Next, we analyze the accessible commercial systems and compare the solutions they offer. We then propose new solutions and build a prototype to test our ideas; once the prototype is implemented, we can compare our solution with the others more accurately.

Research Prerequisites:


[1] François Yergeau, Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler. Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation, 4 February 2004

[2] Mary Fernández, Ashok Malhotra, Jonathan Marsh, Marton Nagy, Norman Walsh. XQuery 1.0 and XPath 2.0 Data Model. W3C Working Draft, last release 23 July 2004

[3] XQuery: an XML Query Language; D. Chamberlin, IBM Systems Journal 41(4), 2002. Available at http://www.research.ibm.com/journal/sj/414/chamberlin.pdf

[4] Jennifer Widom. Data Management for XML: Research Directions. IEEE Data Engineering Bulletin 22 (3): 44-52 (1999)

[5] Relational Databases for Querying XML Documents: Limitations and Opportunities. Jayavel Shanmugasundaram, Gang He, Kristin Tufte, Chun Zhang, David DeWitt, Jeffrey F. Naughton. VLDB 1999

[6] P. Bohannon, J. Freire, P. Roy, J. Simeon. From XML Schema to Relations: A Cost-Based Approach to XML Storage. Proc. ICDE 2002, pp. 64-75, IEEE.

[7] M. Fernández, Y. Kadiyska, D. Suciu, A. Morishima, W. C. Tan. SilkRoute: A Framework for Publishing Relational Data in XML. ACM Trans. Database Syst., 27(4):438-493, 2002

Reports and Seminars:

Seminar report (Persian)

Presentation 1 (Persian)

Presentation 2 (English)

Presentation 3 (Persian)

Implementation:
  1. without DTD
    1. EDGE
    2. XREL
    3. XPARENT
    4. ORDPATH
  2. with DTD
    1. SHARED
    2. SHARED+
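
As an illustration of the DTD-independent side, the following Python sketch shreds a document into a single edge table, loosely following the Edge mapping of Florescu and Kossmann, and translates a simple absolute path into a chain of self-joins. It is a simplification under stated assumptions: the table drops the original flag column and stores element text inline, attributes and mixed content are ignored, only child-axis paths such as /a/b/c are supported, and the names Edge, shred and path_to_sql are hypothetical. It uses only the standard sqlite3 and xml.etree.ElementTree modules.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified Edge table (hypothetical layout): one row per element node.
SCHEMA = """
CREATE TABLE Edge (
    source  INTEGER,              -- id of the parent node (0 for the root element)
    ordinal INTEGER,              -- position among the parent's element children
    name    TEXT,                 -- element tag
    target  INTEGER PRIMARY KEY,  -- id of this node, assigned in document order
    value   TEXT                  -- trimmed text content, or NULL
);
"""


def shred(xml_text, conn):
    """Store every element of the document as one row of the Edge table."""
    conn.executescript(SCHEMA)
    next_id = 1

    def visit(elem, parent_id, ordinal):
        nonlocal next_id
        node_id = next_id                     # preorder numbering = document order
        next_id += 1
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)",
                     (parent_id, ordinal, elem.tag, node_id, text))
        for i, child in enumerate(elem, start=1):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), 0, 1)
    conn.commit()


def path_to_sql(path):
    """Translate an absolute child-axis path such as /a/b/c into Edge self-joins."""
    steps = [s for s in path.split("/") if s]
    tables = ", ".join(f"Edge e{i}" for i in range(len(steps)))
    conds, params = ["e0.source = 0"], []
    for i, step in enumerate(steps):
        conds.append(f"e{i}.name = ?")        # one join alias per path step
        params.append(step)
        if i > 0:
            conds.append(f"e{i}.source = e{i - 1}.target")
    last = len(steps) - 1
    sql = (f"SELECT e{last}.value FROM {tables} "
           f"WHERE {' AND '.join(conds)} ORDER BY e{last}.target")
    return sql, params


conn = sqlite3.connect(":memory:")
shred("<bib><book><title>XML</title></book><book><title>SQL</title></book></bib>", conn)
sql, params = path_to_sql("/bib/book/title")
print([row[0] for row in conn.execute(sql, params)])  # ['XML', 'SQL']
```

Each path step adds one self-join on the edge table, which is one reason path-based schemes such as XRel store root-to-node paths explicitly.
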
Related Links:


UW XML Data Repository

W3C XML

XML Database Reading List (Weining Zhang)

XML Database Product (Ronald Bourret)

XML / Database Links (Ronald Bourret)

Timber (Native XML Database)

Publications:


[1] M. Emadi, M. Rahgozar, A. Ardalan, A. Kazerani, M.M. Arian. A Comparative Study of DTD-Independent XML Data Storage Approaches. 11th International CSI Computer Conference (CSICC'06), Tehran, Iran, January 2006

[2] M. Emadi, M. Rahgozar, A. Ardalan, A. Kazerani, M.M. Arian. Storage Approaches for DTD-Independent XML Data. 14th Iranian Conference on Electrical Engineering (ICEE) - IEEE, Tehran, Iran, May 2006

[3] M. Emadi, M. Rahgozar, A. Ardalan, A. Kazerani, M.M. Arian. Approaches and Schemes for Storing DTD-Independent XML Data in Relational Databases. Transactions on Engineering, Computing and Technology, Vol. 13, pp. 168-173, May 2006, ISSN 1305-5313

Related Studying Resources:


[1] M. Fernández, Y. Kadiyska, D. Suciu, A. Morishima, W. C. Tan. SilkRoute: A Framework for Publishing Relational Data in XML. ACM Trans. Database Syst., 27(4):438-493, 2002

[2] Extendible Markup Language (XML), W3C, http://www.w3.org/XML/

[3] P. O'Neil, E. O'Neil, S. Pal, I. Cseri, G. Schaller. ORDPATHs: Insert-Friendly XML Node Labels. SIGMOD 2004.

[4] Timo Böhme, Erhard Rahm: Supporting Efficient Streaming and Insertion of XML Data in RDBMS. DIWeb 2004: 70-81

[5] Hongjun Lu, Jeffrey Xu Yu, Guoren Wang, Shihui Zheng, Haifeng Jiang, Ge Yu, Aoying Zhou: What makes the differences: benchmarking XML database implementations. ACM Trans. Internet Techn. 5(1): 154-194 (2005)

[6] Daniela Florescu, Donald Kossmann: Storing and Querying XML Data using an RDBMS. IEEE Data Eng. Bull. 22(3): 27-34 (1999).

[7] Yoshikawa, M., Amagasa, T., Shimura, T. & Uemura, S. (2001), 'XRel: a path-based approach to storage and retrieval of XML documents using relational databases', TOIT 1(1), 110-141.

[8] Haifeng Jiang, Hongjun Lu, Wei Wang, Jeffrey Xu Yu: Path Materialization Revisited: An Efficient Storage Model for XML Data. Australasian Database Conference 2002

[9] Haifeng Jiang, Hongjun Lu, Wei Wang, Jeffrey Xu Yu: XParent: An Efficient RDBMS-Based XML Database System. ICDE 2002: 335-336

[10] E. O'Neil, P. O'Neil, S. Pal, I. Cseri, G. Schaller, N. Westbury. ORDPATHs: Insert-Friendly XML Node Labels. ACM SIGMOD Industrial Track, 2004

[11] E. Cohen, H. Kaplan, T. Milo. Labeling Dynamic XML Trees. In Proc. of PODS 2002

[12] Daniela Florescu, Donald Kossmann. Storing and Querying XML Data Using an RDBMS, IEEE Data Engineering Bulletin, Sep 1999.

[13] ABITEBOUL, S., QUASS, D., MCHUGH, J., WIDOM, J., AND WIENER, J. L. 1997. The Lorel query language for semistructured data. International Journal on Digital Libraries 1, 1, 68-88.

[14] DEUTSCH, A., FERNANDEZ, M., AND FLORESCU, D. 1999. A query language for XML. In Proceedings of the 8th International World Wide Web Conference. Toronto, Canada. 1155-1169.

[15] CHAMBERLIN, D. D., ROBIE, J., AND FLORESCU, D. 2000. Quilt: An XML query language for heterogeneous data sources. In WebDB (Informal Proceedings). 53-62.

[16] Clark, J., DeRose, S. XML Path Language (XPath) Version 1.0. W3C Recommendation, Nov. 1999. Available at http://www.w3.org/TR/xpath.

[17] Donald D. Chamberlin: XQuery: An XML query language. IBM Systems Journal 41(4): 597-615 (2002)

[18] BONIFATI, A. AND CERI, S. 2000. Comparative analysis of five XML query languages. SIGMOD Record 29, 1, 68-79.

[19] LEE, D. AND CHU, W. W. 2000. Comparative analysis of six XML schema languages. SIGMOD Record 29, 3, 76-87.

[20] SCHMIDT, A., KERSTEN, M. L., WINDHOUWER, M., AND WAAS, F. 2000. Efficient relational storage and retrieval of XML documents. In International Workshop on the Web and Databases (Informal Proceedings). 47-52.

[21] Böhme, T.; Rahm, E.: Multi-User Evaluation of XML Data Management Systems with XMach-1. Lecture Notes in Computer Science (LNCS) 2590, pp. 148-159, Springer, 2003

[22] A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, and R. Busse. XMark: A benchmark for XML data management. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pages 974-985, 2002.


Copyright 2007 DBRG-UT. All rights reserved. Designed by Aresh Dadlani, Mohamad Hasan Ahmadi