A MapReduce Performance Study of XML Shredding

Lam, Wilma Samhita Samuel

2016, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
XML is an extensible markup language that gained popularity for its ease of use and readability. It has emerged as one of the leading media for data storage and transfer over the World Wide Web because it is platform independent, readable, and can be used to share data between programs. Tools are available for extracting data directly from XML documents, but many organizations use relational databases as repositories to store, manipulate, and analyze XML data. Extracting the data into a database reduces the redundancy present in XML documents by eliminating the repetition of tags while preserving the values. Several algorithms have been devised to provide efficient shredding (mapping of XML data to relational tables) of XML documents. The shredding of an XML document is performed through a set of sequential steps that traverse the tree structure from the root node to the leaf nodes. Sequential processing of large XML documents is time consuming; we therefore devise a method to implement parallelization by splitting a large XML document into a set of smaller XML documents. We extend a shredding algorithm to process the XML documents in parallel. We conduct experiments with parallel and sequential implementations on a single machine and with a parallel MapReduce implementation in the cloud. We compare the performance of the three implementations on several real-world datasets and under different parameters such as partition size. Our experiments indicate that the performance of the algorithms can be predicted from parameters such as the number of elements at depth 1 of an XML dataset. These parameters help identify a suitable implementation for shredding. Our experiments also indicate that MapReduce is a scalable environment that performs better for larger partition sizes.
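The split-then-shred idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not name the shredding algorithm that was extended, so the (path, value) row layout, the function names partition, shred, and shred_parallel, and the use of a thread pool in place of MapReduce are all assumptions made for illustration. The sketch groups the depth-1 elements of a document into smaller standalone documents and shreds each one independently.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def partition(xml_text, partition_size):
    """Split a document into smaller documents of at most
    partition_size depth-1 elements each (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    children = list(root)  # the elements at depth 1
    parts = []
    for start in range(0, len(children), partition_size):
        # Each part keeps the original root tag so it is a valid document.
        part_root = ET.Element(root.tag, dict(root.attrib))
        part_root.extend(children[start:start + partition_size])
        parts.append(ET.tostring(part_root, encoding="unicode"))
    return parts


def shred(xml_text):
    """Walk the tree from root to leaves and emit one (path, value) row
    per leaf element. The row layout is an assumption, not the thesis's
    relational schema."""
    rows = []

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        if len(node) == 0:  # leaf element: record its text value
            rows.append((path, (node.text or "").strip()))
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return rows


def shred_parallel(xml_text, partition_size, workers=4):
    """Shred every partition independently, then concatenate the rows.
    Threads only show the structure here; a MapReduce job would run
    shred as the map step on separate machines."""
    parts = partition(xml_text, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(shred, parts)  # results keep partition order
    return [row for rows in results for row in rows]
```

Because every partition is a self-contained document, the parallel version should yield the same rows as shredding the whole document sequentially, and the number of depth-1 elements bounds how many partitions can be formed for a given partition size.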
Karen Davis, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
92 p.

Recommended Citations

  • Lam, W. S. S. (2016). A MapReduce Performance Study of XML Shredding [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954

    APA Style (7th edition)

  • Lam, Wilma Samhita Samuel. A MapReduce Performance Study of XML Shredding. 2016. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954.

    MLA Style (8th edition)

  • Lam, Wilma Samhita Samuel. "A MapReduce Performance Study of XML Shredding." Master's thesis, University of Cincinnati, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954

    Chicago Manual of Style (17th edition)