Files
21078.pdf (5.9 MB)
A MapReduce Performance Study of XML Shredding
Author Info
Lam, Wilma Samhita Samuel
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954
Year and Degree
2016, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
Abstract
XML (Extensible Markup Language) became popular for its ease of use and readability. It has emerged as one of the leading formats for storing and transferring data over the World Wide Web because it is platform independent, readable, and suited to sharing data between programs. Tools exist for extracting data directly from XML documents, but many organizations use relational databases as repositories to store, manipulate, and analyze XML data. Extracting the data into a database reduces the redundancy present in XML documents by eliminating the repetition of tags while preserving the values. Several algorithms have been devised to provide efficient shredding (mapping of XML data to relational tables) of XML documents. Shredding an XML document is performed through a set of sequential steps that traverse the tree structure from the root node to the leaf nodes. Sequential processing of large XML documents is time consuming; we therefore devise a method to parallelize it by splitting a large XML document into a set of smaller XML documents. We extend a shredding algorithm to process the XML documents in parallel. We conduct experiments with parallel and sequential implementations on a single machine and with a parallel MapReduce implementation in the cloud. We compare the performance of the three implementations on several real-world datasets and under different parameters such as partition size. Our experiments indicate that the performance of the algorithms can be predicted from parameters such as the number of elements at depth 1 of an XML dataset. These parameters help identify a suitable implementation for shredding. Our experiments also indicate that MapReduce is a scalable environment that performs better for larger partition sizes.
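The split-then-shred idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names (split_at_depth_one, shred, shred_in_parallel), the (path, value) row layout, and the use of a thread pool in place of MapReduce mappers are all assumptions made for illustration. The sketch splits a document into smaller documents by grouping its depth-1 elements, shreds each partition into rows for its leaf elements, and concatenates the results.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def split_at_depth_one(xml_text, partition_size):
    """Split a document into smaller documents (hypothetical helper).

    Each partition holds up to `partition_size` depth-1 elements,
    wrapped in a copy of the original root element.
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    parts = []
    for start in range(0, len(children), partition_size):
        part_root = ET.Element(root.tag, root.attrib)
        part_root.extend(children[start:start + partition_size])
        parts.append(ET.tostring(part_root, encoding="unicode"))
    return parts


def shred(xml_text):
    """Map one document to (path, value) rows, one per leaf element.

    The traversal runs from the root node down to the leaf nodes.
    The row layout is an assumption, not the thesis's schema.
    """
    rows = []

    def walk(node, path):
        here = f"{path}/{node.tag}"
        if len(node) == 0:  # leaf element: keep its value
            rows.append((here, (node.text or "").strip()))
        for child in node:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return rows


def shred_in_parallel(xml_text, partition_size, workers=4):
    """Shred each partition independently, then concatenate the rows.

    A thread pool stands in for MapReduce mappers here (assumption).
    """
    parts = split_at_depth_one(xml_text, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(shred, parts))  # map preserves order
    return [row for rows in results for row in rows]
```

For a document whose root has at least one child, shred_in_parallel returns the same rows, in the same order, as calling shred on the whole document, because every leaf lies inside exactly one depth-1 element.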
Committee
Karen Davis, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
Pages
92 p.
Subject Headings
Computer Science
Keywords
MapReduce; XML; Relational; XML Splitting; Performance Comparison; Shredding
Recommended Citations
Lam, W. S. S. (2016). A MapReduce Performance Study of XML Shredding [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954
APA Style (7th edition)
Lam, Wilma Samhita Samuel. A MapReduce Performance Study of XML Shredding. 2016. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954.
MLA Style (8th edition)
Lam, Wilma Samhita Samuel. "A MapReduce Performance Study of XML Shredding." Master's thesis, University of Cincinnati, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1467126954
Chicago Manual of Style (17th edition)
Document number:
ucin1467126954
Download Count:
568
Copyright Info
© 2016, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.