
Research And Implementation Of XML Query Processing Optimization Based On Dynamic Data Distribution

Posted on: 2018-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Li
GTID: 2348330536478607
Subject: Engineering
Abstract/Summary:
As a structured language, XML defines a standard for representing and exchanging Web data and is increasingly widely used. As data volumes grow explosively, processing complex, diverse, and massive XML files is becoming increasingly important, and handling massive XML data efficiently has both practical significance and broad application prospects. At present, the typical approach to large-scale data sets is distributed computing, and Hadoop is a typical supporting framework for cloud computing and big data processing. Using Hadoop's MapReduce computing model for XML structural query processing is therefore an important research issue. Because of the structural characteristics of XML, relational database techniques cannot be applied effectively to XML query processing. This thesis implements a native XML database system, including a storage model and query processing methods for massive XML data, and on that basis studies query optimization techniques. We first present a MapReduce-based algorithm for XML structural joins, then optimize the MapReduce data placement and partitioning strategy from the perspective of data distribution. Query processing consists of two phases: the Map phase and the Reduce phase. In the Map phase of structural join processing, closely related files are clustered and placed together according to the usage frequency of XPath statements. In the Reduce phase, a curve-fitting method that estimates its parameter automatically distributes the data relatively evenly across the computing nodes to maximize efficiency. Workload balancing is then studied to further improve query efficiency. Finally, the optimized algorithm is implemented on DXQ S. Experimental results show that the optimized algorithm, based on dynamic allocation of data, effectively improves overall query efficiency.
Keywords/Search Tags: XML, Query, Optimization, Distribution
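
The abstract does not give the details of the structural join algorithm, so the following is only a minimal sketch of how an ancestor-descendant join (for a query such as book//title) might be split into Map and Reduce steps. It assumes a region labeling scheme in which each element carries a (start, end) interval and an ancestor's interval encloses its descendants' intervals; that scheme, the sample records, and all function names are illustrative assumptions, not the thesis's method. The phases are simulated in memory and do not use the Hadoop API.

```python
from collections import defaultdict

# Hypothetical element records: (doc_id, tag, start, end).
# (start, end) is an assumed region label from a document-order traversal;
# an ancestor's interval strictly encloses each descendant's interval.
ELEMENTS = [
    ("d1", "book", 1, 10),
    ("d1", "title", 2, 3),
    ("d1", "chapter", 4, 9),
    ("d1", "title", 5, 6),
    ("d2", "title", 1, 2),
    ("d2", "book", 3, 6),
    ("d2", "title", 4, 5),
]


def map_phase(elements, anc_tag, desc_tag):
    """Emit (doc_id, (role, start, end)) for elements relevant to anc//desc."""
    for doc_id, tag, start, end in elements:
        if tag == anc_tag:
            yield doc_id, ("A", start, end)
        if tag == desc_tag:
            yield doc_id, ("D", start, end)


def shuffle(pairs):
    """Group mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(doc_id, values):
    """Join ancestors and descendants of one document by interval containment."""
    ancestors = [(s, e) for role, s, e in values if role == "A"]
    descendants = [(s, e) for role, s, e in values if role == "D"]
    for a_start, a_end in ancestors:
        for d_start, d_end in descendants:
            if a_start < d_start and d_end < a_end:
                yield doc_id, (a_start, a_end), (d_start, d_end)


def structural_join(elements, anc_tag, desc_tag):
    """Run map, shuffle, and reduce in sequence and collect the matches."""
    groups = shuffle(map_phase(elements, anc_tag, desc_tag))
    results = []
    for doc_id in sorted(groups):
        results.extend(reduce_phase(doc_id, groups[doc_id]))
    return results


if __name__ == "__main__":
    for match in structural_join(ELEMENTS, "book", "title"):
        print(match)
```

Keying the map output by document means each reducer sees every candidate for one document, so the amount of data per key determines reducer load. That is the kind of skew the data placement and partitioning optimizations described in the abstract aim to reduce.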