Font Size: a A A

Research On Range Query Of Big Data Based On P-Ring

Posted on:2015-04-30Degree:MasterType:Thesis
Country:ChinaCandidate:T T ZhangFull Text:PDF
GTID:2298330431964351Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, with the development of Internet applications such as large-scalee-commerce, social networking and digital cities Internet data is rapidly expanding.Currently, companies require effective data analysis when making important decisions.For example we can obtain spending habits through analyzing user data. How toderive effective information from large-scale data has become one of maincompetitiveness to enterprises.In the present, range query is mainly made by establishing index, such as B+treeor DST. However when the number of nodes in the system and the amount of data in asingle node increases significantly, the above traditional index will become extremelylarge so it will affect the query efficiency. In addition, there exists an index suitablefor range query which is P-Ring. It is an ordered ring based on attribute value. Everynode stores a range which is from the value of current node to the value of itssuccessor node. P-Ring uses hierarchical routing. Its query results are specific valuesinstead of index information of files.Therefore, this paper improved P-Ring mainly based on large data environmentand introduced MapReduce and B+tree. And we built a dual-index. Queryenvironment of this article is to store data in file units. The process of range query isdecided into two steps which are locating files and querying data. In this paper,P-Ring is improved by mainly following aspects:1) Adding an infoTable in every node to save index file information. The rangestored in a node is split into several sub ranges according to a value to store. TheinfoTable stores file name and file address of all files of this node, some rangesegment and corresponding file of these segments. The first row stores sub ranges andthe second row stores file index information accordance with every sub range. Eachtime that adds a new value to P-Ring the scope of existing storage is split intosub-limits according to the current value. This makes it easy to locate specific filesand improve query efficiency.2) Improving routing table by establishing bi-directional routing. It includessuccessor nodes and predecessor nodes. Every node stores a hierarchical andbidirectional routing table. This routing method theoretically reduces the number of hops when query is executed.Range query algorithm of this article nodes are organized by MapReduce. Thelogical relationship between the nodes is P-Ring which is stored in the Master node.Then establish a B+tree for every file. When each query arrives, sending the query toMaster, the Master node searches qualified files based on P-Ring. Then assign indexinformation of these files to Map nodes and they query data based on B+tree. Thisprocess is highly parallel.At last this range query algorithm is verified by experiment. The platform ofexperiment includes6nodes. I select3indicators which are max-hop, average-hopand responding time to measure performance of this algorithm. The results show thatthis algorithm is feasible, and the performance is not affected by the scope of querylength.
Keywords/Search Tags:Range Query, Big Data, P-Ring, B+Tree, MapReduce
PDF Full Text Request
Related items