
Research on an Interval Join Method Based on MapReduce

Posted on: 2017-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: J Yu  Full Text: PDF
GTID: 2348330503489882  Subject: Computer software and theory
Abstract/Summary:
With the development of network technology, the volume of global data has increased rapidly, which makes big data difficult to analyze and process. As a programming model for data-intensive computing, MapReduce plays an important role in big data analysis and processing. An interval join is a join operation in which the value of the join attribute is a range rather than a single point. It is also an important operation in big data analysis and processing, so using the MapReduce programming platform to improve the efficiency of interval joins is of considerable significance.

Based on the interval concepts and relationships proposed by Allen, a new method based on collection classification is designed to solve two-way and multi-way interval joins. First, we divide the tuples evenly into several partitions according to the interval range, map each tuple into the appropriate partition collection, define four types of collection classification according to the position of each tuple in the partition, and analyze the proportion of the total partition accounted for by each of the four types. Next, we use the MapReduce distributed programming framework to solve two-way and multi-way interval joins on the basis of the four types of collection classification. Constructing key-value pairs from the four types of collection classification filters out tuples that are not involved in the interval join, which reduces the amount of data transmitted and computed and improves the efficiency of the interval join.

Finally, we formulate load balancing strategies for two-way and multi-way interval joins respectively, according to the proportion of the total partition accounted for by each collection classification. These strategies balance the data on each Reduce node by regrouping collection classifications across partitions, in order to further improve the efficiency with which the interval join job completes.

We use Hadoop to verify the effectiveness of the two-way and multi-way interval join methods. The results show that the proposed method adapts to a variety of situations and improves the efficiency of the interval join operation, and that the load balancing strategies further enhance efficiency.
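The general idea of partitioning intervals and joining them in a map and a reduce step can be illustrated with a small sketch. The abstract does not define the four collection classifications or the load balancing strategies, so the code below does not reproduce the thesis method. It assumes a generic two-way overlap join over closed intervals: each interval is replicated to every equal-width partition it touches, and a matching pair is emitted only by the partition that contains the start of the overlap. All function names are hypothetical, and the map and reduce phases run in a single Python process rather than on Hadoop.

```python
from collections import defaultdict
from itertools import product


def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def partition_ids(interval, lo, width, n_parts):
    """Indices of the equal-width partitions that the interval touches."""
    first = max(0, min(n_parts - 1, int((interval[0] - lo) // width)))
    last = max(0, min(n_parts - 1, int((interval[1] - lo) // width)))
    return range(first, last + 1)


def map_phase(tag, intervals, lo, width, n_parts):
    """Emit (partition, (relation tag, interval)) for every partition touched."""
    for iv in intervals:
        for p in partition_ids(iv, lo, width, n_parts):
            yield p, (tag, iv)


def reduce_phase(p, values, lo, width, n_parts):
    """Join the R and S intervals that were routed to partition p."""
    r_side = [iv for tag, iv in values if tag == "R"]
    s_side = [iv for tag, iv in values if tag == "S"]
    for a, b in product(r_side, s_side):
        if overlaps(a, b):
            # Assumed de-duplication rule: only the partition holding the
            # start of the overlap reports the pair, so it appears once.
            start = max(a[0], b[0])
            if partition_ids((start, start), lo, width, n_parts)[0] == p:
                yield a, b


def interval_join(R, S, n_parts=4):
    """Two-way overlap join of interval lists R and S (in-process simulation)."""
    if not R or not S:
        return []
    points = [x for iv in R + S for x in iv]
    lo, hi = min(points), max(points)
    width = (hi - lo) / n_parts or 1  # avoid zero width when all points coincide
    groups = defaultdict(list)  # stands in for the shuffle step
    for tag, rel in (("R", R), ("S", S)):
        for key, value in map_phase(tag, rel, lo, width, n_parts):
            groups[key].append(value)
    result = []
    for p, values in groups.items():
        result.extend(reduce_phase(p, values, lo, width, n_parts))
    return sorted(result)


if __name__ == "__main__":
    R = [(1, 5), (10, 12), (20, 30)]
    S = [(4, 11), (13, 19), (25, 26)]
    for pair in interval_join(R, S):
        print(pair)
```

Replicating an interval to every partition it touches is the simplest routing choice and tends to send long intervals to many reducers. Reducing that kind of transfer is the sort of cost the abstract says its classification of tuples is meant to address.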
Keywords/Search Tags:interval join, collection classification, two-way interval join, multi-way interval join, load balancing strategies