Research On The Concurrent Processing Of Massive Data Set

Posted on: 2016-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wang
GTID: 2298330467492564
Subject: Computer Science and Technology
Abstract/Summary:
In today's big data era, the Internet is evolving toward the mobile Internet, social networks and other new kinds of content keep emerging, and people can easily get the information they want. However, as demand and business continue to grow, the volume of generated data grows exponentially. Massive data sets hold great value, and the relationships within the data play an important role in corporate operations and decision making. How to retrieve target data from large-scale data accurately, quickly, and reliably has become an urgent problem.

This work addresses the business needs of the game department of an Internet company. Its business is complex, with a large user base, a wide range of products, and a very high volume of concurrent user requests. As the business and the amount of data have grown, the bottlenecks of the traditional relational database system have become prominent under massive data and highly concurrent requests: performance degrades and response latency increases, which significantly affects the business. The system can no longer store data at this scale efficiently or respond quickly to highly concurrent read and write requests. To solve this problem, this work uses HBase to store and manage the data, builds a secondary index, and designs MapReduce methods to process user base data in parallel, providing the department with an efficient parallel data processing platform.

The main research work in this paper is summarized as follows:

1. It analyzes the MapReduce parallel processing framework and the distributed column-oriented database HBase, and combines the two to find a technical approach that meets the system's business needs for data storage, management, and parallel processing.

2. It designs a scheme for migrating data from relational databases and text files to HBase. The scheme combines Sqoop and MapReduce to import the data into HBase and, while taking business characteristics and system performance into account, greatly accelerates data preprocessing.

3. It designs an HBase secondary index scheme. The scheme uses MapReduce combined with HBase Coprocessors to build a secondary index for the data. The whole process runs on the HBase server side and is transparent to the application, so the system supports complex column-based queries and meets business needs.

4. It provides an external interface for reading and writing system data. The interface is based on the HBase Java API and, according to business needs, provides data writes, general rowkey queries, range queries, batch queries, and index-based queries.
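
The abstract does not give the index layout, so the following is only a minimal in-memory sketch of one common way a secondary index over a sorted key-value store can work. It is not the thesis's coprocessor implementation and it does not use the HBase API. The class names `ToyTable` and `IndexedTable`, the `value + separator + rowkey` index-key format, and the sample column `level` are all assumptions made for illustration. The idea shown is that every write to the data table also maintains an entry in a separate index table, so a query on a column value becomes a range scan over the index instead of a full table scan.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


class IndexedTable:
    """Data table plus a secondary index on one column (hypothetical design).

    Index keys have the form value + "\\x00" + rowkey, so all rows sharing a
    value are adjacent in the index. Values must not contain "\\x00".
    """

    def __init__(self, indexed_column):
        self.column = indexed_column
        self.data = ToyTable()
        self.index = ToyTable()

    @staticmethod
    def _index_key(value, rowkey):
        return f"{value}\x00{rowkey}"

    def put(self, rowkey, columns):
        # Plays the role of a server-side write hook: the caller only
        # writes data, and the index is maintained as a side effect.
        old = self.data.get(rowkey)
        if old is not None and self.column in old and self.column in columns:
            self.index.delete(self._index_key(old[self.column], rowkey))
        self.data.put(rowkey, columns)
        if self.column in columns:
            key = self._index_key(columns[self.column], rowkey)
            self.index.put(key, {"rowkey": rowkey})

    def query_by_value(self, value):
        """Return row keys whose indexed column equals value."""
        start, stop = f"{value}\x00", f"{value}\x01"
        return [entry["rowkey"] for _, entry in self.index.scan(start, stop)]


if __name__ == "__main__":
    table = IndexedTable("level")
    table.put("user001", {"name": "ann", "level": "10"})
    table.put("user002", {"name": "bob", "level": "20"})
    table.put("user003", {"name": "cat", "level": "10"})
    print(table.query_by_value("10"))                           # index-based query
    print([k for k, _ in table.data.scan("user002", "user004")])  # rowkey range query
```

In this sketch the caller never touches the index directly, which mirrors the transparency property described in item 3. In a real deployment the index would also have to be kept consistent across region servers and failures, which this toy version ignores.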
Keywords/Search Tags: Massive data set, MapReduce, HBase, Concurrent, Data Migration, Secondary Index