Research On The Concurrent Processing Of Massive Data Set

Posted on:2016-09-30

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Wang

Full Text:PDF

GTID:2298330467492564

Subject:Computer Science and Technology

Abstract/Summary:

In today's big data era, the Internet is evolving toward the mobile Internet, social networks and other new kinds of content keep emerging, and people can easily obtain the information they want. As demand and business continue to develop, however, the volume of generated data is growing exponentially. Massive data sets are of immense value, and the relationships within the data play an important role in corporate operations and decision making. How to retrieve target data from large-scale data accurately, quickly, and reliably has become an urgent problem.

This thesis addresses the business needs of the game department of an Internet company. Its business is complex, its user base is large, its product range is wide, and its users generate a very high volume of concurrent requests. As the business and the amount of data grow, the bottlenecks of the traditional relational database system become prominent under massive data and highly concurrent requests: performance degrades and response latency increases, with a significant impact on the business. The system can no longer store data at this scale efficiently or respond quickly to highly concurrent read and write requests. To solve this problem, this thesis uses HBase to store and manage the data, builds a secondary index, and designs MapReduce methods to process the user base data in parallel, providing the department with an efficient platform for parallel data processing.

The main research work of this thesis is summarized as follows:

1. It analyzes the MapReduce parallel processing framework and the distributed column-oriented database HBase, and combines the two to find a suitable technology that meets the system's business needs for data storage, management, and parallel processing.
2. It designs a scheme for migrating data from relational databases and text files into HBase. The scheme combines Sqoop and MapReduce to import the data into HBase and, taking business characteristics and system performance into account, greatly accelerates data preprocessing.
3. It designs a secondary index scheme for HBase. The scheme uses MapReduce together with HBase Coprocessors to build a secondary index over the data. The whole process runs on the HBase server side and is transparent to the application, so the system supports complex column-based queries that meet business needs.
4. It provides an external interface for reading and writing system data. The interface is based on the HBase Java API and, according to the business requirements, offers data writes, general rowkey lookups, range queries, batch queries, and index-based queries.
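The secondary index and query interface in items 3 and 4 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the thesis's implementation and not the HBase API: `SortedTable` and `IndexedTable` are hypothetical classes, the column names `game` and `level` are invented for the example, and the index rowkey layout (indexed value, a separator byte, then the primary rowkey) is one common convention that the abstract does not confirm. The `put` method plays the role a server-side Coprocessor hook might play, keeping the index in step with the data without the caller doing anything extra.

```python
import bisect

SEP = "\x00"  # assumed separator; indexed values must not contain it


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by rowkey."""

    def __init__(self):
        self._keys = []  # sorted list of rowkeys
        self._rows = {}  # rowkey -> {column: value}

    def put(self, rowkey, columns):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
            self._rows[rowkey] = {}
        self._rows[rowkey].update(columns)

    def get(self, rowkey):
        return self._rows.get(rowkey)

    def delete(self, rowkey):
        if rowkey in self._rows:
            del self._rows[rowkey]
            del self._keys[bisect.bisect_left(self._keys, rowkey)]

    def scan(self, start, stop):
        """Return (rowkey, columns) pairs with start <= rowkey < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


class IndexedTable:
    """Data table plus an index table over one column (hypothetical design)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = SortedTable()
        self.index = SortedTable()

    def put(self, rowkey, columns):
        # Hook-like step: drop the stale index entry, then write data and index.
        col = self.indexed_column
        old = self.data.get(rowkey)
        if old is not None and col in old and col in columns:
            self.index.delete(old[col] + SEP + rowkey)
        self.data.put(rowkey, columns)
        if col in columns:
            self.index.put(columns[col] + SEP + rowkey, {})

    def get(self, rowkey):
        return self.data.get(rowkey)

    def batch_get(self, rowkeys):
        return [self.data.get(k) for k in rowkeys]

    def range_scan(self, start, stop):
        return self.data.scan(start, stop)

    def query_by_index(self, value):
        # Prefix scan over the index table: every key starting with value + SEP.
        hits = self.index.scan(value + SEP, value + "\x01")
        return [key.split(SEP, 1)[1] for key, _ in hits]


table = IndexedTable("game")
table.put("user001", {"game": "chess", "level": "3"})
table.put("user002", {"game": "go", "level": "7"})
table.put("user003", {"game": "chess", "level": "1"})
chess_players = table.query_by_index("chess")
```

Because the index rows are themselves sorted by rowkey, a lookup by column value becomes a short prefix scan rather than a scan of the whole data table, which is the usual motivation for this kind of index.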

Keywords/Search Tags:

Massive data set, MapReduce, HBase, Concurrent, Data Migration, Secondary Index

Related items

1	Optimization Of Massive Meteorological Structured Data Query Based On HBase
2	Research Of Big Data Store Query Technology Based On HBase
3	Research Of Massive Data Processing Model In CDMA Packet Domain Based On Hadoop
4	Research And Development Of Big Data Storage Systems Based On Hbase
5	Optimization For The Data Access Mode Of Mapreduce In HBase
6	The Design And Implementation Of Real-Time Query System For Mass Data Based On Hbase
7	The Design And Implementation Of Massive Data Storage And Calculation Platform Based On Hadoop
8	Research On Index And Query Technology Of HBase For Spatiotemporal Data
9	The Design And Implementation Of High Efficient Data Access Platform Based On HBase
10	The Research And Implementation Of Indexing And Query Techniques Based On HBase And In-memory Database