Research On Sort Algorithm On Massive Data Of Two-Dimension Table

Posted on:2012-07-22

Degree:Master

Type:Thesis

Country:China

Candidate:J Fan

Full Text:PDF

GTID:2218330368979470

Subject:Computer software and theory

Abstract/Summary:

The information age has had a striking impact on our lives. Data, one of the most important carriers of information, has penetrated daily life in both breadth and depth. According to some studies, the global volume of data has been growing at an average rate of 80% per year, which signals that the information age has arrived. The storage of massive data is now largely a solved problem, and network storage is widely used by ordinary consumers. SUN, HP, Brocade and other companies have all developed high-performance massive data storage devices backed by comprehensive strategies. The open question is how to find what we want in this explosively growing mass of data. This is the problem of massive data searching and processing, which is still at the research stage.

Data mining, knowledge discovery, and cloud computing are all built on massive data processing. Data mining, widely used in fields such as finance, retail, telecommunications, and scientific exploration, turns data graves into knowledge nuggets. Cloud computing is undoubtedly significant: it provides powerful computing and services on top of a large pool of resources. For these technologies to deliver on their promise, high-performance algorithms are a basic requirement, yet traditional methods do not cope well with large-scale data. An algorithm suited to massive data is therefore key to these technologies.

With the development of hardware technology, equipment prices have dropped significantly, space is no longer a bottleneck, and time efficiency is increasingly the focus. Based on the idea of trading space for time, this thesis returns to the basic problem of sorting and studies it in depth. In massive data processing, sorting a two-dimension table of discrete data is a basic operation. If the table is sorted first, the efficiency of follow-up operations is greatly improved, and many complex problems rely on sorting a two-dimension table. Such sorting is widely used in data mining, machine learning, databases, rough sets, and other fields.

This thesis first analyzes the quick sort algorithm for two-dimension tables in depth. Then, to meet the high efficiency requirements of massive data processing, it improves on quick sort and proposes a hash sort algorithm for two-dimension tables, and finally extends it to massive two-dimension tables under the cloud computing model.

The algorithm builds on the equivalence classes of rough set theory, extending them to ordered equivalence classes in order to sort a two-dimension table. Because the blocks are independent of one another, it can be computed in parallel under the cloud model. As the degree of parallelism increases, efficiency improves considerably, and the advantage is clear on very large data sets. Under the cloud computing model, the approach combines data partitioning and the assignment of computing tasks to each node with the ordered division of equivalence classes, which reduces the cost of the work. The model is efficient for queries and can be used for high-performance applications under cloud computing.
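The abstract does not give the details of the hash sort or of the ordered equivalence-class partitioning, so the following Python sketch is only an illustration of the general idea, not the thesis's actual algorithm. It assumes every column takes values from a small, known, ordered domain; the names `hash_sort_rows`, `parallel_sort`, and `domains` are hypothetical. Rows are distributed into buckets keyed by attribute value (one stable pass per column, last column first), and the parallel variant splits the table into independent blocks by the first column, which play the role of ordered equivalence classes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def hash_sort_rows(rows, domains, first_col=0):
    """Sort rows lexicographically on columns first_col..end.

    Assumption: domains[c] lists the discrete values of column c in
    ascending order. Each pass hashes rows into buckets by one column
    and reads the buckets back in domain order (a stable bucket pass).
    """
    ordered = list(rows)
    for col in range(len(domains) - 1, first_col - 1, -1):
        buckets = defaultdict(list)
        for row in ordered:
            buckets[row[col]].append(row)  # hash on the attribute value
        ordered = [row for value in domains[col]
                   for row in buckets.get(value, [])]
    return ordered


def parallel_sort(rows, domains, workers=4):
    """Sort by splitting the table into independent blocks.

    Rows sharing the same value in column 0 form one block. Blocks are
    ordered by that value, so each can be sorted on the remaining
    columns independently and the results simply concatenated.
    """
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[0]].append(row)
    keys = [value for value in domains[0] if value in blocks]

    def sort_block(value):
        return hash_sort_rows(blocks[value], domains, first_col=1)

    # Threads only illustrate the block independence; in a real cloud
    # setting each block would be assigned to a separate node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_blocks = list(pool.map(sort_block, keys))
    return [row for block in sorted_blocks for row in block]


table = [(2, "b"), (1, "b"), (2, "a"), (1, "a"), (1, "b")]
column_domains = [[1, 2], ["a", "b"]]
print(hash_sort_rows(table, column_domains))
print(parallel_sort(table, column_domains))
```

Under these assumptions each pass costs time proportional to the number of rows plus the size of the column's domain, at the price of extra bucket storage, which matches the space-for-time trade-off the abstract describes.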

Keywords/Search Tags:

massive data, sort for two-dimension table, classification, time efficiency

Related items

1	Design And Implementation Of Data Warehouse System For LAMOST Spectra
2	Research On Key Technologies Of Massive MIMO Wireless Communication System For B4G/5G
3	The Design And Implementation Of Packet Classification Based On Rule Table
4	Design And Realization Of IP Activity Table Based On A Distributed Infrastructure
5	The Construction Of The Data Source System For The OLAP Analysis Based On The Ordinary Primary School Supervisory Assessment System
6	The Research On The Key Technology Of Optimizing Massive Geological Data's Search
7	Oriented Data Warehouse, Multi-table Joins And Aggregation Algorithm Research
8	Research And Realization Of Table Formular Engine
9	Management Information System Based On Data Warehouse
10	Research On Joint Optimization Of Energy Efficiency And Spectral Efficiency Of Massive MIMO Based On Swarm Intelligence Algorithm