Research On Sort Algorithm On Massive Data Of Two-Dimension Table

Posted on: 2012-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Fan
Full Text: PDF
GTID: 2218330368979470
Subject: Computer software and theory
Abstract/Summary:
The information age has had a striking impact on our lives. Data, one of the most important carriers of information, has penetrated daily life in both breadth and depth. According to some studies, the amount of global data has been growing at an average rate of 80% annually, a sign that the information age has arrived. The storage of massive data is now largely a solved problem, and network storage is widely used by the general public. SUN, HP, Brocade and other companies have all developed high-performance massive data storage devices as part of comprehensive strategies. However, how can we find what we want in data that has grown so explosively? This is the problem of massive data searching and processing, which is still at the research stage.

Data mining, knowledge discovery and cloud computing are all based on massive data processing. Data mining, which is widely used in fields such as finance, retail, telecommunications and scientific exploration, turns data graves into nuggets of knowledge. The introduction of cloud computing is undoubtedly significant: it provides powerful computing and services based on a large pool of resources. For these technologies to deliver on their promise, high-performance algorithms are a basic requirement. Traditional methods, however, do not cope well with large-scale data, so a suitable algorithm for massive data is key to these technologies.

With the development of hardware technology, equipment prices have dropped significantly; space is no longer a bottleneck, and time efficiency is increasingly becoming the focus. This paper is based on the idea of trading space for time: we return to the basic problem of sorting and study it in depth. In massive data processing, sorting a two-dimension table of discrete data is a basic operation. If the table is sorted first, the efficiency of follow-up operations is greatly enhanced, and many complex problems rely on sorting a two-dimension table. Sorting of two-dimension tables is widely used in data mining, machine learning, databases, rough sets and other fields.

In this paper we first analyze in depth the quick sort algorithm for two-dimension tables. Then, to meet the high efficiency requirement of massive data processing, we improve the quick sort algorithm, put forward a Hash sort algorithm for two-dimension tables, and finally extend it to massive two-dimension tables under the cloud computing model.

The algorithm builds on the idea of equivalence classes from rough set theory, extending them to ordered equivalence classes in order to sort a two-dimension table, and it exploits the independence between blocks to realize parallel computing in the cloud model. As the degree of parallelism increases, efficiency improves greatly, and the advantage on huge data sets is obvious. Under the cloud computing model, the approach combines data partitioning and the arrangement of computing tasks on each node with the ordered division of equivalence classes, which reduces the cost of the work. The model is efficient for queries and can be used for high-performance applications under cloud computing.
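
The abstract does not give the Hash sort algorithm itself, so the following is only a minimal Python sketch of the general idea it describes, not the thesis's method. It assumes the table is a list of tuples with discrete attribute values, and it swaps in a plain bucket-distribution pass per column (least significant column first) in place of row-by-row comparisons. The function names `hash_sort_table` and `ordered_equivalence_classes` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def hash_sort_table(rows, key_columns):
    """Sort rows by key_columns using per-column hash buckets.

    Assumption: attribute values are discrete and mutually comparable.
    Columns are processed from least to most significant; each pass is
    stable, so the final order is lexicographic on key_columns.
    Extra bucket memory is spent to avoid comparing whole rows.
    """
    ordered = list(rows)
    for col in reversed(key_columns):
        buckets = defaultdict(list)
        for row in ordered:
            buckets[row[col]].append(row)
        # Only the distinct values of this column are compared here.
        ordered = [row for value in sorted(buckets) for row in buckets[value]]
    return ordered


def ordered_equivalence_classes(sorted_rows, key_columns):
    """Group already-sorted rows into classes with identical key values.

    Returns a list of (key, rows) pairs in ascending key order.
    """
    classes = []
    for row in sorted_rows:
        key = tuple(row[c] for c in key_columns)
        if classes and classes[-1][0] == key:
            classes[-1][1].append(row)
        else:
            classes.append((key, [row]))
    return classes


# Hypothetical example table: two discrete attributes and a row label.
table = [(2, 1, "a"), (1, 3, "b"), (2, 0, "c"), (1, 3, "d"), (0, 2, "e")]
sorted_table = hash_sort_table(table, [0, 1])
classes = ordered_equivalence_classes(sorted_table, [0, 1])
```

In this sketch, rows that agree on every key column end up adjacent and form one equivalence class, and the classes themselves come out in key order. Because rows with different values in the first column never interact after partitioning, each first-column bucket could in principle be handed to a separate worker, which is one way to read the block independence the abstract mentions.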
Keywords/Search Tags: massive data, sort for two-dimension table, classification, time efficiency