Application And Research Of DBSCAN Based On Hadoop Platform

Posted on:2014-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y G Wang

Full Text:PDF

GTID:2248330398457411

Subject:Computer software and theory

Abstract/Summary:

In recent years, with the development of information technology and the Internet, the amount of data transmitted over the Internet has increased sharply, and it will continue to grow faster and on a larger scale. Databases and related technologies are also changing and being upgraded, and the volume of data stored in databases has grown explosively. It is very difficult to extract the information and knowledge we want from such huge volumes. We face an awkward situation: although we hold a huge amount of information, we cannot tell which part of it we actually need. The vast amounts of data in databases must be analyzed and managed, and data mining technology came into being to meet this need. Clustering is an important topic and tool in the field of data mining, and improving the performance of clustering algorithms is of great research significance. Traditional algorithms work only on static databases and cannot deliver mining results in time as the data changes. Knowledge and rules mined earlier may no longer apply to new data, which makes final decisions based on them largely unreliable.

As a current research focus at home and abroad, cloud computing is an extension and development of grid, parallel, and distributed computing. On a cloud computing platform, users can obtain very large amounts of computing power, storage capacity, and infrastructure over the network. By dividing a big problem, such as a huge amount of data, into small groups, we can distribute these groups among the nodes of the cloud for processing. Consequently, expensive mainframe computers are no longer needed as they were in the traditional approach. This not only lowers the requirements on terminal equipment but also greatly increases the available computing power.

This thesis first introduces one of the main clustering algorithms, DBSCAN. After an in-depth study of how it mines clusters, we propose an incremental DBSCAN clustering algorithm to address the shortcomings of the traditional algorithm. Secondly, we combine this algorithm with Hadoop, an open-source cloud computing framework, and use its MapReduce programming model to partition the huge amount of data into many small blocks. The Hadoop framework then distributes these blocks across the computer cluster, where they are processed concurrently. Finally, the thesis combines the incremental DBSCAN algorithm with the Hadoop platform by transforming DBSCAN into MapReduce form. When data is added to or deleted from the database, the whole database does not need to be mined again; instead, only local mining on the new data is performed, and the new knowledge is combined with the previous knowledge to obtain the final result. This relieves the delay incurred when processing huge amounts of data, and the efficiency and effectiveness of the approach are verified on simulated data.
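
The idea of mining only locally when new data arrives can be sketched as follows. This is a minimal, single-machine illustration in plain Python, not the thesis's MapReduce implementation: the class name `IncrementalDBSCAN`, its methods, and the parameter values are assumptions made for the example. It handles insertions only; deletions, which can split a cluster, and the distribution of blocks over Hadoop are not covered. The sketch relies on the fact that inserting a point can change the core status only of that point and of the points within `eps` of it, so existing clusters elsewhere are left untouched.

```python
from math import dist

NOISE = -1


class IncrementalDBSCAN:
    """Insertion-only DBSCAN sketch (hypothetical helper, not the thesis code)."""

    def __init__(self, eps, min_pts):
        self.eps = eps
        self.min_pts = min_pts
        self.points = []     # every point inserted so far
        self.neighbors = []  # neighbors[i]: indices within eps of point i, itself included
        self.core = []       # core[i]: True once point i has at least min_pts neighbors
        self.parent = []     # union-find forest linking core points of one cluster

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)

    def insert(self, p):
        idx = len(self.points)
        # Linear scan for brevity; a spatial index would normally replace it.
        near = [i for i, q in enumerate(self.points) if dist(p, q) <= self.eps]
        self.points.append(p)
        self.neighbors.append(near + [idx])
        self.core.append(False)
        self.parent.append(idx)
        for i in near:
            self.neighbors[i].append(idx)
        # Only the new point and its eps-neighbors can become core points.
        new_cores = [i for i in near + [idx]
                     if not self.core[i] and len(self.neighbors[i]) >= self.min_pts]
        for i in new_cores:
            self.core[i] = True
        # Merge each new core point with the clusters of its core neighbors.
        for i in new_cores:
            for j in self.neighbors[i]:
                if self.core[j]:
                    self._union(i, j)
        return idx

    def label(self, i):
        """Cluster id of point i, or NOISE if no core point is within eps."""
        if self.core[i]:
            return self._find(i)
        for j in self.neighbors[i]:
            if self.core[j]:
                return self._find(j)  # border point: joins a neighboring cluster
        return NOISE


db = IncrementalDBSCAN(eps=1.5, min_pts=3)
for point in [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]:
    db.insert(point)
labels = [db.label(i) for i in range(len(db.points))]
# labels == [0, 0, 0, 3, 3, 3, -1]: two clusters and one noise point
```

Cluster ids here are simply the smallest point index in each cluster. The union-find structure is one possible way to combine previously mined clusters with new ones: when an inserted point bridges two existing clusters, they are merged without revisiting the rest of the data.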

Keywords/Search Tags:

Hadoop, MapReduce, DBSCAN Algorithm, Incremental Mining

Related items

1	Research And Implementation On Incremental Data Processing Algorithm Based On Hadoop
2	Research On Incremental Computing Technologies And Algorithms Based On MapReduce
3	Research And Application Of Hadoop Distributed Clustering Mining Method Based On Virtual Machine
4	Research On Adaptive Clustering Algorithm Based On DBSCAN Theory
5	The Research Of Data Optimization And Application Of Clustering Algorithm Based On Hadoop
6	Research And Optimization Of Distributed Reptiles Based On
7	Research Of Frequent Itemsets Mining Algorithm Based On MapReduce Calculation Model
8	Research Of Clustering Algorithm Based On Cloud Computing Platform
9	The Application Of Improved DBSCAN On DBMAS
10	Analysis And Research On Parallel Clustering Algorithm Based On Hadoop