
Application And Research Of DBSCAN Based On Hadoop Platform

Posted on: 2014-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Wang
GTID: 2248330398457411
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the development of information technology and the Internet, the amount of data transmitted over the Internet has increased sharply, and it will keep growing faster and at a larger scale. Database functions and related technologies are also changing and being upgraded; in particular, the amount of data stored in databases has already grown explosively. Extracting the information and knowledge we want from such huge volumes is very difficult, which leaves us in an awkward situation: we have a huge amount of information, yet we cannot find what we actually need. The vast amounts of data in databases must be analyzed and managed, and data mining technology emerged to meet this need. Cluster mining is an important topic and tool in the field of data mining, so improving the performance of clustering algorithms is of great research significance. Traditional algorithms work only on static databases and cannot deliver mining results in time. Knowledge and rules mined earlier may no longer apply to new data, which can make final decisions largely unreliable.

As a current research focus both in China and abroad, cloud computing is an extension and development of grid, parallel, and distributed computing. On a cloud computing platform, users can obtain enormous computing power, storage capacity, and infrastructure over the network. A large problem, such as processing a huge amount of data, can be divided into smaller groups that are distributed among the nodes of the cloud for processing. Consequently, expensive mainframe computers are no longer required as in traditional approaches; this not only lowers the requirements on terminal equipment but also greatly increases the available computing power.

In this paper, we first study a major clustering algorithm used in data mining, the DBSCAN clustering algorithm.
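To make the DBSCAN algorithm concrete, here is a minimal Python sketch of the classic batch procedure; it is an illustration, not the thesis's own code. A point with at least `min_pts` neighbors within distance `eps` (counting itself) is a core point, a cluster grows breadth-first from core points, and points not reachable from any core point are labeled noise (-1). The names `region_query`, `dbscan`, `NOISE`, and `UNVISITED` are hypothetical, the neighbor search is a brute-force O(n²) scan, and `math.dist` assumes Python 3.8 or later.

```python
from collections import deque
from math import dist  # Python 3.8+

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself.

    Brute-force O(n) scan; a spatial index could replace it for large data.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) or NOISE for every point."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two dense groups and one isolated noise point. Running this over the whole database after every change is exactly the cost that an incremental variant tries to avoid.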
After an in-depth study of how DBSCAN mines clusters, we propose an incremental DBSCAN clustering algorithm to overcome the shortcomings of the traditional algorithm. Second, we combine this algorithm with Hadoop, an open-source cloud computing framework, and use its MapReduce programming model to partition the huge amount of data into many small blocks. The Hadoop framework then distributes these blocks across the machines of the computer cluster, where they are processed concurrently. Finally, this paper integrates the incremental DBSCAN mining algorithm with the Hadoop platform by transforming DBSCAN into MapReduce form. When data are added to or deleted from the database, the whole database does not need to be mined again; instead, only the changed data are mined locally, and the new knowledge is combined with the previously obtained knowledge to produce the final result. This approach alleviates the processing delay for huge amounts of data, and its efficiency and effectiveness are verified on simulated data.
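The abstract does not spell out the incremental update rules. As one possible illustration, and as our own assumption rather than the thesis's method, here is a minimal Python sketch of insert-only incremental DBSCAN. Core points within `eps` of each other are joined in a union-find structure, so each cluster is a connected group of core points. When a point is inserted, only points in its `eps`-neighborhood can change core status, so only their links need updating; the rest of the clustering is reused. The class `IncrementalDBSCAN` and its methods are hypothetical. Deletions are not handled, because removing a point can split a cluster and union-find cannot undo a merge, and the MapReduce distribution step is omitted.

```python
from math import dist  # Python 3.8+


class IncrementalDBSCAN:
    """Insert-only DBSCAN sketch: core points are linked with union-find."""

    def __init__(self, eps, min_pts):
        self.eps, self.min_pts = eps, min_pts
        self.points = []  # coordinates of every inserted point
        self.counts = []  # |N_eps(i)|, counting the point itself
        self.parent = []  # union-find parent pointers

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)

    def _neighbors(self, i):
        # Brute-force scan; includes i itself.
        return [j for j, q in enumerate(self.points) if dist(self.points[i], q) <= self.eps]

    def _is_core(self, i):
        return self.counts[i] >= self.min_pts

    def insert(self, point):
        p = len(self.points)
        self.points.append(point)
        self.parent.append(p)
        self.counts.append(0)
        neighbors = self._neighbors(p)
        self.counts[p] = len(neighbors)
        newly_core = [p] if self._is_core(p) else []
        for q in neighbors:
            if q != p:
                was_core = self._is_core(q)
                self.counts[q] += 1
                if not was_core and self._is_core(q):
                    newly_core.append(q)
        # Only points that just became core can add new core-to-core links.
        for c in newly_core:
            for q in self._neighbors(c):
                if self._is_core(q):
                    self._union(c, q)
        return p

    def labels(self):
        """Cluster id per point (a union-find root), or -1 for noise."""
        out = []
        for i in range(len(self.points)):
            if self._is_core(i):
                out.append(self._find(i))
            else:
                cores = [j for j in self._neighbors(i) if self._is_core(j)]
                out.append(self._find(cores[0]) if cores else -1)
        return out
```

Cluster ids are union-find roots, so they are not consecutive; only which points share an id matters. For example, with `eps=1.0` and `min_pts=3`, inserting `(0,)`, `(1,)`, `(2,)`, `(5,)`, `(6,)`, `(7,)` gives two clusters, and inserting `(3,)` and `(4,)` afterwards merges them into one without re-clustering from scratch.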
Keywords/Search Tags: Hadoop, MapReduce, DBSCAN Algorithm, Incremental Mining