
The Research And Implement Of Data Mining Algorithms Based On Hadoop

Posted on: 2012-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Bai
Full Text: PDF
GTID: 2178330335459985
Subject: Computer Science and Technology
Abstract/Summary:
Computer processing power and mass storage capacity have improved greatly, and general patterns can now be discovered in the many kinds of practical datasets accumulated from the real world. Statistics, data mining, machine learning and related technologies are used to find these patterns. Over the past decade, researchers have found that network structure is widespread in nature and human society, and have gradually revealed some of the unique structural features of real-world networks.
With the development of network science, network-based analysis and graph mining have received growing attention and are widely applied in physics, biology, politics, economics, the Internet, engineering and social life. Researchers convert datasets into network structures and use graph theory and data mining to reveal useful patterns, which lets them understand the objects under study at a higher level.
In this thesis, we process large-scale datasets and study how to discover useful results efficiently and how to apply those results to the relevant areas. Hadoop is widely used in many fields, and MapReduce has proven to be an efficient computing model, so the thesis focuses on how to implement efficient data mining algorithms on the Hadoop platform. The contents are as follows. First, we implement Hadoop-based association rule algorithms built on Apriori. Second, we implement two distributed graph mining algorithms: clustering coefficient computation and subgraph mining. We then conduct a series of experiments. The results show that these algorithms make full use of the CPUs of all nodes and scale well, so they can provide better solutions for mining large datasets. Finally, the thesis introduces our social network analysis algorithm package, which contains the graph structure and many graph algorithms. It includes weakly connected components (WCC), strongly connected components (SCC), single-source shortest path (SSSP), K-core, minimum spanning tree (MST), betweenness centrality and other algorithms.
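As an illustration of the Apriori-based association rule step mentioned in the abstract, the following is a minimal single-machine sketch in Python that mimics the map and reduce phases of one support-counting pass per itemset size. It is not the thesis implementation: the function names (`map_phase`, `reduce_phase`, `next_candidates`, `apriori`) and the in-memory data layout are assumptions made for this example, and a real Hadoop job would distribute the transactions across nodes rather than iterate over a Python list.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, candidates):
    """Map step (hypothetical helper): emit (candidate, 1) for every
    candidate itemset fully contained in one transaction."""
    items = set(transaction)
    for cand in candidates:
        if cand <= items:
            yield cand, 1


def reduce_phase(pairs, min_support):
    """Reduce step (hypothetical helper): sum the counts per candidate and
    keep only itemsets whose support count reaches min_support."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return {k: c for k, c in counts.items() if c >= min_support}


def next_candidates(frequent, k):
    """Build size-k candidates from the items of the frequent (k-1)-itemsets,
    pruning any candidate that has an infrequent (k-1)-subset."""
    items = sorted({i for s in frequent for i in s})
    result = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            result.add(frozenset(combo))
    return result


def apriori(transactions, min_support):
    """Run one simulated map/reduce pass per itemset size until no
    candidates remain. Returns {itemset: support count}."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent_all = {}
    k = 1
    while candidates:
        # On Hadoop, each mapper would see only its own split of the data.
        pairs = (p for t in transactions for p in map_phase(t, candidates))
        frequent = reduce_phase(pairs, min_support)
        frequent_all.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return frequent_all
```

Calling `apriori(transactions, min_support)` with a list of item lists and an absolute support threshold returns a dictionary that maps each frequent itemset (a `frozenset`) to its support count. Each loop iteration corresponds to one MapReduce job in a distributed setting, which is why the number of passes over the data grows with the size of the largest frequent itemset.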
Keywords/Search Tags: data mining, social network analysis, graph mining, Hadoop, parallel algorithm