Research on Cloud Computing and MapReduce Implementations of Several Data Mining Algorithms

Posted on: 2011-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: J H Li
Full Text: PDF
GTID: 2208360308966197
Subject: Computer software and theory
Abstract/Summary:
Cloud Computing, a hot topic in the international IT industry, has recently been taken up and emphasized in China, and the trend toward it appears irreversible. Distributed processing, parallel processing and grid computing can be regarded as prototypes of Cloud Computing, in which distribution and parallelism are the key ideas. Processing massive data and performing massive computation are the most important goals of Cloud Computing. However, Cloud Computing is a way of thinking about computation, and for it to deliver on its promise, a cloud computing platform and efficient parallel strategies must be built in addition to the hardware.
Large-scale computation tasks arise regularly in data mining. Many traditional data mining algorithms can only handle small-scale input data and run slower, or even fail, as the input grows. This limitation is a persistent bottleneck of traditional data mining algorithms. Better performance can be achieved if these algorithms are ported to a Cloud Computing platform and run in parallel. Whether an algorithm can be parallelized properly is therefore the key to solving this problem.
This paper first analyzes the file systems and programming models of Google, Sector/Sphere and Hadoop. Secondly, a Chinese hot-topic extraction algorithm is proposed that takes into account factors such as the content of the text, human forgetting and the popularity of the topic. Better performance is then achieved after the algorithm is reformulated in MapReduce and run on the Hadoop platform: it becomes much faster and can handle large-scale input data. Finally, the collaborative filtering, local linear regression and Naive Bayes algorithms are analyzed to identify their bottlenecks and the parts that can be parallelized, and corresponding MapReduce versions are proposed, which successfully address the key problem of efficiency.
The results of this paper provide an approach for expressing data mining algorithms in MapReduce, and the experimental results demonstrate the effectiveness of the approach.
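To illustrate what expressing a data mining algorithm in MapReduce form can look like, the following is a minimal sketch and is not taken from the thesis. It assumes that the training phase of Naive Bayes can be reduced to counting classes and (class, token) pairs, which is a common way to parallelize it. The function names (`map_record`, `shuffle`, `reduce_counts`, `run_job`) and the sample records are hypothetical, and the map, shuffle and reduce phases are simulated in a single Python process rather than run on Hadoop.

```python
from collections import defaultdict


def map_record(label, tokens):
    """Map step: emit (key, 1) pairs for one labeled training document."""
    yield ("class", label), 1            # one document seen for this class
    for tok in tokens:
        yield ("token", label, tok), 1   # one occurrence of tok in this class


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the partial counts for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in-process over (label, tokens) records."""
    pairs = (kv for label, tokens in records for kv in map_record(label, tokens))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Hypothetical toy training set: (class label, tokens of the document).
records = [
    ("sports", ["match", "goal", "match"]),
    ("tech", ["cloud", "hadoop"]),
    ("sports", ["goal"]),
]
counts = run_job(records)
```

Because each call to `map_record` depends only on its own record and each reduce key is summed independently, both phases can be spread across machines without coordination, which is the property that makes counting-based algorithms a natural fit for MapReduce.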
Keywords/Search Tags:Cloud computing, MapReduce, Chinese Hot-topic, Collaborative filtering