
Research On Data Mining Algorithm And Its Parallelization In Cloud Computing

Posted on: 2016-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: T H Shao
GTID: 2308330473964422
Subject: Software engineering
Abstract/Summary:
In recent years, big data has attracted growing attention, and massive-scale data mining, the technology underlying big data, has become increasingly important. Adapting traditional data mining algorithms and deploying them on a cloud platform is the best way to handle massive data mining. In essence, this increases the speed and storage capacity available to data mining so that tasks beyond the reach of a single computer can be handled, but it does little to affect the accuracy of the mining algorithm itself. Traditional data mining research includes many attempts to improve the accuracy of the original algorithms; however, most of these improved algorithms have very high complexity, which makes them difficult to parallelize or significantly slows down parallel computation. How to balance accuracy against computational speed when designing improved algorithms for the cloud computing environment has therefore become an important research topic.

This thesis first analyzes the technical bottlenecks that existing data mining faces in the age of big data, together with the proposed solutions and existing research results. It then studies parallelized versions of two data mining algorithms, Bayes classification and Apriori association rule mining. Based on the characteristics and mechanism of each algorithm, it improves the algorithm itself and designs a corresponding parallelization scheme.

For the classification algorithm, synonym merging and word-frequency filtering are added to reduce the dimensionality of the feature vectors and to reduce false positives. Special keywords are then weighted, and the weights are integrated into the parallel computation, which improves classification accuracy without compromising performance.

For the association rule algorithm, the computation flow is modified: the incremental frequent itemset mining process is decomposed into independent computations that can span layers, reducing both the forced synchronization between nodes and the I/O overhead on each node. The data distribution is also optimized to avoid a large amount of repeated computation.

Building on these theoretical results, the thesis then tries a new approach to the traditional problem of personalized music recommendation, drawing on the new features that data mining technology has gained in the context of cloud computing. It combines data mining with social networking, clustering and associating various aspects of users' preferences to achieve more accurate recommendations. This also shows that cloud computing can not only bring a major improvement in data mining efficiency but also offer new ideas for existing data mining scenarios.

Finally, putting the proposed methods into practice revealed that, although cloud computing improves equipment utilization and efficiency on large-scale clusters, deploying a cloud data mining platform is a heavy burden for individuals or small clusters with limited resources. The thesis therefore designs a flexible cloud data mining platform suited to such conditions, combining Hadoop and OpenStack to solve the problem and produce a usable platform for data mining practice.
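
The classification improvements described in the abstract (synonym merging, word-frequency filtering, and keyword weighting) could be sketched roughly as follows. This is a minimal single-machine illustration, not the thesis's implementation: the synonym table, the keyword weights, the `min_count` threshold, and all function names are hypothetical, and applying the weight to the per-class term counts is only one assumed way of integrating it. The per-class counting in `train` is the step that would typically be distributed as a map/reduce job.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table and keyword weights (illustrative assumptions only).
SYNONYMS = {"tune": "song", "track": "song"}
KEYWORD_WEIGHTS = {"refund": 3.0}


def build_vocabulary(docs, min_count=2):
    """Word-frequency filtering: keep terms seen at least min_count times
    after synonyms have been merged into one canonical term."""
    counts = Counter(SYNONYMS.get(t, t) for tokens, _ in docs for t in tokens)
    return {t for t, c in counts.items() if c >= min_count}


def preprocess(tokens, vocabulary):
    """Merge synonyms, then drop tokens outside the filtered vocabulary."""
    merged = [SYNONYMS.get(t, t) for t in tokens]
    return [t for t in merged if t in vocabulary]


def train(docs, vocabulary):
    """Accumulate weighted term counts per class.
    Each document contributes independently, so this loop could be a map step
    with the per-class sums computed in a reduce step."""
    term_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in docs:
        class_counts[label] += 1
        for t in preprocess(tokens, vocabulary):
            term_counts[label][t] += KEYWORD_WEIGHTS.get(t, 1.0)
    return term_counts, class_counts


def classify(tokens, term_counts, class_counts, vocabulary):
    """Naive Bayes decision with add-one smoothing over the weighted counts."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(term_counts[label].values()) + len(vocabulary)
        for t in preprocess(tokens, vocabulary):
            score += math.log((term_counts[label][t] + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Merging synonyms before counting is what shrinks the vocabulary, and the frequency threshold removes rare terms that would otherwise add dimensions without adding much evidence.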
Keywords/Search Tags: Data Mining, Bayes, Cloud Computing, Parallel, Music Recommendation