Research On Parallelization Of Data Mining Algorithm And Its Application

Posted on:2017-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Luo

Full Text:PDF

GTID:2278330503986136

Subject:Software engineering

Abstract/Summary:

With the growing popularity of social networks, the Internet of Things, cloud computing, artificial intelligence and related technologies and services, the size, dimensionality and variety of data are growing at a remarkable pace, and we have entered the era of big data. In this era data has become an indispensable resource, but traditional serial data mining algorithms run into problems when applied to big data, so they need to be parallelized. At the same time, big data leads to information overload, which has given rise to recommendation systems. A recommendation system automatically recommends products that customers are likely to prefer, based on their interests. Based on the characteristics of big data, MapReduce and recommendation algorithms, this paper carries out the following research.

(1) It further studies and analyzes the parallelizability of data mining algorithms and the conditions for it. MapReduce is not well suited to every data mining algorithm, and not every data mining algorithm needs to be parallelized with MapReduce. An algorithm suited to MapReduce should work on data that is massive and rarely updated. In addition, a parallelizable data mining algorithm must be decomposable into discrete units that can be executed in parallel.

(2) To address the weaknesses of basic k-means on big data, this paper proposes a MapReduce-parallelized K-means algorithm based on locality sensitive hashing. First, the algorithm combines MapReduce with locality sensitive hashing to split the dataset into subsets of similar points and picks representative points for each subset. This not only reduces the data size but also decreases the number of iterations. Second, the MapReduce-parallelized k-means algorithm is used to cluster the dataset made up of the representative points.

(3) To address the scalability and real-time performance of recommendation systems and the sparsity of the rating dataset, this paper proposes a collaborative filtering algorithm that combines LFM with the locality sensitive hashing-based MapReduce-parallelized K-means algorithm. First, the algorithm uses LFM to fill in the sparse rating dataset and obtain a complete rating dataset. Then the locality sensitive hashing-based MapReduce k-means algorithm is used to cluster the complete rating dataset. Finally, the clustered dataset is used to predict the unrated items in the test dataset.
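
As an illustration of step (2), the following is a minimal single-machine sketch in Python. It is not the thesis implementation. The abstract does not say which hash family is used or how representative points are chosen, so the sketch assumes a p-stable (Euclidean) hash, bucket means as representatives, and plain Lloyd iterations. The function names and the parameters n_hashes, w and k are hypothetical, and the map and reduce phases are only simulated with a dictionary.

```python
from collections import defaultdict

import numpy as np


def lsh_key(x, A, b, w):
    # Assumed hash family: p-stable LSH for Euclidean distance.
    # Points that are close together tend to receive the same key.
    return tuple(np.floor((A @ x + b) / w).astype(int))


def representatives(X, n_hashes=4, w=4.0, seed=0):
    """Bucket the rows of X by LSH key and return one mean point per bucket."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_hashes, X.shape[1]))
    b = rng.uniform(0.0, w, size=n_hashes)
    buckets = defaultdict(list)
    for x in X:                      # simulated "map": emit (key, point)
        buckets[lsh_key(x, A, b, w)].append(x)
    # Simulated "reduce": average the points that share a key.
    return np.array([np.mean(points, axis=0) for points in buckets.values()])


def assign(points, centers):
    # Index of the nearest center for every point (squared Euclidean distance).
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd iterations, run here on the representative points only."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:     # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)


# Usage on synthetic data: two well-separated blobs.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.5, (200, 2)),
               data_rng.normal(20.0, 0.5, (200, 2))])
reps = representatives(X)
centers, labels = kmeans(reps, k=2)
print(len(X), "points reduced to", len(reps), "representatives")
```

Because k-means only sees the representatives, each iteration touches far fewer points than the original dataset, which is the data-size reduction the abstract describes. In an actual MapReduce job the bucketing loop would be the mapper and the per-bucket averaging the reducer.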

Keywords/Search Tags:

Data Mining, MapReduce, Recommendation Algorithm, Parallelized K-means

Related items

1	Research On Semantic Role Mining Based On Parallelized TF-IDF Algorithm In Large Data Environment
2	Research Of Parallelized Distributed Association Rules Mining Algorithm Based On Hadoop
3	Study Of Parallelized Text Mining Algorithm Based On Cloud Computing Framework
4	Research On Personalized Recommendation Technology Of Scenic Spots Based On Data Mining
5	Performance Improvement Of K-means Algorithm And Its Application In Movie Recommender System
6	Design And Implementation Of A Book Recommendation System Based On Apriori And K-means Algorithms
7	Study On Recommendation System Algorithm Based On Web Data Mining
8	Research And Application Of Commodity Recommendation Algorithms Based On Clustering Methods
9	Design And Implementation Of Big Data Recommendation System Based On Multi-GPU Computing
10	The Research Of Recommendation System In E-commerce Based On Web Data Mining