
Research On Parallelization Of Data Mining Algorithm And Its Application

Posted on: 2017-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Luo
Full Text: PDF
GTID: 2278330503986136
Subject: Software engineering
Abstract/Summary:
With the growing popularity of social networks, the Internet of Things, cloud computing, artificial intelligence, and related technologies and services, the size, dimensionality, and variety of data are growing at an astonishing rate. We have entered the era of big data, in which data has become an indispensable resource. Traditional serial data mining algorithms struggle with data at this scale, so they need to be parallelized. At the same time, big data leads to information overload, which has given rise to recommendation systems that automatically recommend products customers are likely to prefer based on their interests. Based on the characteristics of big data, MapReduce, and recommendation algorithms, this paper makes the following contributions.

(1) It analyzes the parallelizability of data mining algorithms and the conditions under which they can be parallelized. MapReduce is not well suited to every data mining algorithm, and not every data mining algorithm needs MapReduce in order to be parallelized. Algorithms that suit MapReduce operate on data that is massive and rarely updated. In addition, a parallelizable data mining algorithm must be decomposable into discrete units that can be executed in parallel.

(2) To address the weaknesses of basic k-means on big data, this paper proposes a MapReduce-parallelized k-means algorithm based on locality-sensitive hashing (LSH). First, the algorithm combines MapReduce and LSH to split the dataset into partitions of similar points and then selects representative points that stand for each partition. This reduces both the data size and the number of iterations. Second, the MapReduce-parallelized k-means algorithm clusters the dataset made up of the representative points.

(3) To address the scalability and real-time performance of recommendation systems and the sparsity of rating datasets, this paper proposes a collaborative filtering algorithm that combines a latent factor model (LFM) with the LSH-based MapReduce-parallelized k-means algorithm. First, the algorithm uses LFM to fill in the sparse rating dataset and obtain a complete rating dataset. Next, it clusters the complete rating dataset with the LSH-based MapReduce k-means algorithm. Finally, it uses the clustered dataset to predict ratings for the unrated items in the test dataset.
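
The two stages described in item (2) can be illustrated with a small single-process sketch. It is not the thesis's implementation: the abstract does not say which LSH family is used or how representative points are chosen, so the sketch assumes a p-stable (Euclidean) hash of the form floor((a·x + b) / width), takes the mean of each bucket as its representative, and weights each representative by its bucket size during k-means. The function names (`make_hashes`, `lsh_key`, `representatives`, `kmeans`) and all parameter defaults are hypothetical, and the map and reduce phases are simulated with ordinary loops rather than a MapReduce framework.

```python
import math
import random
from collections import defaultdict


def make_hashes(dim, n_hashes, width, rng):
    """Draw (a, b) pairs for the assumed hash h(x) = floor((a.x + b) / width)."""
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(n_hashes)
    ]


def lsh_key(point, hashes, width):
    """Bucket key for a point; nearby points tend to share the same key."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / width)
        for a, b in hashes
    )


def representatives(points, n_hashes=2, width=4.0, seed=0):
    """Stage 1: bucket points by LSH key, then keep one (mean, count) per bucket."""
    rng = random.Random(seed)
    hashes = make_hashes(len(points[0]), n_hashes, width, rng)
    buckets = defaultdict(list)
    for p in points:  # simulated map phase: emit (key, point)
        buckets[lsh_key(p, hashes, width)].append(p)
    return [  # simulated reduce phase: summarize each bucket
        ([sum(col) / len(members) for col in zip(*members)], len(members))
        for members in buckets.values()
    ]


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(reps, k, iters=20, seed=0):
    """Stage 2: weighted Lloyd iterations over (point, weight) representatives.

    Requires k <= len(reps), because the initial centers are sampled from reps.
    """
    rng = random.Random(seed)
    centers = [p for p, _ in rng.sample(reps, k)]
    for _ in range(iters):
        sums = [[0.0] * len(centers[0]) for _ in range(k)]
        weights = [0.0] * k
        for p, w in reps:  # simulated map phase: assign to the nearest center
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            weights[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # simulated reduce phase: recompute centers, keeping empty clusters as-is
        centers = [
            [s / weights[j] for s in sums[j]] if weights[j] else centers[j]
            for j in range(k)
        ]
    return centers
```

Because k-means only ever sees the representatives, each iteration costs time proportional to the number of buckets rather than the number of original points, which is the data-size reduction the abstract refers to. The `width` and `n_hashes` values control how coarse the buckets are and would need tuning for real data.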
Keywords/Search Tags: Data Mining, MapReduce, Recommendation Algorithm, Parallelized K-means