
Research On Big Data Mining Analysis Method Based On Collaborative Filtering

Posted on: 2015-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: X W Liu
GTID: 2298330467451259
Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, data volumes are growing faster than ever. Big data is heterogeneous, enormous in volume, and low in value density, which makes it difficult for people to extract useful, personalized information. Collaborative filtering was proposed to address this problem. This thesis studies collaborative filtering together with cloud computing technology, and proposes a framework for data mining and analysis that is based on collaborative filtering and runs in a cloud computing environment. The work covers two aspects.

First, heterogeneous data makes exchange and sharing between different data sources more difficult. Providing a unified and transparent data access interface requires a unified data model. The thesis proposes a framework for distributed data mining and analysis based on a collaborative filtering algorithm. For this framework, it proposes an XML-based heterogeneous data integration model that implements transformation and integration between data sources.

Second, the thesis presents the RMF_time collaborative filtering algorithm, which combines basic matrix factorization with a user time factor and an item time factor. Experiments show that the algorithm improves the accuracy of the recommendations. Cloud computing offers powerful computing capacity, storage capacity, and scalability, so the thesis also proposes a parallel collaborative filtering algorithm named DRMF that runs in a cloud environment. DRMF divides the matrix into blocks and applies stratified stochastic gradient descent, so that the matrix factorization is performed in parallel within each data stratum. Experiments show that the algorithm becomes more efficient on large-scale data sets as the number of data nodes increases, and that it achieves satisfactory computing performance.

The thesis therefore concludes that combining cloud computing technology with collaborative filtering is a feasible way to filter information in big data.
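The block-partitioned, stratified stochastic gradient descent idea mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's DRMF implementation, which is not given here; it is a generic, single-process illustration of stratified SGD for matrix factorization. The function names (`stratified_sgd`, `rmse`), the modulo-based block assignment, the hyperparameters, and the toy ratings are all assumptions made for illustration.

```python
import random

def stratified_sgd(ratings, n_users, n_items, rank=2, blocks=2,
                   epochs=300, lr=0.02, reg=0.02, seed=0):
    """Factorize ratings (u, i, r) as P[u] . Q[i] with stratified SGD.

    Hypothetical sketch: strata are processed sequentially here; in a
    cluster, the blocks of one stratum could run on separate workers.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]

    # Partition the observed ratings into a blocks x blocks grid.
    # Assumption: users and items are assigned to blocks by index modulo.
    grid = {}
    for u, i, r in ratings:
        grid.setdefault((u % blocks, i % blocks), []).append((u, i, r))

    for _ in range(epochs):
        for shift in range(blocks):
            # A stratum holds blocks that share no user block and no item
            # block, so their updates touch disjoint rows of P and Q.
            stratum = [(b, (b + shift) % blocks) for b in range(blocks)]
            for key in stratum:
                block = grid.get(key, [])
                rng.shuffle(block)
                for u, i, r in block:
                    err = r - sum(p * q for p, q in zip(P[u], Q[i]))
                    for k in range(rank):
                        pu, qi = P[u][k], Q[i][k]
                        # Regularized SGD step on both factor vectors.
                        P[u][k] += lr * (err * qi - reg * pu)
                        Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error of the factorization on the given ratings."""
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Toy data (invented): users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
toy = [(u, i, 5.0 if (u < 2) == (i < 2) else 1.0)
       for u in range(4) for i in range(4)]
P0, Q0 = stratified_sgd(toy, 4, 4, epochs=0)   # untrained baseline
P1, Q1 = stratified_sgd(toy, 4, 4)             # trained factors
print(rmse(toy, P0, Q0), rmse(toy, P1, Q1))
```

Because the blocks within one stratum never update the same factor rows, they need no locking between them, which is the property that makes this scheme a natural fit for adding data nodes.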
Keywords/Search Tags: big data, cloud computing, collaborative filtering, matrix factorization