
Research On Mahout Cooperative Filtering Algorithm In KDD2010 Competition

Posted on: 2017-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Meng
Full Text: PDF
GTID: 2278330488464852
Subject: Software engineering
Abstract/Summary:
The Outline of China’s National Plan for Medium and Long-Term Education Reform and Development (2010-2020) points out that information technology has a revolutionary influence on the development of education and must be given great importance. The informatization of education has become a key factor in the nation’s modernization. This thesis applies collaborative filtering to mine the dataset of KDD Cup 2010. KDD Cup is one of the world’s best-known data mining competitions, and its 2010 topic was educational data mining. The research direction of the thesis therefore has high practical value.

The thesis uses data drawn from an intelligent tutoring system, containing about 8.9 million records. The dataset has the following characteristics:
1. Large volume of data: there are 8,918,054 lines in the dataset and every line has 23 features, giving about 200 million values in total.
2. A very large feature space (over 450,000 features).
3. The data matrix is sparse: contestants need to exploit relationships among problems in order to bring enough data to bear for learning.
4. The data has a strong temporal dimension: conventional sampling methods can introduce errors.

Collaborative filtering recommendation algorithms are widely used in recommender systems and handle personalized recommendation well. As the volume of data grows, however, collaborative filtering faces challenges such as data sparsity. This thesis studies collaborative filtering in depth, uses several collaborative filtering algorithms to predict personalized recommendations of problem items for students, compares these methods, and finally identifies the one that gives the best recommendations.

To address the issues above, the thesis does the following work:
1. Studying data mining technology in depth: learning the mainstream data mining methods, especially the collaborative filtering recommendation algorithms in Mahout, and analyzing the development of educational data mining and the KDD Cup 2010 competition.
2. Analyzing the collaborative filtering algorithm, which mainly comprises three kinds of algorithms: user-based collaborative filtering, item-based collaborative filtering, and model-based collaborative filtering.
3. Using the Taste framework in Apache Mahout to run simulation experiments with the three kinds of CF algorithms, using the RMSE value as the evaluation criterion to compare recommendation quality, and finally choosing the best recommendation algorithm.
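As a rough illustration of the evaluation described above, the following minimal Python sketch shows how a user-based collaborative filtering prediction and an RMSE score can be computed. It is not the thesis's Mahout/Taste code: the toy ratings, the cosine similarity choice, and the names `ratings`, `cosine_similarity`, `predict`, and `rmse` are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: student -> {problem: 1.0 if answered correctly, else 0.0}.
# These values are invented for illustration; they are not KDD Cup 2010 data.
ratings = {
    "s1": {"p1": 1.0, "p2": 0.0, "p3": 1.0},
    "s2": {"p1": 1.0, "p2": 0.0, "p3": 1.0, "p4": 1.0},
    "s3": {"p1": 0.0, "p2": 1.0, "p4": 0.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """User-based CF: similarity-weighted average of other users' ratings.

    Returns None when no similar user has rated the item, which is the
    sparsity problem the abstract mentions.
    """
    weighted, total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted += sim * other_ratings[item]
        total += abs(sim)
    return weighted / total if total > 0 else None


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    print(predict(ratings, "s1", "p4"))
    print(rmse([1.0, 0.0], [1.0, 1.0]))
```

A lower RMSE on held-out records indicates better predictions, which is the basis on which the three kinds of algorithms are compared. An item-based variant would apply the same weighting over similar problems instead of similar students.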
Keywords/Search Tags:collaborative filtering, data mining, prediction, KDD Cup 2010