Research On Mahout Cooperative Filtering Algorithm In KDD2010 Competition

Posted on:2017-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:Z Meng

Full Text:PDF

GTID:2278330488464852

Subject:Software engineering

Abstract/Summary:

The Outline of China's National Plan for Medium and Long-Term Education Reform and Development (2010-2020) points out that information technology has a revolutionary influence on the development of education and must be given great importance. Educational informatization has become a key factor in the nation's modernization. This thesis applies collaborative filtering to mine the KDD Cup 2010 dataset. KDD Cup is one of the world's best-known data mining competitions, and its 2010 topic was educational data mining, so the research direction of this thesis has high practical value.

The thesis uses data drawn from an intelligent tutoring system, containing about 8.9 million records. The dataset has the following characteristics:

1. Large volume of data: the dataset has 8,918,054 lines, each with 23 features, for about 200 million values in total.
2. A very large feature space (over 450,000).
3. A sparse data matrix: contestants need to exploit relationships among problems to bring enough data to bear to hope to learn.
4. A strong temporal dimension: conventional sampling methods can introduce errors.

Collaborative filtering is widely used in recommender systems and handles personalized recommendation well. As data volumes grow, however, it faces challenges such as data sparsity. This thesis studies collaborative filtering in depth, uses several collaborative filtering algorithms to predict personalized recommendations of problem items to students, compares these methods, and finally gives the best recommendation.

To address the problems above, the thesis does the following work:

1. Studies data mining technology in depth: the mainstream data mining methods, especially the Mahout collaborative filtering recommendation algorithms, and the development of educational data mining and the KDD Cup 2010 competition.
2. Analyzes collaborative filtering algorithms, mainly three kinds: user-based, item-based, and model-based collaborative filtering.
3. Uses the Taste framework in Apache Mahout to run simulation experiments with the three kinds of CF algorithms, using RMSE as the evaluation criterion to compare recommendation quality and choose the best recommendation algorithm.
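As a rough illustration of the user-based approach and the RMSE criterion mentioned in the abstract, the following is a minimal sketch in plain Python. It does not use Mahout's Taste API and is not the thesis's implementation. The function names (`cosine_similarity`, `predict_user_based`, `rmse`), the choice of cosine similarity, the neighbourhood size `k`, and the toy student/problem scores are all assumptions made up for this example.

```python
import math

# Toy preference data: student -> {problem: score in [0, 1]}.
# All identifiers and values here are hypothetical, not taken from KDD Cup 2010.
ratings = {
    "s1": {"p1": 1.0, "p2": 0.5},
    "s2": {"p1": 1.0, "p2": 0.5, "p3": 0.8},
    "s3": {"p1": 0.5, "p2": 1.0, "p3": 0.2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have scored."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_user_based(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who scored `item`.

    Returns None when no neighbour with positive similarity has scored the item.
    """
    neighbours = []
    for other, prefs in ratings.items():
        if other != user and item in prefs:
            sim = cosine_similarity(ratings[user], prefs)
            if sim > 0.0:
                neighbours.append((sim, prefs[item]))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    if not top:
        return None
    return sum(sim * score for sim, score in top) / sum(sim for sim, _ in top)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Predict a held-out score and measure the error against an assumed true value.
prediction = predict_user_based(ratings, "s1", "p3")
held_out_truth = 0.7  # hypothetical ground truth for (s1, p3)
print(prediction, rmse([prediction], [held_out_truth]))
```

A lower RMSE on held-out scores indicates better predictions, so comparing user-based, item-based, and model-based recommenders amounts to computing this value for each on the same held-out split. Given the temporal structure noted above, a split that respects time order is likely a safer choice than a random one.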

Keywords/Search Tags:

collaborative filtering, data mining, prediction, KDD Cup 2010

Related items

1	A New Prediction Approach Based On Polynomial Regression For Collaborative Filtering
2	Collaborative Filtering Study Based On Electrical Resistance Network And Sparse Data Prediction
3	Research On Big Data Mining Analysis Method Based On Collaborative Filtering
4	Research And Application Of Collaborative Filtering Recommendation Algorithm
5	Research And Improvement Of Collaborative Filtering Algorithm Of Similarity And The Scalability Problem
6	Research On Genre-Based Hybrid Collaborative Filtering Algorithm
7	Research On QoS Prediction Approach With Latent Feature
8	Research On Some Key Technologies Of User-based Collaborative Filtering Recommendation Algorithm
9	Collaborative Filtering Algorithm Based On Neighborhood Relationship
10	Researches On Service Context Processing Mechanism And Prediction Theory And Key Technology