
Research On Recommendation Algorithm For Unbalanced And Sparse Data

Posted on: 2019-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: W H Cui
GTID: 2428330590465771  Subject: Computer technology
Abstract/Summary:
With the rapid development of the information industry, more and more people shop on the Internet, which has led to information overload, especially in e-commerce. Recommendation systems were proposed in response: they improve the user's shopping experience and also increase business revenue, a win-win outcome. This thesis analyzes user behavior data from an e-commerce shopping platform and proposes a commodity recommendation model for imbalanced and sparse data. The model addresses four problems: a large share of missing gender labels, class imbalance, data sparsity, and excessive similarity computation. The main contents of this thesis are as follows:

1. To address the missing gender labels and the class imbalance, the SMOTE_RF gender prediction method is proposed. First, the SMOTE algorithm is applied to the imbalanced samples to obtain balanced gender-labeled data. Then, a random forest model is trained on the balanced data. Finally, the trained model predicts the gender of the samples whose gender label is missing. Different models are compared experimentally on three versions of the data: the original data, data balanced by random over-sampling, and data balanced by SMOTE. The results show that the random forest model trained on SMOTE-balanced data achieves a better F1 score for gender prediction, which indicates the effectiveness of the proposed method.

2. To address data sparsity, a dynamic cross-filling (DCF) method is proposed. First, the similarity between users and the similarity between products are calculated. Then, in each round, the highest similarity values are added to a set and sorted in descending order, and cross-filling is carried out using the similarities in the set that exceed a preset threshold. Finally, this process is repeated so that the filling proceeds dynamically, which alleviates the sparsity of the data and provides high-quality data for the follow-up research.

3. To address the excessive cost of similarity computation during recommendation, an improved P_KMeans clustering algorithm is proposed to reduce time and space complexity and improve running efficiency, and a recommendation system model is built on it. First, the improved P_KMeans algorithm is used to cluster the rating data by product. Then, similarity is calculated only among samples in the same cluster. Finally, the gender factor is taken into account when making recommendations. The experimental results further demonstrate that the proposed recommendation model for imbalanced and sparse data improves both efficiency and accuracy.
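The SMOTE balancing step described in item 1 can be illustrated with a small sketch. This is not the thesis's implementation: it is a minimal, standard-library-only version of the general SMOTE idea (a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors). The function name smote_oversample, the parameters k and seed, and the toy feature values are all assumptions made for illustration. The random forest training that follows in SMOTE_RF is not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 6 majority samples and 3 minority samples, 2 features each.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.0), (2.2, 1.9), (1.9, 2.3)]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point is interpolated between two real minority samples, it stays inside the region the minority class already occupies, unlike random over-sampling, which only duplicates existing rows. In practice a library implementation would normally be used instead of a hand-written one.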
Keywords/Search Tags: recommendation algorithm, category imbalance, data sparsity, similarity calculation