
Research On Collaborative Filtering Algorithm Based On Feature Engineering

Posted on: 2020-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: D L Mao
GTID: 2428330599456767
Subject: Computer software and theory
Abstract/Summary:
Collaborative filtering is one of the most widely used and successful recommendation algorithms. It computes similarities by analyzing user or item features, builds a set of nearest neighbors, and finally predicts the scores of unrated items to generate recommendations. User and item features are therefore central to collaborative filtering: their discrimination and sparsity directly affect prediction accuracy. Both the traditional algorithm and its improved variants use rating data as the features of users or items, which raises two problems worth exploring and optimizing.

(1) User and item features are not discriminative enough. Rating data is strongly influenced by users' personal preferences, behavioral habits, and other unmeasurable factors. Across different users, the same score may reflect different preferences, while different scores may reflect the same preference. When rating data is used as user or item features, those features therefore discriminate poorly. To obtain more discriminative features, researchers have introduced additional user or item features through content-based methods, demographic features, or natural language processing techniques.

(2) User and item features are highly sparse. With the rapid growth of Internet users and the spread of e-commerce, the items a user has rated usually make up only a small fraction of all items, so rating data is extremely sparse. Features built from rating data are therefore highly sparse as well. To mitigate the impact of sparsity, researchers have developed a variety of null-value filling techniques, dimensionality reduction techniques, and similarity measures that are insensitive to sparsity.

From the perspective of feature engineering, and aiming at the problem
that prediction accuracy suffers from the low discrimination and high sparsity of features, this thesis studies the following two aspects.

(1) To address the low discrimination of user features, the AF-CF (collaborative filtering based on attribution features) algorithm is proposed. Attribution theory belongs to the field of social psychology. It analyzes highly discriminative user characteristics such as consistency, distinctiveness, and positive/negative preference. Through this analysis, attribution can explain user behavior well, that is, it can infer the reasons behind that behavior. AF-CF therefore uses statistical methods to extract three attribution-theory characteristics (consistency, distinctiveness, and positive/negative preference) in order to obtain highly discriminative features. Users' rating behavior is attributed to their preferences, and each user's preference for an item is derived linearly from these three characteristics. A preference similarity and a rating similarity are then calculated. To combine the advantages of both, the two similarities are fused, and scores are finally predicted.

To verify the effectiveness of this work, the similarity fusion parameters are first optimized to find their optimal values. Under the optimal fusion parameters, the MAE is about 1.5% lower than that of the traditional collaborative filtering algorithm, i.e., prediction accuracy improves by about 1.5%. Compared with three recent improved collaborative filtering algorithms, the MAE is 4%-5% lower. Work (1) thus verifies that feature discrimination influences the prediction accuracy of collaborative filtering: the more discriminative the features, the more accurate the predictions. Sparsity, however, is also an important factor affecting the accuracy of the algorithm's
prediction. The next step is therefore to reduce the impact of data sparsity on the algorithm while still extracting highly discriminative features.

(2) To reduce the impact of sparsity on the prediction accuracy of collaborative filtering while extracting highly discriminative features, a collaborative filtering algorithm based on label mapping (LM-CF) is proposed. Label mapping is a feature extraction method proposed in this thesis. Labeling generates category information for the original data, and mapping converts the original data into new features according to those labels. Depending on the labeling method, clustering-label mapping and self-label mapping are used to extract new features that are low-dimensional and highly discriminative. Unlike existing feature extraction methods, label mapping extracts new features from sets. Using the new features as data, item-based score prediction is carried out, with similarity computed by a linear Jaccard method. The fine-grained partition of features allows item similarity to be calculated more accurately.

Experiments on the commonly used data sets MovieLens, Yahoo! R4, and FilmTrust, with MAE as the evaluation criterion, verify the effectiveness of this work. First, the applicability of clustering labeling and self-labeling is analyzed. Clustering labeling suits large data sets and places no requirements on data format, whereas self-labeling suits small data sets and has strict data-format requirements. Finally, a comparison of MAE values with four recent algorithms shows that work (2) outperforms the compared algorithms, reducing MAE by 2%-12%.
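The basic collaborative filtering pipeline described at the start of the abstract can be sketched in Python. That pipeline consists of similarity computation, a nearest-neighbor set, score prediction, and MAE evaluation. This is a minimal user-based sketch, not the thesis's code. Cosine similarity over co-rated items, the neighbor count `k`, the function names, and the toy ratings are all assumptions chosen for illustration.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

# ratings[user][item] = score; an absent item means "not rated" (toy data, assumed)
Ratings = Dict[str, Dict[str, float]]


def cosine_sim(ratings: Ratings, u: str, v: str) -> float:
    """Cosine similarity between two users over the items both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average of the item's ratings by the k nearest neighbors."""
    # Candidate neighbors: every other user who has rated the target item.
    candidates = [
        (cosine_sim(ratings, user, v), v)
        for v in ratings
        if v != user and item in ratings[v]
    ]
    neighbors = sorted(candidates, reverse=True)[:k]  # the nearest-neighbor set
    num = sum(sim * ratings[v][item] for sim, v in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)
    return num / den if den > 0 else None


def mae(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


ratings: Ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 3, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
    "dave": {"i2": 4, "i3": 3, "i4": 5},
}

pred = predict(ratings, "alice", "i4")
print(f"predicted score for alice on i4: {pred:.3f}")
print(f"MAE on two toy pairs: {mae([(4.0, 5.0), (3.0, 3.0)]):.2f}")
```

Here Bob and Dave are Alice's two nearest neighbors among the users who rated `i4`, so Carol's rating does not enter the prediction.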
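The abstract says that AF-CF fuses a preference similarity with a rating similarity and tunes the fusion parameters, but it does not give the fusion formula. The sketch below therefore assumes a single linear weight `alpha` between the two similarities and picks `alpha` by a grid search that minimizes MAE. The functions `fuse`, `predict`, and `tune_alpha`, the grid, and the toy similarity values are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Each neighbor is (preference_similarity, rating_similarity, neighbor_rating).
Neighbor = Tuple[float, float, float]
Case = Tuple[List[Neighbor], float]  # (neighbors, true rating) for one held-out pair


def fuse(pref_sim: float, score_sim: float, alpha: float) -> float:
    """Assumed linear fusion: alpha weights preference similarity, 1 - alpha rating similarity."""
    return alpha * pref_sim + (1.0 - alpha) * score_sim


def predict(neighbors: Sequence[Neighbor], alpha: float) -> Optional[float]:
    """Weighted average of neighbor ratings, weighted by the fused similarity."""
    num = den = 0.0
    for pref_sim, score_sim, rating in neighbors:
        w = fuse(pref_sim, score_sim, alpha)
        num += w * rating
        den += abs(w)
    return num / den if den > 0 else None


def mae(pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def tune_alpha(cases: Sequence[Case], grid: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Grid-search alpha and return (best_alpha, best_mae), or None if nothing is predictable."""
    best: Optional[Tuple[float, float]] = None
    for alpha in grid:
        pairs = []
        for neighbors, truth in cases:
            p = predict(neighbors, alpha)
            if p is not None:
                pairs.append((p, truth))
        if not pairs:
            continue  # no usable predictions at this alpha
        err = mae(pairs)
        if best is None or err < best[1]:
            best = (alpha, err)
    return best


# Toy held-out case: neighbor A is close in preference, neighbor B only in raw ratings.
cases: List[Case] = [([(0.9, 0.1, 5.0), (0.1, 0.9, 1.0)], 5.0)]
grid = [i / 10 for i in range(11)]
best_alpha, best_mae = tune_alpha(cases, grid)
print(f"best alpha = {best_alpha}, MAE = {best_mae:.3f}")
```

With this single toy case, the error falls steadily as `alpha` grows, so the search picks `alpha = 1.0`. On real held-out data, the optimum need not lie at an endpoint of the grid.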
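The sketch below shows one possible reading of label mapping, not the thesis's method. It assumes that users have already been given cluster labels; the `user_label` dictionary is a hypothetical stand-in for clustering labeling. Each item's sparse rating vector is mapped to counts over a small vocabulary of `(label, liked)` tokens, and items are compared with a weighted Jaccard similarity (sum of minima over sum of maxima). Weighted Jaccard is a swapped-in measure, because the exact "linear Jaccard" formula is not given in the abstract. The like threshold, function names, and toy data are also assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Optional

Ratings = Dict[str, Dict[str, float]]


def label_map(ratings: Ratings, user_label: Dict[str, Hashable],
              like_threshold: float = 4.0) -> Dict[str, Counter]:
    """Map each item's sparse ratings to counts of (user label, liked?) tokens."""
    features: Dict[str, Counter] = {}
    for user, items in ratings.items():
        label = user_label[user]
        for item, score in items.items():
            token = (label, score >= like_threshold)
            features.setdefault(item, Counter())[token] += 1
    return features


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity: sum of element-wise minima over sum of maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den > 0 else 0.0


def predict_item_based(ratings: Ratings, features: Dict[str, Counter],
                       user: str, item: str) -> Optional[float]:
    """Item-based prediction: the user's own ratings, weighted by item similarity."""
    target = features.get(item, Counter())
    num = den = 0.0
    for other, score in ratings[user].items():
        if other == item:
            continue
        sim = weighted_jaccard(target, features.get(other, Counter()))
        num += sim * score
        den += sim
    return num / den if den > 0 else None


ratings: Ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5},
    "u3": {"a": 2, "c": 5},
    "u4": {"b": 2, "c": 4},
}
user_label = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}  # hypothetical cluster labels

features = label_map(ratings, user_label)
sim_ab = weighted_jaccard(features["a"], features["b"])
sim_ac = weighted_jaccard(features["a"], features["c"])
pred = predict_item_based(ratings, features, "u3", "b")
print(f"sim(a, b) = {sim_ab:.2f}, sim(a, c) = {sim_ac:.2f}, prediction = {pred}")
```

The token vocabulary has only 2 × (number of clusters) entries. Items rated by different individual users can therefore still overlap in this feature space, which illustrates how a mapping of this kind could lower dimensionality and ease sparsity.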
Keywords/Search Tags:collaborative filtering, feature engineering, attribution theory, cluster, sparsity