
Application Of Semi-Supervised Clustering In Digital Library Recommendation

Posted on: 2017-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
Full Text: PDF
GTID: 2308330485470509
Subject: Computer application technology
Abstract/Summary:
At present, recommendation methods are usually divided into three categories: content-based recommendation, collaborative filtering recommendation, and hybrid recommendation. Content-based recommendation suggests items similar to those a customer chose in the past. It does not take into account user feedback or the user's implicit interests, which can make its results inaccurate. Collaborative filtering first searches for the neighbors of the current customer and then recommends the neighbors' favorite items to that customer. Its most urgent problems are data sparsity and cold start.

Semi-supervised clustering adds a small amount of supervision information to unsupervised learning and uses that information to improve the clustering results. The Euclidean distance is the most commonly used metric in clustering, but it has several shortcomings: it handles elliptical data poorly; it performs poorly when the samples in a data set are highly correlated; and when the data set is high-dimensional, the amount of computation is large and the time complexity of the algorithm is high.

To address these problems, this thesis improves the Mahalanobis distance for semi-supervised clustering, with the aim of supporting a variety of recommendation methods for digital book recommendation. The specific research work is as follows:

Firstly, collaborative filtering recommendation offers few similarity measures, and the Euclidean distance works well only for spherical data and poorly for elliptical data. A Mahalanobis distance based on entropy theory is therefore used as the metric. Combined with the Gaussian mixture model, semi-supervised clustering is then used to construct the objective function and improve the quality of the clusters.

Secondly, supervision information includes not only data labels but also pairwise constraints between samples: two samples joined by a Must-Link constraint must be placed in the same class, and two samples joined by a Cannot-Link constraint must not be placed in the same class. Before clustering, these constraints are used as prior conditions that guide the clustering process and shape its result. However, the constraint set cannot be observed directly, so active learning is used to find the pairwise constraints. The constraints are then combined with the Mahalanobis distance for semi-supervised fuzzy clustering.

Thirdly, a digital library recommendation model based on semi-supervised clustering is constructed, and the two improved clustering algorithms are applied to the model.
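
The contrast drawn above between the Euclidean metric and a covariance-aware metric (the Mahalanobis distance), together with the Must-Link and Cannot-Link constraints, can be illustrated with a small sketch. This is not the thesis's algorithm: it uses the plain Mahalanobis distance rather than the entropy-based variant, and the function names, the covariance matrix, and the sample points are all hypothetical values chosen for illustration. NumPy is assumed to be available.

```python
import numpy as np


def euclidean(x, y):
    """Plain Euclidean distance; treats every direction equally."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(diff))


def mahalanobis(x, y, cov):
    """Mahalanobis distance sqrt((x-y)^T cov^{-1} (x-y)).

    cov is the cluster covariance matrix (assumed positive definite).
    The linear system is solved directly instead of inverting cov.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))


def count_violations(labels, must_link, cannot_link):
    """Count pairwise constraints broken by a cluster assignment.

    labels[i] is the cluster of sample i; each constraint is an (i, j) pair.
    """
    broken_ml = sum(labels[i] != labels[j] for i, j in must_link)
    broken_cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return broken_ml + broken_cl


# Hypothetical elliptical cluster: wide along the first axis, narrow along the second.
cov = np.array([[4.0, 0.0],
                [0.0, 0.25]])
center = np.array([0.0, 0.0])
along_long_axis = np.array([2.0, 0.0])
along_short_axis = np.array([0.0, 2.0])

for point in (along_long_axis, along_short_axis):
    print(point, euclidean(point, center), mahalanobis(point, center, cov))

# Hypothetical assignment of four samples checked against made-up constraints.
labels = [0, 0, 1, 1]
print(count_violations(labels, must_link=[(0, 1), (1, 2)], cannot_link=[(2, 3)]))
```

Both sample points are equally far from the center under the Euclidean metric, but the Mahalanobis distance rates the point along the narrow axis as much farther away, which is the behavior that makes it better suited to elliptical clusters. A constraint-violation count of this kind is one simple way a clustering procedure could penalize assignments that contradict the supervision information.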
Keywords/Search Tags: Digital Library Recommendation, Semi-Supervised Clustering, Mahalanobis Distance, Active Learning, Entropy Theory