
Research On Collaborative Filtering Algorithm Based On Popular Tags And Link Communities

Posted on: 2019-08-09  Degree: Master  Type: Thesis
Country: China  Candidate: L H Bao  Full Text: PDF
GTID: 2428330623968771  Subject: Software engineering
Abstract/Summary:
The advent of the Internet+ era has created new demands for information and changed the way people interact with the world. Helping users obtain the information they need accurately, efficiently, and comprehensively has become a research hotspot. As one channel through which users obtain information, recommendation systems play an important role in information search and acquisition: they analyze a user's historical behavior to provide information the user is likely to be interested in. After analyzing existing recommendation algorithms, this thesis proposes a collaborative filtering algorithm based on popular tags and link communities and validates it on the Spark platform. The experimental results show that the algorithm effectively improves recommendation accuracy. The main work of the thesis is as follows.

First, to address data sparsity, the thesis studies tag-based collaborative filtering. Each ordinary tag used by a user is represented by popular tags, which retains as much of the user's complete and personalized information as possible while reducing the modeling dimension and alleviating the poor recommendations caused by sparse data.

Second, to address data redundancy, an improved algorithm is proposed: the ordinary tags represented by popular tags are clustered with a link community (edge-based) detection algorithm. Conventional clustering algorithms assign each tag to exactly one category, yet real applications often contain ambiguous tags. The link community algorithm can find overlapping clusters and thus identify polysemous tags; the true meaning of such a tag, and the category it belongs to, is then judged from the meanings of the other tags connected to it by edges. This reduces tag redundancy and improves the quality of the recommendation results.

In addition, to address slow data processing, poor real-time performance, and low scalability, the improved algorithm is implemented on the Spark platform. When the data size is particularly large, Spark offers higher operating efficiency, better real-time performance, and better scalability than Hadoop.

Finally, the thesis compares the traditional clustering algorithm based on ordinary tags with the link community algorithm based on popular tags, using two metrics: precision and recall. The experimental results show that the improved algorithm effectively improves the accuracy of the recommendation results. Comparison experiments were also run on the Hadoop and Spark platforms using MovieLens datasets of four different scales. The data show that the larger the data volume, the greater Spark's advantage over Hadoop in processing speed and scalability.
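
The abstract does not give the exact formulas behind its two evaluation metrics. As an illustration only, the sketch below computes precision and recall in the form commonly used for top-N recommendation, assuming each user has a list of recommended items and a set of held-out relevant items. The function name, the dictionary layout, and the movie identifiers are hypothetical and are not taken from the thesis.

```python
def precision_recall(recommended, relevant):
    """Overall precision and recall for top-N recommendation lists.

    recommended: dict mapping user -> list of recommended item ids (assumed layout)
    relevant:    dict mapping user -> set of item ids the user actually liked
                 in the held-out test data (assumed layout)
    Only users that received recommendations are counted.
    """
    hits = 0    # recommended items that are also relevant
    n_rec = 0   # total number of recommended items
    n_rel = 0   # total number of relevant items for the evaluated users
    for user, items in recommended.items():
        truth = relevant.get(user, set())
        hits += len(set(items) & truth)
        n_rec += len(items)
        n_rel += len(truth)
    precision = hits / n_rec if n_rec else 0.0
    recall = hits / n_rel if n_rel else 0.0
    return precision, recall


# Toy data (hypothetical): two users, five recommendations in total.
recommended = {"u1": ["m1", "m2", "m3"], "u2": ["m2", "m4"]}
relevant = {"u1": {"m1", "m3", "m5"}, "u2": {"m4"}}

precision, recall = precision_recall(recommended, relevant)
print(precision, recall)
```

Precision is the share of recommended items that turn out to be relevant, and recall is the share of relevant items that were recommended, so a method can raise one while lowering the other. That trade-off is the usual reason for reporting both.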
Keywords/Search Tags: Popular Tags, Link Communities, Collaborative Filtering, Recommendation, Spark