| With the arrival of the Internet era, the scale of information on the Internet has expanded dramatically, and the accompanying "information overload" problem has become increasingly serious. Recommendation services based on information retrieval can no longer meet users' growing information needs, which gave rise to personalized recommendation systems. A personalized recommendation system helps users find the information they need in massive, unordered data and thereby eases the "information overload" problem to some extent. Collaborative filtering is one of the most successful techniques in personalized recommendation and is widely used across the Internet. However, with the dramatic growth in data size and user demand, collaborative filtering has exposed many problems, such as data noise, data sparsity, cold start, and scalability, which seriously affect the quality of recommendation services. Data noise is the first problem studied in this thesis. When users rate items, noise can enter the rating data because environmental factors lead a user to subconsciously give an inappropriate rating, or because of malicious rating manipulation. Rating data strongly influences neighbor-group computation, one of the core steps of collaborative filtering, so if the noise in the original rating data is not eliminated, the quality of subsequent recommendations may suffer. The second problem studied is data sparsity. Preference data is often highly sparse, and the missing preference information seriously reduces the accuracy of the subsequent similar-group computation; in extreme cases it leads to the cold start problem and degrades the quality of subsequent recommendations. The specific work of this thesis is as follows. For the data noise problem, this thesis uses a data cleaning algorithm based on fuzzy clustering and the Weighted Slope One algorithm. The traditional Slope One algorithm considers only the popularity difference between items and ignores user similarity and the number of ratings each item has received. The proposed algorithm first performs fuzzy clustering of users according to their preference information, then combines each user's degree of membership in each cluster with the deviation between items within each cluster to compute the final item popularity difference, and finally applies the Weighted Slope One algorithm to compute the adjusted rating data. Experiments show that the data cleaning algorithm based on fuzzy clustering and Weighted Slope One significantly improves noise elimination. For the data sparsity problem, this thesis uses a data filling algorithm based on the Winnow algorithm. The algorithm first combines item tag information with the rating data to initialize the user feature matrix, then optimizes the user feature matrix with the Winnow algorithm, then determines the filling credibility, and fills in the ratings that satisfy the conditions using the user feature matrix. Experiments show that the rating matrix filled by the algorithm yields higher recommendation accuracy and coverage. |