
Practice Of Improved Recommendation Algorithm Based On Good-Turing Smoothing In Personalized Recommendation Of Financial News

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Cai
Full Text: PDF
GTID: 2428330605457338
Subject: Applied Statistics
Abstract/Summary:
In the big-data era, information is exploding and the pace of life keeps accelerating. News recommendation systems emerged so that people can use fragmented time, such as a commute or a lunch break, to get relevant information quickly on mobile devices. News recommendation replaces the traditional communication method based on manual recommendation, reducing users' reading costs and improving their efficiency during a busy working day. With rapid economic growth, people are paying increasing attention to finance, investment, and financial management. To exploit multi-dimensional user data and to compete through differentiation within the intelligent investment advisory business model, investment apps have introduced news recommendation systems that deliver personalized financial news and asset recommendations to users. General news recommendation needs to uncover users' latent interests and increase how much they read. Financial investment news, by contrast, focuses on disclosing information to users promptly and guiding them to adjust their investment operations on the underlying assets (such as stocks and futures) in time. Content-based recommendation algorithms are therefore better suited to this application scenario than collaborative filtering. However, traditional content-based recommendation algorithms still have several problems. The main research work of this thesis is as follows.

(1) Vectorizing news requires a vocabulary that fixes a common vector dimension for all news items. The traditional content-based recommendation algorithm segments the news in the experimental (training) set into words, extracts keywords, and takes the union of those keywords as the vocabulary (called the original vocabulary). A vocabulary built this way is limited and cannot adequately reveal the topics of news in the test set: whether a popular financial word appears is a random phenomenon, and its absence from the experimental set does not mean that it will not appear in the future. To address this, hot words are crawled regularly from financial websites and used to expand the original vocabulary.

(2) The vocabulary expansion in (1) creates a new problem: sparsity. Some words in the original vocabulary contribute little to revealing an article's topic, so they can be pruned to alleviate the sparsity. The LDA topic model is commonly used for topic clustering or text classification of articles. Inspired by principal component analysis and the cumulative variance contribution rate, this thesis instead uses the LDA topic model to reduce the original vocabulary. The basic idea is that the first N words of each topic represent that topic well.

(3) The traditional algorithm computes the news feature vector with TF-IDF and applies Laplace (add-one) smoothing to the IDF denominator, which avoids a zero denominator when a word appears in no news document. With this method, the IDF of a word that appears in no document depends only on the total number of documents, so it cannot adjust weights. Moreover, unseen words are assigned a value of 1, which is significantly lower than the frequency of most words, and such a static IDF is clearly inappropriate. An improved TF-IDF based on Good-Turing smoothing is therefore used to vectorize news, and it yields good recommendation results.
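The vocabulary steps in (1) and (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the per-topic word weights (which would in practice come from a fitted LDA model), and the cumulative-weight threshold of 0.75 are all hypothetical. The abstract only states that the first N words of each topic are kept; choosing N by cumulative weight is one possible reading of the analogy with the cumulative variance contribution rate.

```python
def build_vocabulary(keywords_per_doc, hot_words):
    """Return (original, expanded) vocabularies.

    original: union of the keywords extracted from each document.
    expanded: original plus externally collected hot words.
    """
    original = set()
    for keywords in keywords_per_doc:
        original.update(keywords)
    return original, original | set(hot_words)


def top_words_per_topic(topic_word_weights, threshold=0.75):
    """Keep, for each topic, the highest-weight words until their
    cumulative weight reaches `threshold` of the topic's total weight.

    topic_word_weights: list of {word: weight} dicts, one per topic
    (hypothetical input standing in for LDA topic-word distributions).
    """
    kept = set()
    for weights in topic_word_weights:
        total = sum(weights.values())
        cumulative = 0
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        for word, weight in ranked:
            kept.add(word)
            cumulative += weight
            if cumulative >= threshold * total:
                break
    return kept


# Toy data, invented for illustration only.
docs_keywords = [["stock", "market", "today"], ["futures", "oil", "stock"]]
original, expanded = build_vocabulary(docs_keywords, hot_words=["etf"])

topics = [
    {"stock": 50, "market": 30, "today": 10, "said": 10},
    {"futures": 60, "oil": 25, "said": 15},
]
reduced = top_words_per_topic(topics, threshold=0.75)
print(sorted(expanded))
print(sorted(reduced))
```

In this sketch a low-weight word such as "today" drops out of the reduced vocabulary, while hot words added during expansion would be kept separately so that they remain available for test-set news.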
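The contrast drawn in (3) can be made concrete with a small sketch. The abstract does not give the thesis's improved formula, so the code below is only an assumption about how Good-Turing smoothing might be applied: it uses the textbook Good-Turing adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of vocabulary words with document frequency r, and plugs r* into the IDF denominator. The function names and toy document frequencies are hypothetical.

```python
import math
from collections import Counter


def add_one_idf(n_docs, doc_freq):
    """IDF with add-one (Laplace) smoothing on the denominator."""
    return math.log(n_docs / (doc_freq + 1))


def good_turing_counts(doc_freqs):
    """Map each observed document frequency r to the adjusted count
    r* = (r + 1) * N_{r+1} / N_r.

    Falls back to the raw r when N_{r+1} is zero, a simplification
    made for this sketch.
    """
    n_r = Counter(doc_freqs.values())
    adjusted = {}
    for r, count in n_r.items():
        next_count = n_r.get(r + 1, 0)
        adjusted[r] = (r + 1) * next_count / count if next_count else float(r)
    return adjusted


def good_turing_idf(n_docs, doc_freqs):
    """IDF using Good-Turing adjusted document frequencies."""
    adjusted = good_turing_counts(doc_freqs)
    idf = {}
    for word, r in doc_freqs.items():
        r_star = adjusted[r]
        # If the estimate is still zero, fall back to add-one smoothing.
        idf[word] = math.log(n_docs / r_star) if r_star > 0 else add_one_idf(n_docs, r)
    return idf


# Toy document frequencies, invented for illustration; words with 0
# stand for hot words added to the vocabulary but unseen so far.
doc_freqs = {
    "stock": 3, "futures": 2, "bond": 2,
    "yield": 1, "margin": 1, "hedge": 1,
    "ipo": 0, "etf": 0, "repo": 0, "swap": 0,
}
n_docs = 100

print(add_one_idf(n_docs, 0))                   # depends only on n_docs
print(good_turing_idf(n_docs, doc_freqs)["ipo"])  # depends on N_1 / N_0
```

With add-one smoothing, the IDF of an unseen word is log(n_docs / 1) regardless of the corpus. With the Good-Turing estimate, the unseen-word count is N_1 / N_0, so it changes with how many words were seen exactly once and how many were not seen at all.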
Keywords/Search Tags: Cumulative Variance Contribution, Principal Component Analysis, Random Phenomenon, Content-Based Recommendation, TF-IDF, Good-Turing Smoothing, LDA, Finance Investment News