
Stock Research Engine Based On Theme Crawler

Posted on: 2023-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: H X Wang
Full Text: PDF
GTID: 2558307028997929
Subject: Engineering
Abstract/Summary:
As household incomes have risen in recent years, people have more spare cash and a growing awareness of investment. Historical data across investment categories show that stocks offer the highest long-run return, and stock investing also attracts the largest number of investors. The volume of stock-related information is growing rapidly, yet the search results returned by financial websites do not meet investors' needs: a query for an arbitrary stock often returns results that drift from the topic or are irrelevant to it. How to mine and organize data from financial websites and provide investors with more accurate financial information has long been a focus of research. To address this, this thesis studies and improves the traditional PageRank ranking algorithm and, taking Xueqiu.com as the information source, designs and implements a stock search platform based on a topic crawler. The main contents of this thesis are as follows:

1. Researching and improving the PageRank algorithm. The proposed SI-PageRank algorithm adds a topic-relevance (theme correlativity) decision method to address topic drift in PageRank: TF-IDF is used to extract stock-related keywords from each page, and a vector space model is used to compute the relevance between those keywords and the topic more precisely. Second, because search results tend to favor old web pages, a time weighting factor is added to compensate the weight of newly published pages, so that the weights of new and old pages tend toward balance. Finally, to make the search results more accurate and authoritative, an author impact factor and a keyword position factor are added. Together, these changes make the new algorithm more relevant to stock search.

2. Testing the function and performance of the platform to ensure that it achieves the expected results. A comparative test was run on the original and improved algorithms. The experimental results show that the improved algorithm performs well in the field of stock search: pages with clear topics and high author authority are ranked higher. Under the same sample conditions, the improved algorithm resolves topic drift and the preference for old web pages, and the returned data is more relevant to the topic and more accurate.
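The abstract does not give the SI-PageRank formulas, so the following is only a minimal sketch of the general idea, not the thesis's actual algorithm. It assumes that topic relevance is the cosine similarity between TF-IDF vectors, that freshness decays exponentially with page age, and that a page passes its rank to its out-links in proportion to relevance times freshness. The function names, the half-life parameter, and the toy pages are all hypothetical; the author impact and keyword position factors are not modeled here, though they could be folded into the same per-page weight.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (vector space model)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def biased_pagerank(links, weight, damping=0.85, iterations=50):
    """PageRank where each page splits its rank among out-links by weight.

    `links` maps every page to its list of out-links; all pages must be keys.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p in pages:
            total = sum(weight[q] for q in links[p])
            if total == 0:
                # No usable out-link: spread this page's rank uniformly.
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:
                for q in links[p]:
                    new[q] += damping * rank[p] * weight[q] / total
        rank = new
    return rank


# Toy data (hypothetical): one on-topic page, one off-topic page, one hub.
docs = {
    "a": ["stock", "price", "earnings", "stock"],
    "b": ["football", "match", "score"],
    "hub": ["stock", "news", "football", "news"],
}
topic = ["stock", "earnings"]
age_days = {"a": 1, "b": 1, "hub": 30}
links = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"]}

vectors = tf_idf_vectors(list(docs.values()) + [topic])
topic_vec = vectors[-1]
relevance = {p: cosine(v, topic_vec) for p, v in zip(docs, vectors)}

half_life_days = 7.0  # assumed freshness half-life
freshness = {p: 0.5 ** (age_days[p] / half_life_days) for p in docs}
weight = {p: relevance[p] * freshness[p] for p in docs}

rank = biased_pagerank(links, weight)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 4))
```

In this toy graph the off-topic page shares no terms with the topic, so its weight is zero and the hub passes all of its rank to the on-topic page. Plain PageRank would split the hub's rank evenly between the two.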
Keywords/Search Tags:Theme crawler, PageRank algorithm, Theme correlativity, Multithreading, Page sorting