Research On The Sorting Algorithm Of Scientific Literature Retrieval Based On TF-IDF

Posted on: 2022-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: W W Liu
GTID: 2518306353984099
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, information resources keep multiplying, and people increasingly obtain information online. Among these resources, scientific literature is an important reference for researchers. This thesis builds on earlier work and studies in depth how to help researchers quickly and efficiently retrieve high-quality literature that meets their needs from a body of literature that is large and uneven in quality. The traditional method relies only on a paper's citation frequency to evaluate its reading value, which causes several problems in practice: (1) the results favor literature that was published long ago and disadvantage recently published papers that have potential academic influence; (2) the value transferred between citations ignores the impact of bubble citations and the topic relevance between citing and cited documents; (3) other factors that influence document value, such as author influence and document authority, are ignored.

In response to these problems, this thesis proposes a TF-IDF-based retrieval and ranking method for scientific and technological literature. The method combines the inherent value of a document with the value transferred between documents to calculate the document's total value. It uses four indicators. First, because inherent value ages over time, and because existing methods fail to surface both currently hot and classic documents, the thesis combines a time factor with the number of citations as an indicator of inherent value. Second, because journal impact factors change dynamically and this affects value evaluation, the thesis takes these changes into account and uses an improved journal impact factor as an indicator of inherent value. Third, existing evaluations of author influence consider only the total number of articles an author has published and the total number of citations; the thesis also considers the quality of the author's articles and uses the resulting author influence as an indicator of inherent value. Fourth, because the PageRank algorithm suffers from topic drift and distributes weight equally across the citation network, the thesis introduces the TF-IDF algorithm and cosine similarity to calculate the similarity between citing and cited documents, and uses this similarity as an indicator of the value transferred between documents.

These four indicators are combined with the PageRank algorithm to calculate the total reading value of a document. The experimental results show that the TF-IDF-based method for evaluating the reading value of scientific and technological literature is effective, evaluates at a fine granularity, and can recommend literature with high reading value to users.
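
The abstract does not give formulas, so the following Python sketch only illustrates the general idea of weighting PageRank value transfer by TF-IDF cosine similarity. It is not the thesis's implementation. The function names, the smoothed IDF variant, the damping factor of 0.85, and the use of a `prior` vector to stand in for a document's inherent value are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict doc_id -> list of tokens. Returns doc_id -> {term: weight}.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        vectors[doc_id] = {
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_pagerank(citations, sim, prior=None, damping=0.85, iters=50):
    """citations: dict citing_id -> list of cited ids; sim(a, b) -> float.

    Each citing document splits its rank among the documents it cites in
    proportion to similarity, instead of equally. `prior` (hypothetical) is
    a distribution that could encode inherent value; it defaults to uniform.
    """
    nodes = sorted(set(citations) | {c for out in citations.values() for c in out})
    prior = prior or {d: 1.0 / len(nodes) for d in nodes}
    weights = {}
    for src in nodes:
        out = citations.get(src, [])
        raw = {dst: sim(src, dst) for dst in out}
        total = sum(raw.values())
        if total > 0:
            weights[src] = {dst: w / total for dst, w in raw.items()}
        else:  # no similarity signal: fall back to an equal split
            weights[src] = {dst: 1.0 / len(out) for dst in out}
    rank = dict(prior)
    for _ in range(iters):
        # Rank held by documents that cite nothing is redistributed via prior.
        dangling = sum(rank[s] for s in nodes if not weights[s])
        new = {d: (1 - damping + damping * dangling) * prior[d] for d in nodes}
        for src in nodes:
            for dst, w in weights[src].items():
                new[dst] += damping * rank[src] * w
        rank = new
    return rank


# Toy example: A cites B (same topic) and C (unrelated topic).
docs = {
    "A": "tfidf ranking citation network".split(),
    "B": "tfidf ranking literature retrieval".split(),
    "C": "protein folding simulation".split(),
}
vecs = tfidf_vectors(docs)
rank = weighted_pagerank({"A": ["B", "C"]}, lambda a, b: cosine(vecs[a], vecs[b]))
```

In this toy citation network, B receives more value from A than C does, because A and B share terms while A and C share none. With plain PageRank, A's value would be split equally between them.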
Keywords/Search Tags: Citation Network, PageRank Algorithm, Scientific Literature Value, TF-IDF Algorithm, Literature Ranking