
Research on Topic Identification and Evolution Analysis of Subject Literature Based on Multivariate Data Fusion

Posted on: 2024-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Sun
Full Text: PDF
GTID: 2568307103473024
Subject: Management Science and Engineering
Abstract/Summary:
With the development of science and technology, scientific research output has entered a period of rapid growth. Faced with a large volume of literature, accurately identifying the topics of the literature and effectively analyzing how research topics evolve are important problems in current scientific research management. In the era of big data, literature resources are increasingly digitized, and multivariate data about the literature can be stored electronically and retrieved online. This thesis addresses two questions: how to improve the accuracy of automatic topic identification for scientific literature, and how to reasonably analyze the development trends, research hotspots, and interrelationships of topics in subject literature. To do so, it applies text mining and related techniques to the multivariate data of the literature. The main research work is as follows:

(1) A topic identification model based on multivariate data fusion of the literature is constructed. An item-based collaborative filtering algorithm is improved, and the Word2Vec model is used to obtain keyword-level topic representation vectors of the literature. These vectors represent the distribution of each document's core content. They are combined with the topic probability distribution vectors that an LDA model produces on the abstract texts to construct a multi-layer similarity matrix. A fused graph is then obtained with the SGF algorithm, and the topic categories are determined from it. Because the model jointly considers the various structured data of the literature, it reduces the interference of noise and improves the semantic quality of the topics identified in the subject area.

(2) An evolution analysis method based on the topic representation vectors of the literature is proposed. For topic content analysis, the thesis evaluates the development stage of each topic with the Price index and a negative exponential model. It computes topic-level representation vectors weighted by closeness centrality scores and expresses the distances between topics through multidimensional scaling. The TF-IDF method is improved to construct a topic importance index, from which topic words are extracted. For topic evolution analysis, a topic growth value index is constructed from the topic representation vectors. This index is combined with Kleinberg's probabilistic automaton (burst detection) model to find research hotspots within topics, and a citation strength index is constructed to study the knowledge transferred between topics. The method can comprehensively analyze the research status of the topics in a subject and effectively detect research hotspots and the correlations among literature topics.

A literature data set from the subject area is collected to verify the proposed topic identification model and evolution analysis method, and the results demonstrate their effectiveness. This study can provide ideas and methods for the analysis of related scientific literature.
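The fusion step in (1) can be illustrated with a minimal sketch. It assumes that each document already has two vectors: a keyword-level vector, which stands in for the Word2Vec-based representation, and an LDA topic distribution. The sketch builds one cosine similarity layer per data source, averages the layers, and groups documents whose fused similarity passes a threshold. The weighted average and the threshold grouping are simple stand-ins chosen for illustration. They are not the SGF algorithm used in the thesis, and all function names and toy values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_layer(vectors):
    """One layer of the multi-layer similarity matrix: pairwise cosine similarities."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def fuse_layers(layers, weights=None):
    """Weighted average of similarity layers (illustrative stand-in for SGF, not SGF itself)."""
    weights = weights or [1.0 / len(layers)] * len(layers)
    n = len(layers[0])
    return [[sum(w * layer[i][j] for w, layer in zip(weights, layers))
             for j in range(n)] for i in range(n)]


def group_documents(fused, threshold):
    """Connected components of the graph with an edge wherever fused similarity >= threshold."""
    n = len(fused)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and fused[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return groups


keyword_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy stand-ins for Word2Vec vectors
lda_vecs = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]      # toy LDA topic distributions
fused = fuse_layers([similarity_layer(keyword_vecs), similarity_layer(lda_vecs)])
print(group_documents(fused, threshold=0.8))  # [[0, 1], [2, 3]]
```

Averaging keeps each layer's contribution explicit. A real fusion method would typically also reweight edges using neighborhood structure.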
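The closeness-centrality weighting in (2) might look like the following sketch. It assumes each topic is an unweighted document graph, for example with edges taken from the fused graph, and that each document has a representation vector. Closeness is computed over the nodes reachable from each document. The helper names and the toy graph are hypothetical.

```python
from collections import deque


def closeness(adj, node):
    """Closeness centrality: (reachable nodes) / (sum of BFS hop distances)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def topic_vector(adj, doc_vectors):
    """Closeness-weighted average of the document vectors in one topic graph."""
    weights = {d: closeness(adj, d) for d in adj}
    norm = sum(weights.values())
    if norm == 0:  # no edges at all: fall back to an unweighted mean
        weights = {d: 1.0 for d in adj}
        norm = float(len(adj))
    dim = len(next(iter(doc_vectors.values())))
    return [sum(weights[d] * doc_vectors[d][k] for d in adj) / norm for k in range(dim)]


# Toy topic graph a - b - c: the central document b gets the largest weight.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}
print(topic_vector(adj, vecs))  # approximately [0.571, 0.429]
```

In this example, b has closeness 1.0, while a and c each have closeness 2/3. The topic vector is therefore (4/7, 3/7).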
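Topic words can be ranked with a plain TF-IDF baseline in which each topic's documents are pooled into one pseudo-document. The thesis's improved TF-IDF and its topic importance index are not specified in this abstract, so this sketch shows only the standard formulation. The function name, input format, and toy tokens are assumptions.

```python
import math
from collections import Counter


def topic_words(topic_docs, top_k=3):
    """topic_docs maps topic -> list of tokenized documents; returns top_k words per topic."""
    # Term frequencies per topic, pooling all of that topic's documents.
    tf = {t: Counter(tok for doc in docs for tok in doc) for t, docs in topic_docs.items()}
    n_topics = len(tf)
    # Document frequency counted over topics: in how many topics does each word occur?
    df = Counter(tok for counts in tf.values() for tok in counts)
    result = {}
    for t, counts in tf.items():
        total = sum(counts.values())
        scores = {w: (c / total) * math.log(n_topics / df[w]) for w, c in counts.items()}
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[t] = [w for w, _ in ranked[:top_k]]
    return result


topics = {
    "A": [["lda", "topic", "model"], ["lda", "corpus"]],
    "B": [["burst", "detection", "model"], ["burst", "citation"]],
}
print(topic_words(topics, top_k=2))  # {'A': ['lda', 'corpus'], 'B': ['burst', 'citation']}
```

Words shared by every topic, such as "model" here, receive an IDF of zero. An importance index built on top of this baseline would adjust these raw scores.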
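Kleinberg's burst detection models a document stream as a hidden automaton that switches between a base state and a higher-rate burst state. Below is a minimal two-state sketch of the batched variant. It assumes per-period counts of topic documents `r` and of all documents `d`. The scaling factor `s` and transition cost `gamma` are illustrative defaults, and the thesis's topic growth value index is not reproduced here.

```python
import math


def burst_states(r, d, s=2.0, gamma=1.0):
    """Two-state batched Kleinberg burst detection via Viterbi; returns 0/1 per period."""
    n = len(r)
    p0 = sum(r) / sum(d)
    if p0 <= 0.0 or p0 >= 1.0:
        return [0] * n  # degenerate rates: no burst can be distinguished
    probs = [p0, min(s * p0, 0.9999)]  # base rate and elevated burst rate
    up_cost = gamma * math.log(n)      # cost of moving from state 0 up to state 1

    def emit(state, t):
        # Negative log-likelihood of r[t] relevant out of d[t]; the binomial
        # coefficient is omitted because it is identical for both states.
        p = probs[state]
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1.0 - p))

    cost = [emit(0, 0), up_cost + emit(1, 0)]  # the automaton starts in state 0
    backpointers = []
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in (0, 1):
            # Moving up costs up_cost; staying or moving down is free.
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + emit(j, t))
            pointers.append(best)
        cost = new_cost
        backpointers.append(pointers)

    # Backtrack from the cheaper final state to recover the state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(burst_states([5, 5, 5, 30, 30, 5], [100] * 6))  # [0, 0, 0, 1, 1, 0]
```

With 100 documents per period, only the two high-count periods are labeled as the burst state. This happens because the likelihood gain there outweighs the one-time cost of entering the burst state.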
Keywords/Search Tags: subject literature, topic identification, evolutionary analysis, multivariate data fusion, text mining