Research Hotspot Identification Based On LDA2vec Model Under Multisource Data

Posted on: 2020-09-21  Degree: Master  Type: Thesis
Country: China  Candidate: H L Qiu  Full Text: PDF
GTID: 2428330575952590  Subject: Library and Information Science
Abstract/Summary:
Information overload is a major problem in the current Internet era, so extracting key information from massive volumes of information is especially important. Scientific literature, the main carrier of scientific and technological innovation knowledge, has grown exponentially and is characterized by multi-source distribution and diverse description formats. Different types of documents, such as papers, patents, conference reports, and government publications, may describe the same subject from different angles. Identifying research hotspots from scientific literature of different sources is therefore of guiding significance for planning subsequent research. The purpose of this research is to quickly and accurately identify the hot topics contained in texts from multiple data sources through the proposed model, and to provide information support for scientific and technological innovation decisions.

This research first uses the literature research method to analyze the concepts of research hotspot and research topic, surveys the main hotspot identification methods and topic models studied in China and abroad, and reviews representative results. It sorts out the five methods used in current research hotspot identification (the expert method, citation analysis, knowledge unit analysis, map analysis, and text mining) and summarizes the state of research on topic model theory and its application to research hotspot identification.

Then, using the model research method, this paper proposes an LDA2vec-based method for identifying research hotspots in multi-source text and builds a corresponding identification model. The method combines the strength of the LDA topic model in latent semantic mining with the strength of the Word2Vec word vector model in capturing contextual relationships. To verify its effectiveness, experimental and statistical analysis are applied to scientific literature in the field of machine learning: the titles and abstracts of journal papers and patent documents are collected and fused as the experimental data. On the one hand, perplexity and topic coherence are used to compare the topic extraction performance of LDA2vec and LDA on multi-source text. On the other hand, the topic extraction performance of the proposed method is compared under multi-source and single-source conditions. The experimental results show that the proposed method is feasible and yields some improvement on multi-source data. The method can identify hot content in multi-source text relatively quickly and accurately, compensates for the shortcomings of topic detection based on a single data source, and enriches the practical application of multi-source data fusion theory.
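The two evaluation measures named in the abstract, perplexity and topic coherence, can be illustrated with a minimal sketch. The abstract does not say which coherence variant or which tooling the thesis used, so this sketch assumes the UMass document co-occurrence form of coherence and a plain document-topic / topic-word factorization for perplexity. The function names, the probability floor, and the toy corpus are hypothetical and are not taken from the thesis.

```python
import math


def perplexity(docs, theta, phi, floor=1e-12):
    """Perplexity of a topic model on tokenized documents.

    docs  : list of token lists
    theta : theta[d][k] = weight of topic k in document d
    phi   : phi[k] = dict mapping word -> probability under topic k
    floor : assumed small constant that avoids log(0) for unseen words
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for word in tokens:
            # p(word | doc) = sum over topics of theta[d][k] * phi[k][word]
            p = sum(theta[d][k] * phi[k].get(word, 0.0) for k in range(len(phi)))
            log_likelihood += math.log(max(p, floor))
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


def umass_coherence(top_words, docs):
    """UMass coherence of one topic from document co-occurrence counts.

    top_words : the topic's top words, most probable first
    docs      : list of token lists used as the reference corpus
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # skip words absent from the reference corpus
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / denom)
    return score


# Toy fused corpus: two "paper" titles and one "patent" title (made-up data).
corpus = [
    ["neural", "network", "training"],
    ["neural", "network", "pruning"],
    ["network", "device", "apparatus"],
]
print(umass_coherence(["network", "neural"], corpus))
print(umass_coherence(["network", "apparatus"], corpus))
```

Lower perplexity and higher (less negative) coherence are normally read as better topic quality, so the same two functions could be applied to the topics of each model, or to models trained on fused versus single-source text, to make the kind of comparison the abstract describes.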
Keywords/Search Tags: Topic model, LDA2vec, Research Hotspot, LDA, word2vec, multisource data fusion