Literature Topic Extracting Based On Weighted Semantic And Citation Relation

Posted on: 2016-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yang
Full Text: PDF
GTID: 2308330470984860
Subject: Information Science
Abstract/Summary:
With the arrival of big data, more and more literature is published as electronic documents on the Internet. Electronic scientific literature not only spreads knowledge but also promotes the development of scientific research. Faced with this vast volume of e-science literature, however, scholars urgently need a way to grasp the topic of each document and to quickly find the information they are interested in with limited time and energy. Topic extraction is one method for solving this problem. Traditional topic extraction methods identify themes from term frequency and term location, and do not consider semantic relationships between words, such as synonymy and polysemy. In recent years, topic extraction methods based on topic models, ontologies, and knowledge bases, which take these semantic relationships into account, have emerged in large numbers and have greatly improved the quality and reliability of the extracted topics. This thesis therefore reviews the development of semantics-based topic extraction and proposes a method based on weighted semantics and citation relations, in the hope that it can offer some ideas to the field of topic extraction.
The proposed method builds mainly on the Labeled-LDA model, citation content, and the K-means algorithm. Citation content represents the relationship between citing and cited documents and, to a certain extent, also reflects the theme of the citing document. We therefore use Labeled-LDA to obtain the citation-topic probability distribution and then process these data to obtain the document-topic probability distribution. Finally, we use the K-means algorithm to cluster the documents and extract the topic of each cluster.
The test data were downloaded from PubMed. The results show that the method can extract the themes of documents to a certain extent.
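The last two stages of the pipeline described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that the citation-topic distributions have already been produced by a Labeled-LDA model, that they are combined into document-topic distributions by simple averaging (the abstract does not say how this step is done), and that the topic of a cluster is the highest-probability topic of its centroid. The document IDs, topic labels, probabilities, and function names are all hypothetical, and the K-means routine uses a simplified initialization (the first k points).

```python
import math
from collections import defaultdict


def aggregate_by_document(citation_topics):
    """Average the topic distributions of each document's citation contexts.

    Averaging is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for doc_id, dist in citation_topics:
        grouped[doc_id].append(dist)
    return {
        doc_id: [sum(col) / len(dists) for col in zip(*dists)]
        for doc_id, dists in grouped.items()
    }


def kmeans(points, k, iterations=50):
    """Plain K-means with Euclidean distance.

    Initial centroids are the first k points (a simplification).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # Assignments are stable, so the algorithm has converged.
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical Labeled-LDA output: (citing document, topic distribution of one citation context).
topic_labels = ["genetics", "imaging", "therapy"]
citation_topics = [
    ("doc1", [0.8, 0.1, 0.1]),
    ("doc1", [0.6, 0.3, 0.1]),
    ("doc2", [0.6, 0.3, 0.1]),
    ("doc3", [0.1, 0.1, 0.8]),
    ("doc3", [0.2, 0.1, 0.7]),
    ("doc4", [0.1, 0.2, 0.7]),
]

doc_topics = aggregate_by_document(citation_topics)
doc_ids = list(doc_topics)
assignment, centroids = kmeans([doc_topics[d] for d in doc_ids], k=2)

# Cluster index of each document, and the dominant topic label of each cluster.
clusters = dict(zip(doc_ids, assignment))
cluster_topics = [
    topic_labels[max(range(len(c)), key=c.__getitem__)] for c in centroids
]

for doc_id in doc_ids:
    print(doc_id, "cluster", clusters[doc_id], "topic", cluster_topics[clusters[doc_id]])
```

In practice, a library implementation of K-means with a more robust initialization would likely be preferable; the hand-written version is used here only to keep the sketch self-contained.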
Keywords/Search Tags: Labeled-LDA, citation, K-means, topic extraction, TF-IDF, Topic Model