
Literature Clustering And Evolution Of Topic Innovation Based On Weighted Network

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: X X Gao
GTID: 2370330620963508
Subject: Statistics
Abstract/Summary:
With the spread and development of Internet technology, the world has entered an era of information explosion. Computing and analyzing large-scale, data-intensive scientific data has become a new trend in data mining, and text mining, as a major branch of data mining, has become a new method of knowledge discovery. Traditional text mining methods rely on large-scale corpora and complete knowledge bases, which makes text mining more difficult. In recent years, many scholars have overcome these shortcomings by representing texts as complex networks. To mine text information more accurately, this thesis uses scientific literature as the benchmark text data set and, based on weighted complex networks, studies text clustering and the evolution of topic innovation from two aspects: abstracts and keywords.

First, text information is mined from the abstracts, and the semantic similarity between texts is measured in order to cluster them. Abstracts are generally short and belong to the category of short texts. Building on the complex-network-based Short Text Similarity (STSim) measurement model and on co-occurrence theory, this thesis further takes into account the weight information of words in the abstract and proposes a new short text similarity measurement model based on weighted complex networks to compute the similarity of abstracts. The model first constructs a weighted short-text complex network from the co-occurrence relationships and co-occurrence frequencies of words, then uses an improved node weighting algorithm to emphasize the distinguishing power of word co-occurrence counts, and thereby calculates a weighted comprehensive feature value for each word, from which the similarity between abstracts is obtained. Finally, the abstract similarity is carried over directly to the full texts in order to cluster them.

Text clustering only groups texts by topic; it is also important to mine the latent information in each cluster in depth and to grasp the development trend of the field. Starting from the keywords of the texts, this thesis identifies the patterns of innovation in the field and thereby its development trend. First, weighted keyword co-occurrence networks (W-KCNs) are constructed from the co-occurrence relationships between keywords and their co-occurrence frequencies. Then, incorporating weights into the innovation coefficient defined by Huajiao Li et al., a new measurement index is defined: the weighted innovation coefficient, which measures the degree of innovation of a weighted complex network. A measurement index for weighted networks (the average weighted nearest neighbor) is also introduced, and statistical and visual analysis are combined to examine in depth the topological characteristics and evolution of the field of "artificial intelligence".

The experiments show that the proposed short text similarity measurement model based on weighted complex networks clusters better than the STSim model: clustering purity and the clustering F-measure are 15.84% and 12.02% higher, respectively, and entropy is 16.23% lower. Compared with the innovation coefficient, the weighted innovation coefficient more accurately describes the degree of innovation of keywords in the field of artificial intelligence from 2006 to 2017 and the evolution of topics over time.
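
As an illustration of the weighted keyword co-occurrence network described above, the following minimal sketch builds such a network from per-paper keyword lists and computes an average weighted nearest-neighbor degree. Several points are assumptions and are not taken from the thesis: the edge weight is taken to be the number of papers in which two keywords appear together, the nearest-neighbor index follows the commonly used form k_nn^w(i) = (1/s_i) * sum_j w_ij * k_j (s_i is node strength, k_j is neighbor degree), and the function names and sample keywords are hypothetical. The weighted innovation coefficient is not reproduced here because the abstract does not give its definition.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_weighted_cooccurrence(keyword_lists):
    """Count, for each unordered keyword pair, the number of papers containing both.

    Assumption: edge weight = co-occurrence frequency across papers.
    """
    weights = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # ignore repeats within one paper
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return weights


def to_adjacency(weights):
    """Turn the pair-weight table into a symmetric adjacency mapping."""
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    return adj


def weighted_nearest_neighbor_degree(adj):
    """Compute k_nn^w(i) = (1 / s_i) * sum_j w_ij * k_j for every node i.

    s_i is the strength (sum of incident edge weights) and k_j is the
    unweighted degree of neighbor j. This form is an assumption about
    what the thesis calls the average weighted nearest neighbor.
    """
    result = {}
    for node, neighbors in adj.items():
        strength = sum(neighbors.values())
        weighted_sum = sum(w * len(adj[j]) for j, w in neighbors.items())
        result[node] = weighted_sum / strength
    return result


# Hypothetical sample data: one keyword list per paper.
papers = [
    ["deep learning", "neural network", "image recognition"],
    ["deep learning", "neural network"],
    ["neural network", "expert system"],
]

weights = build_weighted_cooccurrence(papers)
adj = to_adjacency(weights)
knn_w = weighted_nearest_neighbor_degree(adj)

for keyword, value in sorted(knn_w.items()):
    print(f"{keyword}: {value:.3f}")
```

Building one such network per publication year (for example, 2006 through 2017) and comparing the resulting values over time would be one way to follow the evolution analysis the abstract describes.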
Keywords/Search Tags: Weighted complex network, Short text similarity, Weighted innovation coefficient, Text clustering, Innovation evolution