Literature Topic Analysis Based On Time Series Clustering

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wu
GTID: 2428330611961912
Subject: Management Science and Engineering
Abstract/Summary:
Thematic analysis can select all the features from the literature and analyze its key subject content so that the literature can be retrieved. In other words, analyzing and grasping the content of the literature is the core of subject analysis. To extract subject concepts, keywords that match the semantics of the content should be selected as the retrieval index of the literature. The quality of subject indexing is directly affected by the quality of subject analysis, and the effectiveness of information retrieval in turn depends on it. Therefore, in order to conduct thematic analysis more effectively, the main research contents are as follows:

(1) To address the low efficiency and high time complexity of existing time series clustering methods, we propose a method (TCMS) based on matrix profile and social network techniques. First, the matrix profile, an algorithm that can quickly find the pair of most similar subsequences derived from two time series, is used to measure the correlation between time series, which reduces the time complexity. The correlation between two time series is measured as the number of their most similar subsequences. Second, the proposed method constructs a network to represent the correlation between time series. The network treats each time series as a vertex and the correlations between time series as edges: the more correlated two time series are, the greater the weight of the edge between them. Finally, the network is partitioned by a community detection method. The experiments compare the proposed method with classical time series clustering algorithms from the field of time series data mining, such as Louvain-?NN, k-medoids, and k-shape. Experimental results demonstrate that the proposed method is a better approach to clustering time series, and that the whole computation is more efficient than state-of-the-art methods.

(2) Because existing methods for topic discovery and evolution analysis in the literature are limited in variety, this paper proposes a method of topic discovery and evolution analysis based on time series clustering. First, the co-occurrence frequency of high-frequency keywords is converted into a similarity between words using the Ochiai coefficient, and the affinity propagation clustering algorithm is then used to gather strongly correlated keywords into a cluster, forming a topic cluster. Then, following the time order, the annual average popularity of each topic cluster is calculated, so that a "popularity sequence" can be established for each topic. Finally, the new time series clustering algorithm proposed in this study is used to cluster all the sequences, and an evolution analysis is carried out on the resulting clusters.

(3) Beyond the development trend of individual topics, the relationship between topics is even more important. Previous studies on the relationship between topics mostly reclassified topics by means of clustering or classification from a user's perspective. To present this kind of relationship better, this paper uses the time series clustering method proposed above, combined with practical significance, to measure the relationship between topics, build a network connection graph, and then cluster the topics with a community detection algorithm, which gives a further understanding of the relationship between topics.
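As a minimal sketch of the first step in item (2), the conversion of keyword co-occurrence counts into word similarities might look like the following. It assumes the standard definition of the Ochiai coefficient, sim(a, b) = c_ab / sqrt(c_a * c_b), where c_a and c_b are the numbers of documents containing each keyword and c_ab is the number containing both; the abstract does not state the exact formula used in the thesis. The function name `ochiai_similarity` and the toy keyword lists are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def ochiai_similarity(documents):
    """Return {(kw_a, kw_b): similarity} for every co-occurring keyword pair.

    `documents` is a list of keyword lists, one list per paper. Pairs are
    stored with their two keywords in alphabetical order.
    """
    doc_freq = Counter()  # number of documents containing each keyword
    co_freq = Counter()   # number of documents containing both keywords of a pair
    for keywords in documents:
        unique = sorted(set(keywords))  # ignore repeats within one document
        doc_freq.update(unique)
        co_freq.update(combinations(unique, 2))
    # Ochiai coefficient: co-occurrence count over the geometric mean
    # of the two individual document frequencies.
    return {
        pair: count / sqrt(doc_freq[pair[0]] * doc_freq[pair[1]])
        for pair, count in co_freq.items()
    }


# Hypothetical keyword lists, not data from the thesis.
toy_documents = [
    ["time series", "clustering", "matrix profile"],
    ["time series", "clustering"],
    ["community detection", "clustering"],
    ["time series", "matrix profile"],
]

similarity = ochiai_similarity(toy_documents)
for pair, value in sorted(similarity.items()):
    print(pair, round(value, 3))
```

The resulting pairwise similarities could then be passed to a clustering step such as affinity propagation; keyword pairs that never co-occur are simply absent from the dictionary and can be treated as having similarity 0.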
Keywords/Search Tags: Time series clustering, Matrix profile, Community detection, Topic discovery, Topic evolution