
The Short Text Fuzzy Spectral Clustering Based On Semantic

Posted on: 2016-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Song
Full Text: PDF
GTID: 2308330461974063
Subject: Computer application technology
Abstract/Summary:
With the continuous development of the Internet and the emergence of Web 3.0, computers and mobile communication devices have become increasingly widespread. Meanwhile, as applications such as WeChat and Microblog have grown popular, more and more short texts are being generated. Although each text contains little content, together these texts cover politics, economics, entertainment, education, and other fields, so analyzing and managing their information effectively has considerable reference and practical value.

Text clustering, which divides texts into several clusters based on the similarities between documents, is an important technical support for text analysis and management. In the clustering results, documents in the same cluster are more similar to one another than to documents in different clusters. Text clustering mainly involves two aspects: the method of text similarity computation and the clustering algorithm. This thesis first describes text clustering in detail and then proposes an improved text similarity computation method based on HowNet. Finally, the similarities calculated by the improved method are applied to spectral clustering, which is itself further improved to make the clustering results more accurate.

For text similarity computation, building on the existing HowNet-based similarity calculation method, the regional density of sememes is taken into account when sememe similarity is calculated. A dynamic method of computing concept similarity is then proposed, which focuses on the relationships among the first independent sememe, the other independent sememes, and the following sememes. In this method, the weight of each sememe is allocated dynamically. In addition, the procedure of text similarity computation is simplified by exploiting the characteristics of short texts.

Moreover, along with applying the improved text similarity computation method to the spectral clustering algorithm, this thesis proposes improvements that address shortcomings of spectral clustering. To obtain a more accurate similarity matrix for a data set, a density factor is introduced to construct a new similarity matrix. To address the difficulty spectral clustering has in handling massive amounts of data, the data set is partitioned into many sub-data sets to reduce the computational complexity. In addition, the blurry data produced by partitioning the data set are reclassified to increase the accuracy of spectral clustering.

Finally, comparative experiments and analysis show that the improved methods proposed in this thesis achieve better results.
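
As a rough illustration of how a text similarity matrix feeds into spectral clustering, the following is a minimal sketch of plain two-way spectral partitioning with NumPy. It is not the thesis's improved algorithm: it omits the density factor, the partitioning into sub-data sets, and the reclassification step. The function name `spectral_bipartition` and the similarity values in `S` are hypothetical; the values merely stand in for HowNet-based similarities between six short texts.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split items into two clusters using the normalized graph Laplacian.

    `similarity` is a square matrix of pairwise similarities. This sketch
    assumes every item has some positive similarity to at least one other item.
    """
    w = np.asarray(similarity, dtype=float)
    w = (w + w.T) / 2.0        # enforce symmetry
    np.fill_diagonal(w, 0.0)   # ignore self-similarity
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    # The eigenvector of the second-smallest eigenvalue (Fiedler vector)
    # changes sign across the weakest cut of the similarity graph.
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Made-up similarities: texts 0-2 resemble each other, as do texts 3-5.
S = [
    [1.00, 0.90, 0.80, 0.10, 0.05, 0.10],
    [0.90, 1.00, 0.85, 0.05, 0.10, 0.05],
    [0.80, 0.85, 1.00, 0.10, 0.05, 0.10],
    [0.10, 0.05, 0.10, 1.00, 0.90, 0.80],
    [0.05, 0.10, 0.05, 0.90, 1.00, 0.85],
    [0.10, 0.05, 0.10, 0.80, 0.85, 1.00],
]
labels = spectral_bipartition(S)
print(labels)
```

The two label values are arbitrary (the sign of an eigenvector is not fixed), so only the grouping is meaningful. For more than two clusters, the usual approach is to take the first k eigenvectors and run k-means on their rows.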
Keywords/Search Tags: Short text, Text clustering, HowNet, Semantics, Spectral clustering