Short Text Fuzzy Spectral Clustering Based on Semantics

Posted on:2016-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:T Y Song

GTID:2308330461974063

Subject:Computer application technology

Abstract/Summary:

With the continued development of the Internet and the emergence of Web 3.0, computers and mobile communication devices are becoming increasingly widespread. At the same time, as applications such as WeChat and microblogs gain popularity, more and more short texts are being generated. Although each text contains little content, together these texts cover politics, economics, entertainment, education and many other fields, so analyzing and managing this information effectively has considerable reference and practical value.

Text clustering, which divides texts into clusters according to the similarity between documents, is an important supporting technique for text analysis and management. In the clustering result, documents in the same cluster are more similar to each other than to documents in other clusters. Text clustering involves two main components: the method of computing text similarity and the clustering algorithm. This thesis first describes text clustering in detail and then proposes an improved HowNet-based method for computing text similarity. Finally, the similarities computed by the improved method are applied to spectral clustering, which is itself further improved to make the clustering results more accurate.

For text similarity, the work builds on the existing HowNet-based similarity calculation method and takes the regional density of sememes into account when sememe similarity is calculated. It then proposes a dynamic method for computing concept similarity that focuses on the relationships among the first independent sememe, the other independent sememes and the following sememes; in this method the weight of each sememe is allocated dynamically. In addition, the text similarity procedure is simplified by exploiting the characteristics of short texts.

The improved text similarity method is then applied to the spectral clustering algorithm, and improvements that address the shortcomings of spectral clustering are also proposed. To obtain a more accurate similarity matrix for a data set, a density factor is introduced to construct a new similarity matrix. To address the difficulty spectral clustering has in handling massive amounts of data, the data set is partitioned into many sub-data sets to reduce computational complexity. In addition, the fuzzy data points produced by partitioning are reclassified to increase the accuracy of spectral clustering.

Finally, comparative experiments and analysis show that the improved methods proposed in this thesis achieve better results.
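The sememe-based similarity described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not give its regional-density term or its dynamic weighting rule, so neither is reproduced here. The sketch uses the commonly cited distance-based form alpha / (d + alpha) for sememe similarity, where d is the path length between two sememes in the hierarchy. The toy hierarchy `PARENT`, the value of `ALPHA`, the base weights, the best-match averaging, and the rule that renormalizes the weights over the parts that are actually present are all assumptions made for illustration.

```python
# Toy sememe hierarchy (child -> parent). A hypothetical stand-in for
# HowNet's real sememe taxonomy, used only to make the sketch runnable.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "inanimate": "thing",
    "artifact": "inanimate",
}

ALPHA = 1.6  # assumed smoothing parameter in alpha / (d + alpha)


def path_to_root(sememe):
    """Return the list of nodes from `sememe` up to the root, inclusive."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path


def sememe_distance(a, b):
    """Number of edges on the tree path between sememes a and b."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:  # first common ancestor
            return steps_from_a[node] + steps_from_b
    raise ValueError("sememes do not share a root")


def sememe_similarity(a, b, alpha=ALPHA):
    """Distance-based similarity: 1.0 for identical sememes, lower when far apart."""
    return alpha / (sememe_distance(a, b) + alpha)


def set_similarity(xs, ys):
    """Symmetric average of best-match similarities between two sememe lists."""
    if not xs and not ys:
        return 1.0
    if not xs or not ys:
        return 0.0  # assumption: a missing part contributes no similarity

    def best_match_avg(src, dst):
        return sum(max(sememe_similarity(s, d) for d in dst) for s in src) / len(src)

    return (best_match_avg(xs, ys) + best_match_avg(ys, xs)) / 2


def concept_similarity(c1, c2, base_weights=(0.6, 0.3, 0.1)):
    """Weighted similarity over first, other and following sememes.

    Hypothetical "dynamic" rule: a part that is empty in both concepts is
    skipped and the remaining weights are renormalized to sum to 1.
    """
    parts = [(base_weights[0], sememe_similarity(c1["first"], c2["first"]))]
    for weight, key in zip(base_weights[1:], ("other", "following")):
        if c1[key] or c2[key]:
            parts.append((weight, set_similarity(c1[key], c2[key])))
    total = sum(w for w, _ in parts)
    return sum(w * s for w, s in parts) / total


# Hypothetical concept descriptions built from the toy hierarchy.
teacher = {"first": "human", "other": ["animate"], "following": []}
pet = {"first": "animal", "other": ["animate"], "following": []}
tool = {"first": "artifact", "other": ["inanimate"], "following": []}

print(concept_similarity(teacher, pet))
print(concept_similarity(teacher, tool))
```

Pairwise scores of this kind, computed for all texts in a collection, would form the similarity matrix that a spectral clustering step takes as input.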

Keywords/Search Tags:

Short text, Text clustering, HowNet, Semantics, Spectral clustering

Related items

1	Research On Text Clustering Based On HowNet
2	Social Media Short Text Clustering And Its Applications
3	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
4	Spectral Clustering Algorithms And Its Application In Text Clustering
5	Research On Text Spectral Clustering Algorithm Based On Hidden Topics
6	Research On Text Clustering Algorithm Based On Spectral Clustering
7	Study Of Text Clustering Algorithm Based On Semantics
8	Short Text Clustering Method Based On BTM
9	KNN Text Classification Algorithm Based On The Semantics Of The Center
10	Research On Key Technologies Of Short Text Hot Topic Detection