Research On Chinese Text Clustering Algorithm Based On Semantic Cluster

Posted on: 2020-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: X J Sun
Full Text: PDF
GTID: 2428330623965347
Subject: Software engineering
Abstract/Summary:
In text information processing, the semantic representation of text is central to text retrieval and text clustering. Text clustering is a principal method of text information processing and helps people discover statistical regularities in data. Chinese text clustering is an important part of text clustering analysis. In clustering Chinese text, the text vector is often an inaccurate representation because of semantic, grammatical and contextual factors. The commonly used vector space model represents the words of a text as mutually independent vectors, so it ignores the semantic correlation between words and documents, and the accuracy of text clustering cannot be guaranteed. The Word2vec text representation does take the semantic relationships of the context into account, but the text vectors it produces differ from document to document, which limits text clustering and leads to poor clustering results. To address these problems, this thesis proposes a new method for constructing text vectors based on semantic clusters. By studying hierarchical clustering of the extracted key collocation vectors, and by using the generality and semantic relevance of these vectors, the method obtains semantic clusters. The text vector is then spatially transformed: the similarity between each collocation vector and its semantic cluster center is calculated to obtain the semantic information of the document's feature words, this information is embedded into the document feature word vectors, and the text vector constructed after the spatial transformation is used for text clustering. In comparison experiments against traditional text representation methods and the Word2vec text representation method, the results show that the proposed method effectively improves how closely the key vectors approximate the semantics of the text, and that it achieves higher accuracy and recall in text clustering than the comparison methods. The thesis contains 20 figures, 8 tables, and 60 references.
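The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the toy 2-D word vectors stand in for collocation vectors, a small deterministic k-means stands in for the clustering step that yields semantic clusters, and the "spatial transformation" is approximated by averaging each feature word's cosine similarity to every semantic cluster center. The names `kmeans`, `text_vector` and `WORD_VECTORS` are hypothetical and do not come from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation.

    Assumption: used here only as a stand-in for the clustering step
    that produces the semantic clusters.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return centers, labels

def text_vector(tokens, word_vectors, centers):
    """Map a document into 'semantic cluster space'.

    Each dimension is the average cosine similarity between the
    document's known feature words and one semantic cluster center
    (an assumed form of the transformation, not the thesis's formula).
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return tuple(0.0 for _ in centers)
    return tuple(sum(cosine(w, c) for w in known) / len(known) for c in centers)

# Toy word vectors (hypothetical values): two loose topics.
WORD_VECTORS = {
    "bank": (1.0, 0.1), "loan": (0.9, 0.2), "money": (0.8, 0.0),
    "goal": (0.1, 1.0), "match": (0.2, 0.9), "team": (0.0, 0.8),
}

words = list(WORD_VECTORS)
centers, labels = kmeans([WORD_VECTORS[w] for w in words], k=2)
word_cluster = dict(zip(words, labels))

finance_doc = text_vector(["bank", "loan", "money", "loan"], WORD_VECTORS, centers)
sports_doc = text_vector(["team", "goal", "match"], WORD_VECTORS, centers)
finance_doc_2 = text_vector(["money", "bank"], WORD_VECTORS, centers)

print(word_cluster)
print(finance_doc, sports_doc)
print(cosine(finance_doc, finance_doc_2), cosine(finance_doc, sports_doc))
```

Because every document is expressed relative to the same shared cluster centers, documents on the same topic end up with comparable vectors, which is the property the abstract says per-document Word2vec vectors lack. The transformed vectors could then be passed to any standard clustering algorithm.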
Keywords/Search Tags: Text clustering, feature words, collocation vector, semantic embedding, semantic cluster