Research And Application Of Short Text Clustering Based On Word Representations

Posted on: 2018-06-08, Degree: Master, Type: Thesis
Country: China, Candidate: D Huang, Full Text: PDF
GTID: 2348330536960945, Subject: Computer application technology
Abstract/Summary:
With the rapid development of the mobile Internet, applications such as WeChat, Weibo, email, forums, live-streaming platforms, and review websites have become increasingly popular, and the information they generate takes the form of short text. This short text has high research value: in-depth analysis can uncover the hidden information and potential value it contains. Text clustering is a machine learning method that explores the relationships among a given set of texts. Mining and extracting the relationships between short texts through cluster analysis is the basis for other short text mining tasks, such as user profiling, personalized recommendation, community discovery, and other popular research topics.

Traditional short text clustering methods suffer from problems such as high data dimensionality and a lack of semantic information. This dissertation proposes a short text representation model based on word representations. The similarity of two short texts is measured by the moving distance between their feature words, and short text clustering is carried out on this basis. Experimental results show that, compared with short text clustering methods based on the vector space model and the document topic model, this method is effective on many data sets.

Traditional paper search based on keyword matching also has problems, such as a lack of information and recommendation deviation. In this dissertation, we take paper titles as the object of study: we train word vectors of different dimensions on the paper titles and identify the information-rich elements of a paper to enrich the semantics of its title. We introduce the density peaks clustering method and define the region of paper texts whose distance is smaller than the truncation (cutoff) distance as a paper's similarity region, so that paper titles are clustered automatically. Compared with state-of-the-art methods, the proposed method achieves a marked improvement in precision, recall, and F-measure, which demonstrates the contribution of this dissertation.
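
The two ideas in the abstract, a word-moving distance between short texts and density peaks clustering with a truncation distance, can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes a relaxed word mover's distance (each word moves entirely to its nearest word in the other text, which only approximates an exact EMD), a simple neighbor-count density, and a fixed number of clusters. The function names (`relaxed_wmd`, `density_peaks`), the cutoff `d_c`, and the two-dimensional toy word vectors are all hypothetical and chosen only for illustration.

```python
import math

def euclid(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed word mover's distance (an assumption, not an exact EMD):
    every word sends all of its weight to the nearest word of the other
    text; the larger of the two one-way costs is returned."""
    def one_way(src, dst):
        total = sum(min(euclid(vectors[w], vectors[x]) for x in dst) for w in src)
        return total / len(src)
    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))

def density_peaks(dist, d_c, n_clusters):
    """Density peaks clustering on a precomputed distance matrix.
    d_c is the truncation (cutoff) distance; n_clusters is fixed by hand."""
    n = len(dist)
    # Local density: how many other points lie closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest point of higher density
    parent = [-1] * n    # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            j = min(order[:rank], key=lambda k: dist[i][k])
            delta[i], parent[i] = dist[i][j], j
    # Centers: the densest point, then the largest rho * delta among the rest.
    rest = sorted(order[1:], key=lambda i: (-rho[i] * delta[i], i))
    centers = [order[0]] + rest[:n_clusters - 1]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy word vectors and titles, invented purely for this example.
vectors = {
    "cat": (0.0, 0.0), "dog": (0.0, 1.0), "kitten": (0.1, 0.0),
    "stock": (5.0, 5.0), "market": (5.0, 6.0), "trade": (6.0, 5.0),
}
titles = [["cat", "dog"], ["dog", "kitten"], ["stock", "market"], ["market", "trade"]]

dist = [[relaxed_wmd(a, b, vectors) for b in titles] for a in titles]
labels = density_peaks(dist, d_c=1.0, n_clusters=2)
print(labels)
```

On this toy data the two animal titles fall into one cluster and the two finance titles into the other. In practice the word vectors would come from a trained embedding model, and the cutoff distance and number of centers would need to be chosen for the data set at hand.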
Keywords/Search Tags:EMD Distance, Word Vector, Peak Density Discovery, Clustering