Research And Application Of Short Text Clustering Based On Word Representations

Posted on:2018-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:D Huang

GTID:2348330536960945

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the mobile Internet, applications such as WeChat, Weibo, email, forums, live-streaming platforms, and review websites have become increasingly popular, and much of the information these platforms generate takes the form of short text. This short text has high research value: in-depth analysis can uncover the hidden information and potential value it contains. Text clustering is a machine learning method that explores the relationships among a given set of texts. Mining and extracting the relationships between short texts through cluster analysis is the basis for other short text mining tasks, such as user profiling, personalized recommendation, community discovery, and other popular research topics. Traditional short text clustering methods suffer from problems such as high data dimensionality and a lack of semantic information. This dissertation proposes a short text representation model based on word representations. The similarity of two short texts is measured by the moving distance between their feature words, and short text clustering is carried out on this basis. Experimental results show that, compared with short text clustering methods based on the vector space model and the document topic model, this method is effective on many data sets.

Traditional paper search based on keyword matching suffers from problems such as missing information and recommendation bias. In this dissertation, we take paper titles as the object of study, train word vectors of different dimensions on the titles, and identify the information-rich elements of a paper in order to enrich the semantics of its title. We introduce the density peaks clustering method and define the set of titles whose distance to a given paper is smaller than the truncation (cutoff) distance as that paper's similarity region, so that paper titles are clustered automatically. Compared with state-of-the-art methods, the proposed method achieves a substantial improvement in precision, recall, and F-measure, which demonstrates the contribution of this dissertation.
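
The two ideas in the abstract, a word-moving distance between short texts and density peaks clustering with a truncation distance, can be illustrated with a minimal sketch. It is not the dissertation's implementation. The distance below is a relaxed word-moving distance (each word travels to its nearest word in the other text), which is only a lower bound on the full EMD. The 2-D word vectors, the example texts, the cutoff `d_c = 1.0`, and the `n_clusters` parameter are all hypothetical values chosen for illustration.

```python
import math

def euclid(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed word-moving distance (assumption: a stand-in for full EMD).

    Each word moves all of its weight to the nearest word of the other
    text; the larger of the two one-way costs is returned.
    """
    def one_way(src, dst):
        total = sum(min(euclid(vectors[w], vectors[x]) for x in dst) for w in src)
        return total / len(src)
    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))

def density_peaks(dist, d_c, n_clusters):
    """Density peaks clustering on a precomputed distance matrix."""
    n = len(dist)
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank_of = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n   # distance to the nearest point visited earlier
    parent = [-1] * n   # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers: the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank_of[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other point inherits the label of its parent, which was visited earlier.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Hypothetical 2-D word vectors: two well-separated topics.
vectors = {
    "cheap": (0.0, 0.0), "phone": (0.1, 0.0), "mobile": (0.1, 0.1), "deal": (0.0, 0.1),
    "football": (5.0, 5.0), "soccer": (5.1, 5.0), "match": (5.0, 5.1), "game": (5.1, 5.1),
}
texts = [
    ["cheap", "phone"], ["cheap", "mobile"], ["phone", "deal"],
    ["football", "match"], ["soccer", "game"], ["football", "game"],
]
dist = [[relaxed_wmd(a, b, vectors) for b in texts] for a in texts]
labels = density_peaks(dist, d_c=1.0, n_clusters=2)
print(labels)
```

In this toy setting the three shopping texts and the three sports texts fall into separate clusters. The original density peaks method selects centers by inspecting a decision graph of density against distance; fixing `n_clusters` in advance is a simplification made here to keep the sketch short.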

Keywords/Search Tags:

EMD Distance, Word Vector, Peak Density Discovery, Clustering

Related items

1	Research On Several Improved Density Peak Clustering Algorithms And Their Applications
2	Optimization Research Based On Density Peak Clustering Algorithm
3	Research And Application Of Clustering Algorithm Based On Density Peak
4	Research On Density Peak Clustering Algorithm Based On Adaptive Reachable Distance
5	Research And Application Of Financial Big Data Based On Density Peak Clustering Of K Near Neighbors
6	Community Discovery Of Complex Networks Based On Fuzzy Density Peak Clustering Algorithm
7	Manifold Density Peak Clustering Algorithm And Its Application Of Weibo Text Classification
8	Research On The Grid Density Peak Clustering Algorithm
9	Outlier Detection Algorithm Based On Entropy Weight Distance And Density Peak Clustering
10	Density Peak Clustering Study Based On Bayesian And Statistical Strategies