Research Of Microblogs Clustering Analysis Based On Text Presentation

Posted on:2021-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:P Xu

GTID:2428330611451421

Subject:Software engineering

Abstract/Summary:

Text clustering is an important research topic in the text processing branch of data mining. Unsupervised clustering methods can identify latent topic categories in social media, explore unknown and valuable areas, and keep text processing efficient on massive data. Clustering is also widely applied to practical problems such as event extraction, user personas, and community detection, and is popular among scholars and engineers. Text representation is crucial to text clustering. The Vector Space Model (VSM) is the most commonly used representation model for text clustering. However, VSM features are semantically isolated and sparse, which makes it difficult to measure the correlation between texts accurately. In recent years, some scholars have measured text similarity using representation learning, but accuracy in unsupervised clustering tasks remains insufficient. To address these problems for the microblog clustering task, this thesis proposes two improved methods. First, we use the TF-IDF algorithm and an external sentiment dictionary to produce a vector space representation with sentiment identification, and then improve the correlation measure between texts with a word embedding model to ease the isolation and sparsity of features during clustering. Second, we propose CIRN, a sentence representation model for text clustering that builds on an advanced self-supervised representation learning model to learn semantic similarity between texts. By learning a more general distributed representation, it measures the correlation between texts more precisely. The two proposed methods draw on the word embedding representation and the Input-Response model, respectively. To evaluate the improved text representation methods on the clustering task, experiments are carried out on human-annotated Sina Weibo and Twitter datasets. The experimental results show that the improved text representation vectors perform better on the microblog clustering task and achieve good scores in both purity and normalized mutual information.
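
The two evaluation measures named in the abstract, purity and normalized mutual information (NMI), can be sketched as follows. This is a minimal illustration and not the thesis's own evaluation code: the function names are hypothetical, the toy labels are made up for the example, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from math import log


def purity(labels_true, labels_pred):
    """Share of items that fall in the majority gold class of their cluster."""
    per_cluster = {}
    for gold, cluster in zip(labels_true, labels_pred):
        per_cluster.setdefault(cluster, Counter())[gold] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """Mutual information between gold classes and predicted clusters."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    return sum(
        (c / n) * log(c * n / (count_true[g] * count_pred[k]))
        for (g, k), c in joint.items()
    )


def nmi(labels_true, labels_pred):
    """NMI = 2 * I(T; P) / (H(T) + H(P)); the normalization is an assumption."""
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true + h_pred == 0:
        return 1.0  # both assignments are a single group
    return 2 * mutual_information(labels_true, labels_pred) / (h_true + h_pred)


if __name__ == "__main__":
    # Toy data: six posts, two annotated topics, one post assigned to the wrong cluster.
    gold = ["sports", "sports", "sports", "tech", "tech", "tech"]
    pred = [0, 0, 1, 1, 1, 1]
    print("purity:", purity(gold, pred))
    print("NMI:", nmi(gold, pred))
```

Both scores lie in [0, 1]. Purity alone rewards producing many small clusters, which is presumably why it is reported alongside NMI.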

Keywords/Search Tags:

Text Clustering, Representation Learning, Social Network

Related items

1	Social Media Short Text Clustering And Its Applications
2	Research And Implementation Of Short Text Clustering In Social Network
3	Text Presentation And Construction Of Social Media Corporate Social Responsibility
4	Research And Implementation Of Text Classification For Online Social Platform
5	Social Performance And Self-presentation In Virtual Communities
6	Research On Text Clustering Algorithm Based On Deep Learning Feature Extraction
7	Technologies Research On Text Analysis In Online Social Networks
8	Self-presentation And Privacy Dilemmas In Social Networks
9	Life Presentation And Social Performance
10	A Study For Classifying Short Text In Social Network