
Research On Microblog Text Processing And Topic Analysis Methods

Posted on: 2018-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: X L Duan
Full Text: PDF
GTID: 2358330536958554
Subject: Software engineering
Abstract/Summary:
Microblogs have become a major platform for publishing topics. Many topics appear on microblogs every day. Most of them are explicitly marked; those that are not marked are called hidden topics. Microblog data is large and contains a great deal of information, including business information and the latest news. Since some of the information in microblog text is not directly visible, special processing is required to mine it. On this basis, we study basic processing methods for microblog text and detect and track topics. Although hidden topics are not marked directly in microblog data, their impact on social media is significant. Finding and tracking these hidden topics has become an important part of microblog research, and it matters for analyzing and guiding public opinion on social media. Research on microblog text processing and topic analysis methods is therefore necessary. The main contents of this paper are as follows:

(1) Research on Building a Microblog Database
This paper proposes a technique for crawling microblog topic content based on microblog keyword search, which produces a topic folder containing the different microblog posts and comments under each topic. We then build the microblog database, which comprises databases of microblog messages, relations, user information, and topics. Finally, we complete a user dictionary of 800,000 words and apply it to word segmentation.

(2) Feature Extension for Cluster Analysis of Microblogs
After comparing short-text expansion strategies based on HowNet and Cilin, we propose using Word2vec to train on a microblog corpus and construct a vocabulary of related words for the microblog context. Seed words and microblog label information are then used to expand the microblog text. We also propose methods for extracting microblog text keywords and for distinguishing similar words from related words. The experiments show that extending microblog text with Word2vec works better, and that the effect of cluster analysis on microblog text improves by about 10%.

(3) Research on Text Similarity Calculation Based on Sentence Vectors for Chinese Microblogs
Computing the similarity between words and between sentences in microblog text is important for microblog data mining. Based on Word2vec and the characteristics of short text, we propose three algorithms: 1. Use Word2vec to expand the text, and use TF-IDF to obtain the sentence vector. 2. Add the word vectors together to form the sentence vector. 3. Build a word bank to obtain a high-dimensional vector space representation of the sentence. We then compare the three methods and compare the best of them with the method proposed by Le and Mikolov. We find that our method performs better.

(4) Research on Topic Detection and Tracking Technology Based on a Combinatorial Clustering Algorithm for Microblogs
In this paper, a combined algorithm based on hierarchical clustering and K-Means clustering is used to model microblog topics. It addresses the limitation that the existing K-Means algorithm cannot predict the number of categories, and it yields a better set of microblog topic results. On the basis of text clustering, this paper proposes dynamic tracking of microblog topics using the time variation and the granularity variable of a topic. Exploiting the fact that a topic's influence changes over time, we build a time-frequency model and use it to determine whether the result of a topic tracking task is an abnormal topic. If an anomaly occurs, the search granularity of the topic is expanded to obtain a new set of tracked topics that reflects the topic's development.

(5) Design and Implementation of a Microblog Topic Detection and Tracking System
We implement a microblog topic detection and tracking system using Java, Spark, and Hive. We introduce the overall framework of the system and the implementation of each sub-module's functions. Based on the system, we carried out a large number of experiments.
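As an illustration of method 2 under item (3), where the word vectors are added to form a sentence vector, the following is a minimal sketch and not the thesis implementation. The toy three-dimensional vectors, the English tokens, and the names `WORD_VECTORS`, `sentence_vector`, and `cosine_similarity` are all assumptions made for the example. In the thesis the vectors would come from a Word2vec model trained on a microblog corpus, and cosine similarity is assumed here as the comparison measure because the abstract does not name one.

```python
import math

# Hypothetical toy word vectors (assumption). A real setup would load
# vectors from a Word2vec model trained on microblog text.
WORD_VECTORS = {
    "phone": [0.9, 0.1, 0.0],
    "mobile": [0.8, 0.2, 0.1],
    "launch": [0.1, 0.9, 0.0],
    "release": [0.2, 0.8, 0.1],
    "rain": [0.0, 0.1, 0.9],
    "today": [0.1, 0.1, 0.8],
}


def sentence_vector(tokens, vectors=WORD_VECTORS):
    """Sum the vectors of all known tokens; unknown tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = vectors.get(token)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two short texts about the same topic, and one about a different topic.
s1 = sentence_vector(["phone", "launch"])
s2 = sentence_vector(["mobile", "release"])
s3 = sentence_vector(["rain", "today"])

similar = cosine_similarity(s1, s2)
different = cosine_similarity(s1, s3)
print(similar > different)
```

With these toy vectors the two texts on the same topic score higher than the unrelated pair. Plain summation weights every word equally, which is presumably why the abstract also considers a TF-IDF-based variant (method 1).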
Keywords/Search Tags: microblog short text, feature expansion, text representation, similarity computing, topic analysis