
Research On The Method And Exploitation Of Traditional Chinese Micro-blogging Short Topic

Posted on: 2018-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: K F Yang
Full Text: PDF
GTID: 2348330512477230
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the Internet and smart mobile devices, social media applications such as Twitter and micro-blogging platforms have become increasingly popular, and short-text interaction has become increasingly common. Analyzing the topics of micro-blogging data is of great practical significance, both for identifying the hot topics that attract public attention and for helping users find the information they need within massive volumes of user-generated content. Micro-blogging text is short, sparse in features, and large in scale. For short text with these characteristics, selecting an effective topic identification method and mining fine-grained topics, so as to meet users' needs as fully as possible, is an important problem that still needs to be solved. This thesis focuses on information extraction from short text, and in particular on extracting implicit topics from Chinese micro-blogging short text. Building on existing research on text clustering and topic models, it studies a corpus of Chinese micro-blogging short texts. The main research work and results are:
(1) Short-text clustering based on top-k frequent closed word sets is performed on the preprocessed micro-blogging corpus. The frequent word set mining algorithm is improved during clustering, which avoids repeated trials of min_support and the problem of an excessively large number of frequent word sets. The frequent word sets serve as descriptions of the clusters, yielding a coarse-grained classification of the micro-blogging text.
(2) To address the unclear topics within clusters and the feature sparsity of short text, a latent topic mining method based on the LDA model and the BTM model is proposed. The word pairs of each document are modeled, which improves the representation of short-text topic features and yields fine-grained implicit topics for each cluster.
(3) Following the idea of short-text clustering followed by fine-grained mining within clusters, a micro-blogging implicit topic mining system is designed. It obtains the classification of micro-blogging short texts and further mines the topics within each cluster, ultimately achieving automatic extraction, classification, and storage of the implicit topics of information on the micro-blogging platform.
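The two building blocks named in the abstract, top-k frequent closed word sets and word-pair (biterm) modeling, can be illustrated with a minimal sketch. This is not the thesis's improved mining algorithm, which the abstract does not describe. It is a brute-force illustration that assumes each post is already segmented into a list of words. The function names `top_k_closed_word_sets` and `biterms` are hypothetical, and treating each post as a set of distinct words is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations


def top_k_closed_word_sets(docs, k):
    """Return the k closed word sets with the highest support.

    Brute force: every subset of every post is counted, so this is only
    suitable for toy data. Asking for the top k sets directly means the
    caller does not have to guess a min_support threshold.
    """
    support = Counter()
    for doc in docs:
        words = sorted(set(doc))  # assumption: ignore repeated words
        for size in range(1, len(words) + 1):
            for combo in combinations(words, size):
                support[combo] += 1

    # A word set is closed if no proper superset has the same support.
    closed = []
    for itemset, count in support.items():
        has_equal_superset = any(
            other_count == count and set(itemset) < set(other)
            for other, other_count in support.items()
        )
        if not has_equal_superset:
            closed.append((itemset, count))

    # Highest support first; ties go to larger sets, then alphabetical order.
    closed.sort(key=lambda pair: (-pair[1], -len(pair[0]), pair[0]))
    return closed[:k]


def biterms(doc):
    """Return the unordered pairs of distinct words that co-occur in one post."""
    words = sorted(set(doc))
    return list(combinations(words, 2))


if __name__ == "__main__":
    posts = [
        ["phone", "battery", "price"],
        ["phone", "battery"],
        ["phone", "price"],
        ["movie", "ticket"],
        ["movie", "ticket", "price"],
    ]
    for itemset, count in top_k_closed_word_sets(posts, 3):
        print(itemset, count)
    print(biterms(posts[0]))
```

In a pipeline of the kind the abstract outlines, the returned word sets would label coarse clusters, and the biterms of the posts in each cluster would be the input to a BTM-style topic model.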
Keywords/Search Tags: Micro-blogging Short Text, Text Clustering, Subject Mining, Frequent Closed Item Sets, Word Co-occurrence