
Classification Algorithm For Social Text Stream

Posted on: 2018-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Zong
Full Text: PDF
GTID: 2348330542490979
Subject: Computer Science and Technology
Abstract/Summary:
With the development of network and mobile client technology, people produce, share, and transmit data all the time. Much of this data exists as text, so analyzing it and extracting valuable information from it is worthwhile. How to efficiently extract useful information and patterns from text data has become a problem in applied data mining. As text data itself has changed and semantic text analysis has continued to develop, topic models, represented by Latent Dirichlet Allocation (LDA), have gradually become one of the most widely used families of algorithms in text analysis.
A social text stream is a collection of informal, short, time-dependent texts used for communication. Its characteristics are: a relatively free language environment, with non-standard word usage and a large amount of noise; short individual texts but a large volume of data; strong dependence on linguistic context; and additional information such as time and author. Applying a topic model directly to such text gives unsatisfactory results. Based on a study of the LDA model and its extensions and on the characteristics of social text streams, this thesis aims to solve two major problems that LDA and its extensions face when modeling social text streams: poor fit when the traditional LDA model is applied directly to short texts, and a lack of ability to capture and interpret topic changes in dynamic text.
The thesis first addresses the LDA model's poor adaptability to short text. The reason is that the word co-occurrence information the LDA model relies on is usually confined to a single text, so the model learns poorly when faced with short texts that lack length and content. This thesis proposes a text restructuring model that supplies word co-occurrence information to the LDA model in order to overcome the problem of insufficient text information.
Secondly, social text streams depend strongly on linguistic context, and the topics of the text often vary with time. This thesis puts forward the Time sequence labeled Latent Dirichlet Allocation (TL-LDA) model. The model captures the mutual influence among the topics of the social text stream through a label switching matrix, in order to improve its ability to capture topic changes. By using a prior probability distribution to simulate the change over time, it obtains richer posterior information. To improve the interpretability of the text topics, supervised learning is introduced during modeling.
A comparative analysis of the experimental results leads to the conclusion that, compared with other traditional models, the text restructuring model and the TL-LDA model proposed in this thesis are better suited to the study and analysis of social text streams.
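The short-text limitation described in the abstract can be illustrated with a small sketch. The abstract does not state how the text restructuring model regroups texts, so the code below assumes, purely for illustration, that short posts are pooled by author into longer pseudo-documents. The function names `cooccurrence_pairs` and `pool_by_key` and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_pairs(docs):
    """Return the set of unordered word pairs that co-occur in at least one document."""
    pairs = set()
    for tokens in docs:
        # Sort the distinct words so each pair has one canonical ordering.
        pairs.update(combinations(sorted(set(tokens)), 2))
    return pairs


def pool_by_key(posts):
    """Concatenate the tokens of all posts that share a key.

    Assumption for illustration only: the key is the author. The thesis
    abstract does not specify the actual restructuring rule.
    """
    pooled = defaultdict(list)
    for key, tokens in posts:
        pooled[key].extend(tokens)
    return list(pooled.values())


# Hypothetical toy data: (author, tokens) pairs standing in for short posts.
posts = [
    ("alice", ["lda", "topic"]),
    ("alice", ["topic", "stream"]),
    ("bob", ["short", "text"]),
    ("bob", ["text", "noise"]),
]

short_pairs = cooccurrence_pairs(tokens for _, tokens in posts)
pooled_pairs = cooccurrence_pairs(pool_by_key(posts))

print(len(short_pairs), len(pooled_pairs))  # prints "4 6"
```

On this toy data each two-word post contributes a single word pair, while the pooled pseudo-documents expose additional pairs such as ("lda", "stream") that no individual post contains. This is the kind of extra co-occurrence information a restructuring step could pass to a topic model, under the pooling assumption stated above.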
Keywords/Search Tags:Social text stream, Text classification, Latent Dirichlet Allocation, Labeled Latent Dirichlet Allocation