Research on Concept Drift Detection in Text Data Streams

Posted on: 2018-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: G Chu
Full Text: PDF
GTID: 2348330542492633
Subject: Computer application technology
Abstract/Summary:
With the continuous development of the Internet, text data streams are widely used in mainstream applications such as Weibo, e-commerce platforms, and news media, and the concept drifts in them have taken on new characteristics: they are fast, frequent, and diverse. Traditional concept drift detection methods are usually based on changes in the classification error rate, relying on the rise in error rate that concept drift causes. However, they are difficult to apply to text data streams. Therefore, this thesis studies concept drift detection in text data streams:
(1) We introduce the definition and classification of concept drift, and review traditional concept drift detection methods in two categories, supervised and unsupervised or semi-supervised, according to the distribution characteristics of the data stream.
(2) To address the decreasing applicability of traditional concept drift detection methods, we analyze and summarize the causes of concept drift from the perspective of the data stream's distribution, propose a method for classifying drifts, and further propose a corresponding three-layer concept drift detection method. The method detects drift on three layers: the label space, the feature space, and the mapping between features and labels. Experiments show that the proposed method can improve the precision and accuracy of concept drift detection, especially in text data streams.
(3) To address the lack of effective information that still arises under frequent drifts, we introduce the LDA model and propose SSCD, a concept drift detection method based on semantic information. It uses semantic information to compensate for the lack of effective samples and detects drift through the semantic similarity of the word space and the topic space. Experiments show that this algorithm can effectively improve detection in text data streams and, in particular, can significantly reduce the number of missed detections when drift frequency is high.
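The three-layer idea in item (2) can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes each window is a non-empty list of (tokens, label) pairs, uses Jensen-Shannon divergence as a stand-in change measure, and uses the joint (word, label) distribution as a rough proxy for the feature-to-label mapping. The function names and the 0.1 threshold are hypothetical.

```python
import math
from collections import Counter


def distribution(items):
    """Turn a non-empty sequence of hashable items into relative frequencies."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint supports.
    """
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        return sum(v * math.log2(v / mix[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def layer_scores(old_window, new_window):
    """Score the change between two windows of (tokens, label) pairs per layer."""
    def views(window):
        labels = [label for _, label in window]
        words = [w for tokens, _ in window for w in tokens]
        # Assumption: the joint (word, label) distribution approximates the mapping.
        pairs = [(w, label) for tokens, label in window for w in tokens]
        return distribution(labels), distribution(words), distribution(pairs)

    names = ("label_space", "feature_space", "feature_label_mapping")
    old_views, new_views = views(old_window), views(new_window)
    return {n: js_divergence(o, c) for n, o, c in zip(names, old_views, new_views)}


def detect_drift(old_window, new_window, threshold=0.1):
    """Flag each layer whose score exceeds a hypothetical threshold."""
    scores = layer_scores(old_window, new_window)
    return {name: score > threshold for name, score in scores.items()}


# Same labels and same words in both windows, but the word-to-label mapping swaps.
old = [(["goal"], "sports"), (["vote"], "politics")]
new = [(["goal"], "politics"), (["vote"], "sports")]
print(detect_drift(old, new))
```

In this toy case only the mapping layer is flagged, since the label and word distributions are identical across the two windows. An error-rate detector would need labeled predictions to see that change. Note that the joint-distribution proxy also moves when either marginal moves, so a real detector would need to separate those effects.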
Keywords/Search Tags: text data stream, IV value, semantic information, concept drift, classification