Research On Classification Of Multi-Label Data Streams

Posted on: 2011-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: W Qu
Full Text: PDF
GTID: 2178360305474311
Subject: Computer software and theory
Abstract/Summary:
Traditional research on data stream classification focuses mainly on single-label data streams. Multi-label data streams, however, are common in the real world, and traditional single-label stream classifiers cannot cope with them. This paper proposes the concept of a multi-label data stream and algorithms for mining such streams. A multi-label data stream has four features: (1) each example carries multiple labels; (2) there is dependency among labels; (3) the data volume is huge; and (4) the stream exhibits concept drift, that is, the target concept is not static.

We discuss multi-label classification in two cases. (1) There is no dependency among labels. In this case, Binary Relevance classifiers are used as base classifiers. In the experiments, a static weighted voting method is compared with a static majority voting method, and the results show that static weighted voting performs better. (2) There is dependency among labels. In this case, we propose an improved Binary Relevance algorithm that exploits the dependency among labels, and the base classifiers are trained with it. Because different test examples differ in classification difficulty, we adopt a dynamic ensemble method: we find the neighbors of the test example and derive the weight of each base classifier from its performance on those neighbors. In the experiments, we compare this algorithm with one that uses the plain Binary Relevance method and one that uses a static ensemble method; the results show that it performs better than both.

Two methods are proposed for generating multi-label data streams. (1) A method for generating synthetic multi-label data streams without dependency among labels, in which multiple moving hyperplanes are used to generate the multiple labels. (2) A method for generating synthetic multi-label data streams with dependency among labels, in which dependent examples are generated by designing the relationship among feature spaces. We also propose a method for simulating a user's interest over the real labels of real data sets.

The main contributions of this paper are as follows: (1) We propose an algorithm for mining multi-label data streams with the Binary Relevance method when there is no dependency among labels. (2) We improve the Binary Relevance method by exploiting the dependency among labels, and use a dynamic ensemble method to combine the base classifiers. (3) We propose a method for generating synthetic multi-label data streams with concept drift, and a way to introduce dependency among labels.
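
The abstract gives no implementation details, so the following is only a minimal Python sketch of two of the ideas it names: labels generated by one hyperplane per label (the no-dependency case), and a dynamic ensemble that weights Binary Relevance models by their performance on the neighbors of the test example. Everything concrete here is an assumption made for illustration: the names make_chunk, BinaryRelevance and dynamic_predict, the nearest-centroid rule used as the per-label base learner, the per-label accuracy used as the weight, and the choice to take neighbors from a recent labeled chunk. Concept drift is imitated simply by swapping the hyperplane weights between chunks.

```python
import math
import random


def make_chunk(hyperplanes, n, rng):
    """Draw n points in the unit cube; label j is 1 when the point lies on or
    above hyperplane j (threshold = half the sum of that hyperplane's weights)."""
    dim = len(hyperplanes[0])
    chunk = []
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        y = [int(sum(wi * xi for wi, xi in zip(w, x)) >= 0.5 * sum(w))
             for w in hyperplanes]
        chunk.append((x, y))
    return chunk


def _mean(points):
    """Component-wise mean of a list of points, or None if the list is empty."""
    if not points:
        return None
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


class BinaryRelevance:
    """One independent binary classifier per label.
    The base learner is a nearest-centroid rule (an illustrative stand-in)."""

    def fit(self, chunk):
        n_labels = len(chunk[0][1])
        self.centroids = []
        for j in range(n_labels):
            pos = _mean([x for x, y in chunk if y[j] == 1])
            neg = _mean([x for x, y in chunk if y[j] == 0])
            self.centroids.append((pos, neg))
        return self

    def predict(self, x):
        out = []
        for pos, neg in self.centroids:
            if pos is None:        # label never seen as positive
                out.append(0)
            elif neg is None:      # label never seen as negative
                out.append(1)
            else:
                out.append(int(math.dist(x, pos) <= math.dist(x, neg)))
        return out


def dynamic_predict(models, reference_chunk, x, k=5):
    """Weight each model by its per-label accuracy on the k reference examples
    nearest to x, then combine the models by weighted voting for each label."""
    neighbors = sorted(reference_chunk, key=lambda ex: math.dist(ex[0], x))[:k]
    n_labels = len(neighbors[0][1])
    weights = []
    for model in models:
        correct = sum(p == t
                      for nx, ny in neighbors
                      for p, t in zip(model.predict(nx), ny))
        weights.append(correct / (len(neighbors) * n_labels))
    votes = [0.0] * n_labels
    for model, w in zip(models, weights):
        for j, p in enumerate(model.predict(x)):
            votes[j] += w if p == 1 else -w
    return [int(v > 0) for v in votes], weights


# Usage: two models trained before and after a simulated drift.
rng = random.Random(0)
old_concept = [[1.0, 0.2], [0.2, 1.0]]
new_concept = [[0.2, 1.0], [1.0, 0.2]]   # hyperplanes swapped to imitate drift
models = [BinaryRelevance().fit(make_chunk(old_concept, 200, rng)),
          BinaryRelevance().fit(make_chunk(new_concept, 200, rng))]
reference = make_chunk(new_concept, 100, rng)
labels, weights = dynamic_predict(models, reference, [0.9, 0.1], k=7)
print(labels, weights)
```

Because the weights are recomputed for every test example, a model trained on an outdated concept can still contribute in regions of the feature space where it remains accurate, which is the motivation the abstract gives for preferring a dynamic ensemble over a static one.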
Keywords/Search Tags: data mining, multi-label, data stream, Binary Relevance, concept drift, classification