
The Research of Data Stream Classification Based on Rough Set Theory-Neural Network Integration

Posted on: 2015-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: C M Yan
GTID: 2268330422970027
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of computer, communication and network technology, many information systems produce large numbers of data streams during operation, for example telecommunication data, stock data, Internet communication data and search engine data. A data stream is a new type of data: a large-scale, ordered sequence of data that arrives continuously, quickly and in real time.

Classifying a data stream means building a classification model or function from a scan of the stream and then using that model to map data objects to given categories. Data stream classification faces two main difficulties. First, data streams contain many redundant attributes, and too many attributes slow down model construction and lower classification precision. Second, because data streams arrive continuously, the classification model must be updated efficiently as data flows in and must correctly reflect the class information of the current data. These particular characteristics mean that data stream classification methods must differ from traditional data mining methods. Since such classification methods and techniques have very broad application prospects in many fields, a stable, fast and accurate data stream classification method has great theoretical and practical value.

In this thesis, I combine rough set theory and the neural network method to handle the high dimensionality and large data volume of data streams. Rough sets are strong at processing uncertain and incomplete information; they can uncover the dependencies among data and thus reduce the number of attributes using only the data itself. The neural network has a strong nonlinear mapping ability, so its accuracy is better than that of other data mining methods when the underlying model is nonlinear.
It is suitable for processing large data sets and, in particular, offers good fault tolerance, adaptability and resistance to noise interference. Combining the two methods effectively reduces the number of input nodes of the neural network, greatly simplifies the network structure, and substantially improves the classification accuracy of the model.

In addition, I adopt the sliding window technique to handle fast data streams. The stream is divided into multiple data blocks of equal size; each data block trains an individual classifier, and the individual classifiers together form an ensemble classifier. The ensemble method effectively reduces the generalization error of the model. In particular, training an individual classifier is generally faster than updating a single model, so the ensemble method is better suited to high-speed data streams.

Based on rough set theory, neural networks and ensemble learning, I propose a new data stream classification method. A comparative experiment on real data yields good classification results, showing that the new method is feasible and effective.
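The abstract says rough sets can reduce the number of attributes using only the data, but it does not name the reduction algorithm. The Python sketch below assumes QuickReduct, a common greedy rough-set heuristic, as a stand-in: it computes the dependency degree γ_B(D) = |POS_B(D)| / |U| (the fraction of records whose class is fully determined by the attribute subset B) and keeps adding the attribute that raises γ most until γ equals the value for the full attribute set. The function names `partition`, `dependency` and `quick_reduct`, and the toy decision table, are hypothetical illustrations, not the thesis's code.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: group row indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows, labels, attrs):
    """gamma_B(D) = |POS_B(D)| / |U|: share of rows whose label is fixed by attrs."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:  # block lies inside one decision class
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedy forward selection (QuickReduct, assumed): add attributes until gamma matches the full set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        remaining = [a for a in all_attrs if a not in reduct]
        # Pick the attribute giving the largest dependency; ties go to the lowest index.
        best = max(remaining, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


# Hypothetical decision table: label = a0 AND a1; a2 is noise; a3 duplicates a0.
rows = [
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 0, 1),
    (1, 1, 1, 1),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
]
labels = [0, 0, 0, 1, 1, 0]
reduct = quick_reduct(rows, labels)
reduced = [[row[a] for a in reduct] for row in rows]  # narrower input for the classifier
print(reduct)  # [0, 1]
```

On this toy table the noisy attribute a2 and the duplicate a3 are dropped, so a downstream neural network would need two input nodes instead of four. Greedy selection is fast but does not guarantee a minimal reduct in general.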
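The block-based sliding-window ensemble described in the abstract can be sketched as follows, under stated assumptions. The stream is cut into blocks of `block_size` records, one classifier is trained per block, and only the `window` most recent classifiers vote. The abstract does not specify the network architecture, window size or combination rule, so this sketch assumes a single-layer perceptron (the simplest neural network) as the base learner and unweighted majority voting; `Perceptron`, `SlidingWindowEnsemble`, `block_size` and `window` are hypothetical names.

```python
import random
from collections import Counter, deque


class Perceptron:
    """Single-layer perceptron: a hypothetical stand-in for the thesis's neural network."""

    def __init__(self, n_features, epochs=250, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.epochs = epochs
        self.lr = lr

    def predict(self, x):
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s > 0 else 0

    def fit(self, X, y):
        # Classic perceptron rule: update weights only on misclassified examples.
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                err = target - self.predict(x)  # -1, 0 or +1
                if err:
                    self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * err
        return self


class SlidingWindowEnsemble:
    """Train one classifier per fixed-size block; vote with the `window` most recent ones."""

    def __init__(self, n_features, block_size, window):
        self.n_features = n_features
        self.block_size = block_size
        self.members = deque(maxlen=window)  # appending past maxlen evicts the oldest member
        self._X, self._y = [], []

    def update(self, x, y):
        self._X.append(x)
        self._y.append(y)
        if len(self._X) == self.block_size:  # block complete: train a new member on it
            self.members.append(Perceptron(self.n_features).fit(self._X, self._y))
            self._X, self._y = [], []

    def predict(self, x):
        if not self.members:
            return 0  # arbitrary default before the first block is complete
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]


# Demo stream (hypothetical): label 1 iff x0 + x1 >= 1.2 on a 0.1 grid, with points
# near the boundary removed so every block is linearly separable with a margin.
points = [((i / 10, j / 10), int(i + j >= 12))
          for i in range(11) for j in range(11) if i + j <= 8 or i + j >= 12]
rng = random.Random(0)
ens = SlidingWindowEnsemble(n_features=2, block_size=len(points), window=3)
for _ in range(5):  # five blocks arrive; only the three most recent members are kept
    block = points[:]
    rng.shuffle(block)
    for x, y in block:
        ens.update(x, y)
accuracy = sum(ens.predict(x) == y for x, y in points) / len(points)
print(len(ens.members), accuracy)
```

Each new member is trained only on its own block, and the bounded deque drops the oldest member automatically, so an update costs one small training run per block instead of retraining a single global model; this mirrors the speed argument the abstract gives for ensembles. A weighted vote (for example, weighting members by accuracy on the latest block) is a common alternative to plain majority voting.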
Keywords/Search Tags: rough set, attribute reduction, neural network, ensemble learning, data stream