The Research On Feature Selection For Data Stream

Posted on: 2012-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: W S Chen
Full Text: PDF
GTID: 2218330368992450
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, especially network technology, more and more applications need to deal with real-time data that grows by millions of Gbit per day, such as network monitoring, spam classification, and sensor networks. A data stream greatly reduces the efficiency of learning algorithms when it is processed directly, because it is often high-dimensional and contains many irrelevant and redundant features. Feature selection can eliminate these features and thereby improve both the efficiency and the accuracy of mining algorithms. However, some traditional feature reduction algorithms are difficult to apply to high-dimensional data streams. It therefore becomes more important to explore data structures suited to data streams and methods for measuring feature correlation.

Firstly, this paper discusses the technique of feature selection. Taking into account the real-time nature, unboundedness, and concept drift of data streams, it proposes FSCFFR, a new fitting-based feature selection algorithm for data streams. The new algorithm makes up for deficiencies in existing measures of feature correlation and can be applied to real-time data streams. Moreover, it eliminates redundant features effectively, which improves the efficiency of learning algorithms.

Secondly, serial processing is limited by the maximum processing speed of a single processor, whereas parallel processing can raise that limit by connecting multiple processors. This paper therefore studies parallel approaches in order to improve processing speed and efficiency. Two parallel algorithms with different communication schemes are proposed, using the manager and worker model in the MPI environment. Experiments show that parallel computation improves the efficiency and speed of feature selection.

Finally, in order to validate the performance of the feature selection algorithm in a real application, this paper applies the whole method to network intrusion detection, analyzing and processing data from a network intrusion detection system online. Testing on this real application shows that the method is feasible and practical.

In summary, the work on feature selection for data streams has practical significance. On the one hand, it eliminates redundant features in data streams, which improves the performance of learning algorithms and the efficiency of data mining. On the other hand, the experimental system provides a reference for related research.
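
The abstract does not describe how FSCFFR measures feature correlation, so the following is only an illustrative sketch of the general idea of removing redundant features from a stream in a single pass. It is not the thesis's algorithm. It assumes Pearson correlation as the redundancy measure and a fixed threshold of 0.95; the names `StreamingCorrelation` and `select_non_redundant` are hypothetical.

```python
import math


class StreamingCorrelation:
    """Keeps running sums so pairwise Pearson correlation can be
    computed at any time without storing the stream (assumed measure)."""

    def __init__(self, n_features):
        self.d = n_features
        self.n = 0
        self.sums = [0.0] * n_features
        # prods[i][j] holds the running sum of x_i * x_j for i <= j.
        self.prods = [[0.0] * n_features for _ in range(n_features)]

    def update(self, x):
        """Fold one incoming record (a sequence of d numbers) into the sums."""
        self.n += 1
        for i in range(self.d):
            self.sums[i] += x[i]
            for j in range(i, self.d):
                self.prods[i][j] += x[i] * x[j]

    def corr(self, i, j):
        """Pearson correlation of features i and j over the records seen so far."""
        if self.n == 0:
            return 0.0
        if i > j:
            i, j = j, i
        n = self.n
        cov = self.prods[i][j] - self.sums[i] * self.sums[j] / n
        var_i = self.prods[i][i] - self.sums[i] ** 2 / n
        var_j = self.prods[j][j] - self.sums[j] ** 2 / n
        if var_i <= 0 or var_j <= 0:
            return 0.0  # constant feature: correlation undefined, treat as 0
        return cov / math.sqrt(var_i * var_j)


def select_non_redundant(stats, threshold=0.95):
    """Greedily keep a feature only if it is not strongly correlated
    with any feature already kept (threshold is an assumed value)."""
    kept = []
    for j in range(stats.d):
        if all(abs(stats.corr(i, j)) < threshold for i in kept):
            kept.append(j)
    return kept


# Usage: feature 1 is a linear function of feature 0, so it is redundant.
stream = [(x, 2 * x + 1, z) for x, z in zip(range(1, 7), [3, 1, 4, 1, 5, 9])]
stats = StreamingCorrelation(3)
for record in stream:
    stats.update(record)
selected = select_non_redundant(stats)
```

Because only fixed-size sums are stored, memory use does not grow with the length of the stream. The sketch ignores concept drift; a windowed or decayed variant of the sums would be one way to handle it.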
Keywords/Search Tags: Data Stream, Synopsis Data Structure, Feature Selection, Parallel Computing