Research On Cost-Sensitive Classification Of Data Streams And Its Application

Posted on: 2015-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: N Guo
Full Text: PDF
GTID: 2298330467962146
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, and especially the spread of Internet technology, the volume of data generated every day is growing explosively. Much of this data has the typical characteristics of a data stream: high speed, large volume, and real-time arrival. Data streams now appear in many fields, such as bank customer credit rating, medical diagnosis, and network intrusion detection, so many researchers focus on data stream mining. Cost-sensitive classification, which aims to minimize the cost of classification rather than the error rate, is another significant research issue, and in many real-world scenarios a cost-sensitive classification model is the more reasonable choice. In the data stream setting, however, existing cost-sensitive classification methods can no longer be applied, because they require multiple scans of the dataset.

This thesis therefore addresses the intersection of stream classification and cost-sensitive classification. Based on the Folk Theorem and the GDT algorithm, the CsGDT algorithm is proposed to handle cost-sensitive classification of data streams. Existing cost-sensitive classification algorithms achieve excellent performance in terms of misclassification cost, but they typically cause an obvious reduction in accuracy, which greatly restricts their practical application. To solve this problem, we present an improved folk theorem. Following the idea of the new theorem, we propose the soft-CsGDT algorithm, which takes both accuracy and cost into account when constructing a cost-sensitive classification model for data streams. Experimental results on both synthetic and real-world datasets show that the CsGDT algorithm can minimize the misclassification cost, and that, compared with the cost-sensitive algorithm, soft-CsGDT significantly improves accuracy while keeping misclassification costs approximately the same.
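The tension between accuracy and misclassification cost described in the abstract can be illustrated with a minimal Python sketch. It is not the CsGDT or soft-CsGDT algorithm, whose details the abstract does not give. It only shows the generic minimum-expected-cost decision rule and the two metrics being traded off. The cost matrix, function names, and sample values are all assumptions made for illustration.

```python
# Hypothetical illustration only: names and cost values are assumptions,
# not taken from the thesis.

def min_cost_label(probs, cost):
    """Return the label with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[j][k] : cost of predicting class k when the true class is j
    """
    n_classes = len(probs)
    expected = [
        sum(probs[j] * cost[j][k] for j in range(n_classes))
        for k in range(n_classes)
    ]
    return min(range(n_classes), key=lambda k: expected[k])


def accuracy_and_cost(y_true, y_pred, cost):
    """Compute accuracy and total misclassification cost for a batch."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    total_cost = sum(cost[t][p] for t, p in zip(y_true, y_pred))
    return correct / len(y_true), total_cost


# Assumed cost matrix: missing a positive (true 1, predicted 0) costs 10,
# a false alarm (true 0, predicted 1) costs 1.
COST = [[0, 1],
        [10, 0]]

# With P(class 0) = 0.8, an accuracy-driven rule would predict class 0,
# but the cost-sensitive rule prefers class 1 because a miss is expensive.
label = min_cost_label([0.8, 0.2], COST)
```

Under this assumed cost matrix the rule accepts more false alarms in exchange for fewer expensive misses. A cost-sensitive model therefore tends to give up accuracy to lower total cost, which is the drawback that the abstract says soft-CsGDT is meant to reduce.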
Keywords/Search Tags: data mining, data stream, cost sensitive, classifier, Gaussian decision tree