
Research On Algorithm Of Cost-Sensitive Data Stream Classification Under PU Learning Scenario

Posted on: 2016-08-23  Degree: Master  Type: Thesis
Country: China  Candidate: S Li  Full Text: PDF
GTID: 2308330461466576  Subject: Agricultural informatization
Abstract/Summary:
With the development of computer technology and the Internet, data streams have become common in applications such as web browsing, credit card transactions, and real-time network monitoring. Data stream classification has been studied widely, and several effective algorithms have been proposed. Two problems remain in this research area: (1) Users may care about only one specific class of data. For example, a user reading news online may be interested in only one topic, such as sports, and will search for news on that topic alone. Similarly, in credit card fraud detection and medical diagnosis, users focus on the data of interest rather than on other types of data. (2) Most data stream algorithms do not take cost sensitivity into account, although it plays an important role in practice. Traditional classification methods often treat the misclassification costs of different types of data as equal and ignore test cost when making decisions. This assumption does not hold in real-world settings.

To address these problems, this thesis builds on the Concept-adapting Very Fast Decision Tree (CVFDT) algorithm to handle PU (Positive Unlabeled) data streams. We propose a new criterion for evaluating attribute splits and a cost-sensitive decision tree algorithm for data stream classification under the PU learning scenario. The main contributions are:

(1) Research on cost-sensitive data stream classification. We enhance the CVFDT algorithm by replacing its attribute splitting criterion, using the Cost-Reduction Ratio (CRR) instead of Information Gain.

(2) A classifier that can process PU data streams with concept drift. The classifier is based on CVFDT, which handles concept drift, and follows the learning method of POSC4.5, a PU decision tree classification algorithm for static datasets. To estimate the ratio of positive to unlabeled samples in the data stream, we enumerate nine candidate values in [0.1, 0.9] and train one decision tree per value, forming a forest of nine trees. One tree is then chosen from the forest using a best-tree selection method.

(3) Adaptation of the synthetic moving hyperplane and SEA data streams to the PU learning and cost-sensitive learning scenarios, by transforming all negative samples into unlabeled ones and a given percentage of the positive samples into unlabeled ones. Experiments were conducted using total classification cost, misclassification cost, and test cost to evaluate classification quality.

The results show that the proposed cost-sensitive decision tree algorithm for data stream classification under the PU learning scenario achieves excellent classification performance.
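
The PU conversion and the enumeration of candidate ratios described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`to_pu_stream`, `candidate_ratios`, `total_cost`), the use of `None` to mark unlabeled samples, the 0.1 step between candidate values, and the additive form of total cost are all assumptions made for illustration.

```python
import random


def to_pu_stream(stream, unlabeled_fraction, seed=0):
    """Turn a fully labeled stream into a PU stream (hypothetical helper).

    `stream` yields (features, label) pairs with label 1 (positive) or 0 (negative).
    Every negative becomes unlabeled (None); each positive becomes unlabeled
    with probability `unlabeled_fraction`, otherwise it keeps label 1.
    """
    rng = random.Random(seed)
    for features, label in stream:
        if label == 1 and rng.random() >= unlabeled_fraction:
            yield features, 1      # positive that stays labeled
        else:
            yield features, None   # negative, or positive with its label hidden


def candidate_ratios():
    """Nine candidate values in [0.1, 0.9]; the even 0.1 spacing is an assumption."""
    return [k / 10 for k in range(1, 10)]


def total_cost(misclassification_cost, test_cost):
    """Total classification cost, assumed here to be the sum of both components."""
    return misclassification_cost + test_cost


if __name__ == "__main__":
    labeled = [((0.2, 0.7), 1), ((0.9, 0.1), 0), ((0.4, 0.5), 1)]
    for features, label in to_pu_stream(labeled, unlabeled_fraction=0.5):
        print(features, label)
    # One tree would be trained per candidate value; training itself is omitted.
    print(candidate_ratios())
```

In the setting the abstract describes, each candidate value would parameterize one decision tree, and a selection step would then pick a single tree from the resulting forest; neither step is shown here because the abstract does not specify them.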
Keywords/Search Tags:Data Stream Classification, Cost-sensitive Learning, PU Learning, Decision Tree