Cost-Sensitive Feature And Instance Selection For Imbalanced Network Abnormal Datasets

Posted on: 2017-12-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Bian
Full Text: PDF
GTID: 1318330536965715
Subject: Computer application technology
Abstract/Summary:
The volume of network data is growing exponentially with the rapid development of communication technology and the integration of multiple heterogeneous networks. Although abnormal events make up only a small proportion of all events, they cause losses for citizens, businesses and national information security in a variety of ways, and can even pose severe risks. Improving the recognition rate for abnormal events is thus a pressing issue for network security. This dissertation studies the class imbalance problem and takes network abnormal events, which form the minority class, as the research object. Imbalanced classification serves as the entry point, with cost-sensitive learning as the supporting theory. Based on probability theory, chaos theory, information theory and statistics, this dissertation proposes a cost-sensitive feature selection method that uses a chaos genetic algorithm as the search method. Subsequently, an efficient cost-sensitive feature selection method with a memetic framework is designed. Then an imbalanced dataset reduction strategy is designed by improving the synthetic minority over-sampling technique (SMOTE) and applying it to the stratified instance selection method. The main contributions and results of this dissertation are as follows:

(1) A cost-sensitive feature selection method using a chaos genetic algorithm is presented. To address the class imbalance problem in network datasets, this dissertation examines the cost factors in the feature selection process. By introducing cost-sensitive learning into feature selection, it designs a cost-sensitive feature selection method for classifying network abnormal events, referred to as CSFSG. Considering both misclassification cost and test cost according to Bayes theory, it constructs a cost-sensitive fitness function based on the nearest-neighbor rule. Taking advantage of the characteristics of chaotic systems, it optimizes the genetic search strategy with an improved Tent map. The CSFSG algorithm can increase the recognition rate of abnormal events and reduce the total cost of feature selection, trading off the two factors. Experimental results show that the approach is efficient, improves classification accuracy and decreases classification time.

(2) An efficient cost-sensitive feature selection method with a memetic framework is put forward. Resource-constrained environments need analysis algorithms with low cost and high efficiency. The traditional memetic framework-based feature selection method is improved by introducing Bayesian theory to construct the cost matrix, yielding a hybrid cost-sensitive feature selection method (CFSM) that aims to reduce overall misclassification cost and improve classification performance. A genetic algorithm performs the global search, and a misclassification cost factor is introduced into the total cost function to construct the fitness function. Using the approximate Markov blanket method, with an information correlation coefficient as the evaluation index, the local search fine-tunes the feature subset by adding relevant features and removing redundant or irrelevant ones, which speeds up convergence towards the optimal subset. In the experiments, a k-nearest neighbor classifier is used to compare the proposed method with existing ones. Compared with a traditional GA-based memetic feature selection algorithm and other cost-sensitive feature selection algorithms, the proposed method identifies minority classes effectively using fewer features at lower cost and achieves higher classification accuracy. It is also more suitable for classifying imbalanced datasets.

(3) A double direction stratified instance selection strategy based on the synthetic minority over-sampling technique is given. When imbalanced network datasets scale up, the recognition rate may drop and serious errors may even result. Based on classical stratification theory, this dissertation proposes a double direction stratified instance selection strategy by improving the “Instance selection in majority classes” method. The strategy over-samples the minority classes with synthetic instances while also selecting instances from the majority classes, so as to rebalance the datasets. The improved synthetic minority over-sampling technique, referred to as iSMOTE, is refined at the attribute level and constructs its random-number expression from the uniform distribution. Experimental results show that the new strategy increases the number of minority class instances and improves classification accuracy. It is also more effective when the minority classes have fewer instances and when the datasets are larger.
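
The idea behind contribution (1), a fitness that combines misclassification cost with test cost and a chaotic sequence that seeds the genetic search, can be illustrated with a rough sketch. This is not the dissertation's CSFSG implementation. It assumes the classic Tent map (the improved variant is not given in the abstract) and a leave-one-out 1-nearest-neighbor rule. The function names, the toy data, the cost matrix and the per-feature test costs are all hypothetical.

```python
# Illustrative sketch only; every name and number below is an assumption.

def tent_map(x, mu=1.99):
    """Classic Tent map on [0, 1]. mu is kept slightly below 2 so that
    floating-point iterates do not collapse to zero."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def chaotic_population(n_individuals, n_features, x0=0.37, mu=1.99):
    """Build binary feature masks by thresholding a Tent-map sequence at 0.5."""
    population, x = [], x0
    for _ in range(n_individuals):
        mask = []
        for _ in range(n_features):
            x = tent_map(x, mu)
            mask.append(1 if x >= 0.5 else 0)
        population.append(mask)
    return population


def nearest_label(i, X, y, mask):
    """Label of the nearest other instance, using only the selected features."""
    best_label, best_dist = None, float("inf")
    for j, row in enumerate(X):
        if j == i:
            continue
        dist = sum((a - b) ** 2 for a, b, m in zip(X[i], row, mask) if m)
        if dist < best_dist:
            best_label, best_dist = y[j], dist
    return best_label


def total_cost(mask, X, y, mis_cost, test_cost):
    """Average misclassification cost plus the test cost of selected features.
    Lower is better; an empty feature subset is treated as infeasible."""
    if not any(mask):
        return float("inf")
    mis = sum(mis_cost[y[i]][nearest_label(i, X, y, mask)] for i in range(len(X)))
    return mis / len(X) + sum(c for c, m in zip(test_cost, mask) if m)


# Toy data: class 1 is the minority (abnormal) class.
# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [0.3, 3.0], [1.0, 5.1], [1.1, 1.1]]
y = [0, 0, 0, 0, 1, 1]
MIS_COST = [[0, 1], [10, 0]]   # missing an abnormal event costs 10x a false alarm
TEST_COST = [0.2, 0.1]         # hypothetical cost of measuring each feature

if __name__ == "__main__":
    candidates = chaotic_population(8, 2)
    best = min(candidates, key=lambda m: total_cost(m, X, y, MIS_COST, TEST_COST))
    print(best, total_cost(best, X, y, MIS_COST, TEST_COST))
```

In a full genetic algorithm, `total_cost` would be minimized over generations of crossover and mutation; here the chaotic population is scored only once to keep the sketch short.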
Keywords/Search Tags:network abnormal datasets, class imbalance problem, cost-sensitive learning, feature selection, instance selection