Feature Selection Of Heterogeneous Data Based On Particle Swarm Optimization

Posted on: 2015-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2298330422487074
Subject: Control theory and control engineering
Abstract/Summary:
Feature selection is an important data processing method in data mining and pattern classification. It is widely used in fault prediction, disease diagnosis, network intrusion detection, biological emotion recognition, and many other areas. In many practical problems, the quality of different features in sampled data differs widely because of influences such as the external environment and the precision of the measuring equipment; this is known as the heterogeneous data feature selection problem. Because such a problem has many dimensions and the features of a sample also differ in quality, traditional feature selection methods designed for undifferentiated data are difficult to apply. In view of this, this thesis studies particle swarm optimization theory and methods for the heterogeneous data feature selection problem.

First, for the feature selection problem with completely reliable data, this thesis puts forward a knowledge-guided particle swarm optimization feature selection method. In this method, a particle is encoded as a binary string, and a method of calculating the fitness of a particle is given. The whole swarm is divided into a superior swarm and an inferior swarm according to particle fitness. A strategy for classifying features is designed on the basis of the probability of selecting a feature. According to the type of a feature, an improved sigmoid function is presented, from which the updating probability of the feature is determined. To verify its performance, the proposed method is applied to ten typical test data sets from the UCI database and compared with three other methods. The experimental results show the superiority of the proposed method.
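The binary encoding and sigmoid-based bit update described above can be sketched as follows. This is a minimal illustration of standard binary particle swarm optimization, not the thesis's knowledge-guided variant: the abstract does not give the improved sigmoid function, the swarm partitioning rule, or the fitness function, so the plain sigmoid, the `toy_fitness` function, the `relevant` feature set, and all parameter values here are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(v):
    """Standard sigmoid; the thesis uses an improved variant not given in the abstract."""
    return 1.0 / (1.0 + math.exp(-v))


def toy_fitness(mask, relevant):
    """Hypothetical stand-in for classification accuracy.

    Rewards selected features that are in `relevant` and penalizes the rest.
    """
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    extras = sum(1 for i, bit in enumerate(mask) if bit and i not in relevant)
    return hits - 0.5 * extras


def binary_pso(n_features, relevant, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO: each particle is a 0/1 string over the features."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [toy_fitness(p, relevant) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[k][j]
                     + c1 * r1 * (pbest[k][j] - pos[k][j])
                     + c2 * r2 * (gbest[j] - pos[k][j]))
                # Clamp the velocity so the sigmoid does not saturate.
                vel[k][j] = max(-v_max, min(v_max, v))
                # The sigmoid of the velocity is the probability of selecting feature j.
                pos[k][j] = 1 if rng.random() < sigmoid(vel[k][j]) else 0
            fit = toy_fitness(pos[k], relevant)
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    best_mask, best_fit = binary_pso(n_features=12, relevant={0, 3, 7})
    print(best_mask, best_fit)
```

In a real setting `toy_fitness` would be replaced by the accuracy of a classifier trained on the selected features.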
In addition, these four methods are also applied to domestic hepatitis B clinical diagnostic data, and the experimental results demonstrate that the proposed method can obtain satisfactory classification results.

Then, for the situation where the data are not fully trusted but their credibility can be expressed precisely, this thesis puts forward a feature selection method based on multi-objective particle swarm optimization. In this method, the credibility (reliability) of each feature in the data set is expressed as a precise number in [0, 1], and the overall reliability of a feature subset is evaluated by the average credibility of the selected features. The problem can therefore be formulated as a bi-objective optimization problem with two objectives: classification accuracy and reliability. To solve this problem, a multi-objective particle swarm optimization algorithm is designed. To improve the search performance of the algorithm and the distribution of the Pareto solution set, a method of generating the global guide is presented, based on grid partitioning and Gaussian sampling. To improve the exploration performance of the swarm, a disturbance strategy that helps particles jump out of local optima is proposed. The proposed method is applied to 6 typical test data sets from the UCI database and compared with four other methods. The experimental results show the superiority of the proposed method.

Finally, for the situation where the data are not fully trusted and their credibility is fuzzy, this thesis proposes a feature selection method based on multi-objective particle swarm optimization. In this method, the credibility (reliability) of each feature in the data set is expressed as a fuzzy number with a triangular membership function. Accordingly, the reliability index of a feature subset becomes a fuzzy number.
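As a small illustration of how a subset-level reliability can be formed from per-feature credibilities, the sketch below averages the credibility of the selected features, first for precise values in [0, 1] and then for triangular fuzzy numbers. The `(lower, mode, upper)` tuple representation and the function names are assumptions; the fuzzy average relies only on standard triangular fuzzy arithmetic (component-wise addition and division by a positive scalar) and is not taken from the thesis itself.

```python
def crisp_subset_reliability(mask, reliability):
    """Average credibility of the selected features (precise values in [0, 1])."""
    selected = [r for bit, r in zip(mask, reliability) if bit]
    if not selected:
        raise ValueError("at least one feature must be selected")
    return sum(selected) / len(selected)


def fuzzy_subset_reliability(mask, fuzzy_reliability):
    """Average of triangular fuzzy credibilities, each given as (lower, mode, upper).

    Under standard fuzzy arithmetic the average of triangular fuzzy numbers
    is again triangular and is computed component by component.
    """
    selected = [t for bit, t in zip(mask, fuzzy_reliability) if bit]
    if not selected:
        raise ValueError("at least one feature must be selected")
    n = len(selected)
    return tuple(sum(t[i] for t in selected) / n for i in range(3))


if __name__ == "__main__":
    mask = [1, 1, 0]
    print(crisp_subset_reliability(mask, [0.5, 0.25, 1.0]))
    print(fuzzy_subset_reliability(
        mask, [(0.25, 0.5, 0.75), (0.5, 0.75, 1.0), (0.0, 0.5, 1.0)]))
```

The result of the fuzzy version is itself a triangle, which is why the second objective can no longer be compared with an ordinary greater-than test.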
To tackle this bi-objective feature selection problem, which includes a fuzzy objective, a probability dominance relation is first defined to compare different solutions. Then, an effective strategy for updating the external archive is given, according to the probability dominance relation and a tolerance coefficient set by the decision-maker. The proposed method is applied to 4 typical test data sets from the UCI database and compared with two other methods. The experimental results demonstrate its superiority.
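The idea of maintaining an external archive of non-dominated solutions can be sketched as follows. The abstract does not define the probability dominance relation or the tolerance coefficient, so this example substitutes ordinary (crisp) Pareto dominance over two maximized objectives, `(accuracy, reliability)`; the function names and the sample points are hypothetical.

```python
def dominates(a, b):
    """Crisp Pareto dominance for maximization: a is no worse everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def update_archive(archive, candidate):
    """Return a new archive after offering `candidate` (a tuple of objective values).

    The candidate is rejected if an archive member dominates or equals it;
    otherwise it is added and every member it dominates is removed.
    """
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return list(archive)
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


if __name__ == "__main__":
    archive = []
    for point in [(0.9, 0.5), (0.8, 0.7), (0.7, 0.6), (0.95, 0.4), (0.9, 0.5)]:
        archive = update_archive(archive, point)
    print(archive)
```

With a fuzzy reliability objective, the `dominates` test is the part that would be replaced by a probability-based comparison such as the one the thesis defines.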
Keywords/Search Tags: feature selection, particle swarm optimization, heterogeneous data, Gaussian sampling, probability dominance