Feature Selection And Ensemble Pruning Based On Binary Artificial Fish Swarm Algorithm And Their Applications Research

Posted on: 2019-07-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Zhu
GTID: 1368330548485883
Subject: Management Science and Engineering
Abstract/Summary:
Data preprocessing and data mining are key steps in extracting valuable knowledge and information from complex data. Feature selection is an important data-preprocessing technique, and ensemble learning is a key data-mining technique. Feature selection retains the key features and removes redundant attributes while preserving the essential information in the original data, which helps avoid the curse of dimensionality and improves the efficiency of data processing. Ensemble learning is a machine learning paradigm that aggregates multiple classifiers according to a given strategy, so that the ensemble classifies better than a single classifier; it is widely applied in many fields. As the number of classifiers in an ensemble and the data size of classification tasks grow rapidly, however, the computational overhead becomes heavy. To alleviate this issue, many researchers have proposed ensemble pruning techniques, which achieve better ensemble performance.

Feature selection involves four main aspects: an evaluation metric for candidate subsets, a search strategy, a validation procedure, and a stopping criterion. For the choice of evaluation metric, an effective metric identifies the essential features of the original dataset by keeping the selected subset as similar as possible to the original dataset, which yields effective feature reduction and efficient data processing. For the choice of search strategy, the desired properties include simplicity of implementation, strong robustness, and good, fast global convergence. Based on this analysis, fractal theory is used as the subset evaluation metric and the binary artificial fish swarm algorithm (BAFSA) as the search strategy, and their combination is applied to feature selection problems. To address several drawbacks of BAFSA, a collection of improvements is introduced into the algorithm, which is then combined with fractal theory to solve feature selection problems.

Ensemble pruning aims to select an optimal subset of classifiers from an ensemble system, which markedly improves classification and significantly reduces computational overhead. Existing ensemble pruning approaches typically search for the optimal sub-ensemble using either diversity measures or heuristic algorithms alone. Approaches based on diversity measures, whatever strategy they use, cannot reliably find the optimal sub-ensemble of an ensemble system, while approaches based on heuristic algorithms cannot exhaustively search the enormous number of candidate sub-ensembles. To alleviate these issues, a combination of diversity measures and heuristic algorithms is proposed. Diversity measures are first used to pre-prune a fraction of poorly performing classifiers, which markedly reduces the computational complexity of the ensemble pruning problem; a heuristic algorithm then selects the final ensemble from the classifiers retained after pre-pruning. Because the double-fault measure performs well at measuring classifier diversity and BAFSA searches efficiently in binary spaces, the two are combined to solve the ensemble pruning problem.

The main research work and contributions are summarized as follows:
(1) To avoid being trapped in local optima, a position-updating strategy, a mechanism for escaping local optima, and a parallelism mechanism are introduced into the artificial fish swarm algorithm (AFSA), and its swarming and preying behaviors are improved. The resulting parallelism binary artificial fish swarm algorithm (PBAFSA) increases population diversity and improves convergence speed and precision. PBAFSA combined with fractal dimension is applied to feature selection on haze datasets; it retains the key features, dramatically reduces the dimensionality of the haze datasets, and improves data-processing efficiency. Experimental results on the haze datasets of Beijing, Shanghai and Guangzhou demonstrate its effectiveness and feasibility.
(2) The initial AFSA population is generated from a good-point set in order to obtain a relatively good initial population. Each artificial fish is assigned its own swimming speed, which is consistent with the behavior of natural fish. Competitive and cooperative mechanisms among sub-populations are introduced, which diversify the population and improve search efficiency. The resulting co-evolution binary artificial fish swarm algorithm (CBAFSA) is combined with multi-fractal dimension to reduce the dimensionality of haze datasets, which avoids the curse of dimensionality and saves substantial computational resources. The reduced haze datasets are then used for haze forecasting with an extreme learning machine. Experimental results on the haze datasets of Beijing, Shanghai and Guangzhou demonstrate its effectiveness and credibility.
(3) The double-fault measure of each classifier in a constructed pool is calculated, together with the average double-fault measure over all classifiers. Classifiers whose double-fault measures exceed this mean are pre-pruned, which significantly reduces the computational complexity of ensemble pruning. An improved binary artificial fish swarm algorithm (IBAFSA) is proposed by improving how the artificial fish move and by introducing competitive and collaborative operations within a single population, which enhances search efficiency. The classifiers retained after pre-pruning are further pruned with IBAFSA to obtain the final ensemble. Experimental results on 16 UCI datasets demonstrate the method's stability and effectiveness, and it is applied to haze forecasting for Beijing, Shanghai and Guangzhou.
(4) The double-fault measure of each classifier in a constructed pool is calculated, and the classifiers with the smallest double-fault measures are selected first; this pre-pruning step dramatically reduces the computational overhead. A reverse binary artificial fish swarm algorithm (RBAFSA) is proposed by improving how the artificial fish move and by introducing reverse searching and competitive and collaborative behaviors, which maintains population diversity and avoids being trapped in local optima. The 25 classifiers retained after pre-pruning are then pruned with RBAFSA to obtain the final pruned ensemble. Experimental results on 25 UCI datasets show its effectiveness and significance, and it is applied to haze forecasting for Beijing, Shanghai and Guangzhou.
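The search strategy that all four contributions build on is a binary artificial fish swarm. A minimal sketch of a generic binary AFSA follows, assuming Hamming distance defines the "visual" range, a bitwise majority vote defines the swarm center, and partial bit-copying defines a move; none of the dissertation's own improvements (parallelism, co-evolution, reverse search) are included, and the function names, parameter defaults, and toy fitness function are hypothetical stand-ins for a subset-evaluation metric such as the fractal-dimension criterion.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]


def hamming(a: Bits, b: Bits) -> int:
    """Number of positions in which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def move_toward(x: Bits, target: Bits, step: int, rng: random.Random) -> Bits:
    """Copy at most `step` randomly chosen bits in which x differs from target."""
    diff = [i for i in range(len(x)) if x[i] != target[i]]
    y = x[:]
    for i in rng.sample(diff, min(step, len(diff))):
        y[i] = target[i]
    return y


def random_neighbor(x: Bits, visual: int, rng: random.Random) -> Bits:
    """Flip between 1 and `visual` random bits, i.e. a point inside the visual range."""
    y = x[:]
    for i in rng.sample(range(len(x)), rng.randint(1, visual)):
        y[i] ^= 1
    return y


def bafsa(fitness: Callable[[Bits], float], n_bits: int, n_fish: int = 20,
          visual: int = 3, step: int = 2, try_number: int = 5, delta: float = 0.6,
          iters: int = 50, seed: int = 0) -> Tuple[Bits, List[float]]:
    """Maximize `fitness` over bit strings. Returns the best fish found and the
    best-so-far fitness after each iteration (the 'bulletin board')."""
    rng = random.Random(seed)
    school = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_fish)]
    best = max(school, key=fitness)[:]
    history = [fitness(best)]

    def prey(x: Bits) -> Bits:
        # Preying: probe random points in view, move toward the first better one.
        for _ in range(try_number):
            y = random_neighbor(x, visual, rng)
            if fitness(y) > fitness(x):
                return move_toward(x, y, step, rng)
        return random_neighbor(x, 1, rng)  # random behavior: flip one bit

    for _ in range(iters):
        new_school = []
        for x in school:
            nbrs = [f for f in school if 0 < hamming(f, x) <= visual]
            not_crowded = bool(nbrs) and len(nbrs) / n_fish < delta
            if not_crowded:
                # Swarming targets the majority-vote center; following targets the best neighbor.
                center = [int(2 * sum(col) >= len(nbrs)) for col in zip(*nbrs)]
                leader = max(nbrs, key=fitness)
                swarm = (move_toward(x, center, step, rng)
                         if fitness(center) > fitness(x) else prey(x))
                follow = (move_toward(x, leader, step, rng)
                          if fitness(leader) > fitness(x) else prey(x))
            else:
                swarm = follow = prey(x)
            # Keep whichever behavior produced the fitter position.
            new_school.append(max(swarm, follow, key=fitness))
        school = new_school
        candidate = max(school, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
        history.append(fitness(best))
    return best, history


# Hypothetical toy objective: agreement with a hidden feature mask.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]


def score(bits: Bits) -> int:
    return sum(b == t for b, t in zip(bits, target))


best, history = bafsa(score, n_bits=len(target))
print(best, history[-1])
```

In a real feature-selection run, bit j would mean "feature j is selected" and `score` would be replaced by the subset-evaluation metric; the bulletin board guarantees the reported fitness never decreases even though individual fish may make random moves.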
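Contribution (2) initializes the fish population from a good-point set rather than uniformly at random. A minimal sketch of the classical good-point construction follows, assuming the common cyclotomic-field choice r_k = 2cos(2πk/p) with p the smallest prime satisfying (p - 3)/2 >= dim. The abstract does not say how the real-valued points are mapped to the binary search space, so thresholding each coordinate at 0.5 in `binary_initial_school` is purely an assumption, and all function names are hypothetical.

```python
import math
from typing import List


def smallest_prime_at_least(x: int) -> int:
    """Smallest prime p >= x (trial division is enough for small dimensions)."""
    p = max(2, x)
    while any(p % d == 0 for d in range(2, math.isqrt(p) + 1)):
        p += 1
    return p


def good_point_set(n_points: int, dim: int) -> List[List[float]]:
    """Good-point set in the unit cube using r_k = 2cos(2*pi*k/p), where p is the
    smallest prime with (p - 3) / 2 >= dim; point i is frac(i * r_k) per coordinate."""
    p = smallest_prime_at_least(2 * dim + 3)
    r = [2 * math.cos(2 * math.pi * k / p) for k in range(1, dim + 1)]
    # Python's % with a positive modulus yields the fractional part in [0, 1)
    # even when i * r_k is negative.
    return [[(i * rk) % 1.0 for rk in r] for i in range(1, n_points + 1)]


def binary_initial_school(n_fish: int, n_bits: int, threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical binarization: bit j of fish i is 1 if coordinate j >= threshold."""
    return [[1 if v >= threshold else 0 for v in point]
            for point in good_point_set(n_fish, n_bits)]


school = binary_initial_school(n_fish=10, n_bits=8)
for fish in school:
    print(fish)
```

Unlike random initialization, the construction is deterministic, so every run starts from the same evenly spread population; any randomness must then come from the search behaviors themselves.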
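Contributions (3) and (4) pre-prune the classifier pool with the double-fault measure before the swarm search. For a pair of classifiers, the double-fault measure is the fraction of validation samples that both misclassify, so a higher value means less diversity. The sketch below assumes that a classifier's own score is its mean pairwise double-fault against the rest of the pool (the abstract does not state the aggregation), then applies the mean-threshold rule from (3) and a keep-the-k-smallest rule in the spirit of (4); the function names and the toy pool are hypothetical.

```python
from typing import List, Sequence


def double_fault_matrix(correct: Sequence[Sequence[bool]]) -> List[List[float]]:
    """correct[i][n] is True when classifier i labels validation sample n correctly.
    Entry [i][j] is the fraction of samples that classifiers i AND j both misclassify."""
    m, n = len(correct), len(correct[0])
    df = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            both_wrong = sum(1 for a, b in zip(correct[i], correct[j]) if not a and not b)
            df[i][j] = df[j][i] = both_wrong / n
    return df


def classifier_scores(correct: Sequence[Sequence[bool]]) -> List[float]:
    """Assumed per-classifier score: mean pairwise double-fault against the rest of the pool."""
    df = double_fault_matrix(correct)
    m = len(df)
    return [sum(df[i][j] for j in range(m) if j != i) / (m - 1) for i in range(m)]


def pre_prune_by_mean(correct: Sequence[Sequence[bool]]) -> List[int]:
    """Rule from (3): drop classifiers whose score is above the pool mean."""
    scores = classifier_scores(correct)
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s <= mean]


def pre_prune_top_k(correct: Sequence[Sequence[bool]], k: int) -> List[int]:
    """Rule in the spirit of (4): keep the k classifiers with the smallest scores."""
    scores = classifier_scores(correct)
    return sorted(range(len(scores)), key=scores.__getitem__)[:k]


# Toy pool: 4 classifiers evaluated on 5 validation samples.
pool_correct = [
    [False, True, True, True, True],    # classifier 0 misses sample 0
    [True, False, True, True, True],    # classifier 1 misses sample 1
    [False, False, False, True, True],  # classifier 2 misses samples 0-2
    [True, True, False, False, True],   # classifier 3 misses samples 2-3
]
print(pre_prune_by_mean(pool_correct))  # [0, 1, 3]
```

Classifier 2 shares a failure with every other member, so its score (0.2) is above the pool mean (0.1) and it is pre-pruned; the surviving indices would then be encoded as bits for the binary swarm search.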
Keywords/Search Tags:Binary Artificial Fish Swarm Algorithm, Feature Selection, Fractal Theory, Ensemble Pruning, Diversity Measure, Haze Forecast