
Parallel Feature Selection Method Based On An Improved Fruit Fly Optimization Algorithm

Posted on: 2019-01-29; Degree: Master; Type: Thesis
Country: China; Candidate: B Fang; Full Text: PDF
GTID: 2348330569488381; Subject: Computer technology
Abstract/Summary:
With the rapid development of the internet, data storage and computer technology, the volume of high-dimensional, large-scale data to be processed in diverse fields is increasing rapidly. As an effective data processing method, feature selection has received extensive attention and has become a hot research topic in machine learning, pattern recognition and data mining. Feature selection aims to reduce the dimension of the feature space, and thereby the cost of data storage and processing, by removing irrelevant, redundant or noisy features and searching the original data set for the most effective subset, that is, the one with the best value of the evaluation function. Rough set based feature selection, namely attribute reduction, is one of the core topics of rough set theory. The main idea of attribute reduction is to achieve knowledge reduction and derive the decision or classification rules of the problem under study by eliminating duplicate, redundant and unrelated attributes while preserving the basic knowledge and keeping the classification ability of the decision system unchanged. Research results show that rough set attribute reduction performs well on data sets of lower dimension and smaller size, but is insufficient for massive, high-dimensional and complex data. As data become increasingly complicated and difficult to compute, the capability of data mining algorithms and the computing power of high-performance computing tools have been unable to meet the needs of data processing. Researchers have therefore tried to improve the data processing capability of optimization algorithms by combining them with various computing tools.

As a new kind of swarm intelligence optimization algorithm, the fruit fly optimization algorithm (FOA) has attracted considerable research attention. Compared with other swarm intelligence algorithms, FOA has the advantages of simpler operation, fewer parameters and faster convergence, but it also suffers from premature convergence and easily falls into local optima. Therefore, a novel feature selection method based on an improved fruit fly optimization algorithm and rough set theory is proposed. In addition, a parallel feature selection method based on the Spark parallel computing framework is proposed. The major work and innovations of this thesis are as follows:

1. To overcome the premature convergence caused by trapping in local optima, which results from the fact that all individuals in the standard fruit fly optimization algorithm are attracted only by the best individual, a novel double strategies evolutionary fruit fly optimization algorithm (DSEFOA) is proposed. The whole group is dynamically divided into an elite subgroup and an ordinary subgroup based on a newly proposed group partitioning strategy. An improved search method with a chaotic variable is then used in the elite subgroup to improve the individuals' local search capability. Meanwhile, an improved random search method based on the standard FOA, with weighting factors, is used in the ordinary subgroup to enhance its global search capability as well as accelerate convergence. By applying different strategies to individuals at different evolutionary levels, DSEFOA effectively improves the search capability of both superior and inferior individuals. Simulation experiments on benchmark functions show that the algorithm has good efficiency and stability.

2. A new feature selection strategy based on rough set theory and the fruit fly optimization algorithm is proposed. DSEFOA is used to search for feature subsets and perform the iterative optimization. Specifically, each selected feature subset is evaluated by a fitness function constructed from attribute dependency and attribute importance, which aims to find as many important features as possible in the feature space and then select the effective feature subset that contributes most to the decision. Experimental results on UCI datasets show that the proposed feature selection method can effectively find the feature subset with the minimum information loss and achieve high classification accuracy.

3. A new Spark-based parallel feature selection strategy using rough set theory and the fruit fly optimization algorithm is proposed. The fruit fly population is constructed as resilient distributed datasets (RDD), exploiting Spark's in-memory computing and distributed characteristics, and the corresponding transformation operators are then applied to parallelize the search for the best feature subset. Experimental results on UCI public data sets and artificial data sets show that the proposed parallel feature selection method is feasible for processing massive, large-scale data in a big data environment.
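
The abstract does not give the fitness function itself, so the following is only a minimal sketch of one of its stated ingredients, attribute dependency, using the standard rough set definition: the dependency of the decision D on a feature subset B is the fraction of objects in the positive region, gamma_B(D) = |POS_B(D)| / |U|. The function name `attribute_dependency` and the toy decision table are assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def attribute_dependency(rows, condition_idx, decision_idx):
    """Return gamma_B(D) = |POS_B(D)| / |U| for a decision table.

    rows          : list of tuples, one per object in the universe U
    condition_idx : column indices of the candidate feature subset B
    decision_idx  : column index of the decision attribute D
    (Hypothetical helper; the thesis's own fitness function is not shown here.)
    """
    if not rows:
        return 0.0
    # Group objects into equivalence classes: equal values on every attribute in B.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in condition_idx)
        classes[key].append(row[decision_idx])
    # A class lies in the positive region when all its objects share one decision.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


# Toy decision table (made up): columns 0-2 are features, column 3 is the decision.
table = [
    (1, 0, 1, "yes"),
    (1, 0, 0, "yes"),
    (0, 1, 1, "no"),
    (0, 1, 0, "no"),
    (1, 1, 1, "yes"),
    (1, 1, 1, "no"),
]

full = attribute_dependency(table, [0, 1, 2], 3)
reduced = attribute_dependency(table, [0, 1], 3)
```

In this toy table, features 0 and 1 together reach the same dependency as all three features, so feature 2 is redundant by this measure. A search procedure such as DSEFOA could use a score of this kind to compare candidate subsets.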
Keywords/Search Tags: Big Data, Feature Selection, Fruit Fly Optimization Algorithm, Attribute Dependency, Attribute Importance, Spark