| With continuing advances in computer technology and the growing depth of big data research and applications, many industries are becoming more automated and fine-grained in their operations. This trend calls for more advanced artificial intelligence methods to replace manual labor that was previously tedious or delicate. As the constraints of the problems to be solved grow more complex and the data dimensionality grows higher, big data processing must become faster and more accurate. Machine learning is currently the main technology for processing big data, and the effectiveness of its models depends largely on feature engineering of the datasets. Feature engineering mainly comprises feature extraction and feature selection; this paper studies the latter. For data with N features, the complexity of feature selection is O(2^N), since every subset of features is a candidate. Intelligent algorithms are a feasible way to perform feature selection, and this paper uses a genetic algorithm for that purpose. Genetic algorithms have been applied in many fields because of their advantages. However, they still suffer from premature convergence and slow computation when facing large amounts of data. In feature selection, the genetic algorithm must evaluate its fitness function on the dataset, which is time-consuming. The genetic algorithm must therefore be improved to speed up its convergence. To this end, we conduct the following research: 1. To analyze big data and obtain high-precision feature subsets, this paper uses the encoding scheme of the intelligent algorithm to assign weights to features. 2. A genetic algorithm based on a matrix structure is proposed. This algorithm remedies the shortcomings that the traditional genetic algorithm exhibits on function optimization problems. First, the parent elite population is constructed by scanning the species matrix row by row and moving the element representing the best individual to 
the main diagonal of the species matrix. Then, two individuals are randomly selected from the parent elite population and produce two offspring through the crossover operator. These offspring are placed at symmetric positions in the species matrix. Finally, mutation is applied to every individual in the population. If the mutant's fitness is lower than that of the original individual, the original is retained; if it is higher, the mutant is retained. After several iterations of these three operations, the best individual in the matrix is taken as the optimal solution to the problem. The proposed algorithm is verified on several optimization problems, where it shows fast convergence and strong global convergence performance. The improvement can also be extended to other evolutionary algorithms. 3. A parallel KNN classifier based on a multi-core CPU is proposed. The parallel design aims to speed up computation. The KNN classifier serves as the fitness function of the improved genetic algorithm, and its classification accuracy on big data is used to evaluate feature subsets. The multi-population matrix genetic algorithm improves convergence speed, and the parallel KNN classifier based on a multi-core CPU speeds up fitness evaluation on high-dimensional data. Together, these improvements show that optimizing machine learning with a genetic algorithm is feasible, which offers guidance for further research on evolutionary machine learning. |
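
Two ideas from the abstract can be illustrated with a minimal sketch: using KNN classification accuracy as the fitness of a feature subset, and keeping a mutant only when its fitness exceeds that of the original individual. The sketch is not the paper's implementation. The function names `knn_accuracy` and `mutate_greedy`, the binary feature mask (a simplification of the weight encoding mentioned above), the leave-one-out evaluation, the bit-flip mutation, and the toy data are all assumptions made for illustration. The case of equal fitness is not specified in the abstract; here the original is kept. The matrix structure and the multi-core parallel KNN are not reproduced.

```python
import random

def knn_accuracy(mask, X, y, k=3):
    """Leave-one-out KNN accuracy using only features where mask[j] == 1.

    Assumption: leave-one-out evaluation and squared Euclidean distance;
    the paper's exact evaluation protocol is not given in the abstract.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Distances from sample i to every other sample, on selected features.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        if pred == y[i]:
            correct += 1
    return correct / len(X)

def mutate_greedy(mask, fitness, rng, rate=0.2):
    """Flip each bit with probability `rate`; keep the mutant only if fitter.

    Assumption: on equal fitness the original individual is kept.
    """
    mutant = [1 - b if rng.random() < rate else b for b in mask]
    return mutant if fitness(mutant) > fitness(mask) else mask

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

def fitness(mask):
    return knn_accuracy(mask, X, y)

rng = random.Random(0)
individual = [0, 1]  # start from the uninformative feature only
for _ in range(20):
    individual = mutate_greedy(individual, fitness, rng)
print(individual, fitness(individual))
```

Because every fitness call rescans the dataset, the cost of this step grows with both the number of samples and the number of candidate subsets, which is the bottleneck the parallel KNN classifier in the abstract is meant to address.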