| Rapeseed is an indispensable source of edible vegetable oil in China and an important part of the livelihood economy, so it indirectly affects people’s well-being. After the temporary collection and storage policy for rapeseed was cancelled in 2015, farmers’ enthusiasm for planting dropped sharply, and both the planting area and the total annual oil production of rapeseed in China have decreased year on year. With the rapid development of computing, low rapeseed oil production can be addressed by technological means. Research has shown that the oil production of rapeseed is significantly correlated with the thousand-kernel weight, mean length and width, and mean roundness of its seeds. To obtain seeds with a higher thousand-kernel weight, this study uses gene loci on the chromosomes of Brassica napus to predict the thousand-kernel weight of mature rapeseed seeds. Seeds predicted to reach a higher thousand-kernel weight at maturity can then be selected for sowing at the breeding stage, with the ultimate goal of achieving a higher oil production rate on the same planting area. The experiment focuses on the following aspects: (1) Data preprocessing determines whether the neural network can begin training at all, and the choice of preprocessing method also strongly influences the training process and its results. Because the original gene data obtained in this experiment were disordered, the author preprocessed them according to the needs of the experiment, including missing-value handling and numerical encoding. (2) Because genetic data carry a large amount of biological information, the original data are very high-dimensional. To avoid the curse of dimensionality, poor generalization, overfitting, and other common problems of neural networks, the author compared the 
dimensionality reduction methods based on feature reconstruction and on feature selection, chose the more suitable one according to the experimental results and practical needs, and then constructed a dimensionality reduction network. As the key experimental part of this work, the author also optimized the model used in the dimensionality reduction network in two ways. First, a hybrid feature selection method replaces the original feature selection method. Second, multiple evaluation models are used in the dimensionality reduction network to evaluate the feature subsets, and the resulting feature subsets are then refined again according to feature selection rules defined by the author. The experimental results show that the proposed optimization scheme saves computational resources and effectively reduces time cost, making it both feasible and effective. (3) The optimal feature subset selected above was used to predict thousand-kernel weight with a fusion model composed of a gradient boosting decision tree (GBDT), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), and a light gradient boosting machine (LightGBM). The results show that the optimized prediction accuracy reached a maximum of 94.5%, an increase of 6.73% over the accuracy before optimization. In addition, the optimal feature subset selected according to the author’s feature selection rules contains 6 gene loci (numbered Bn A07-p14390667, Bn A03-p7366783, Bn A09-p19583295, Bn scaff-22728_1-p1258526, Bn scaff-22728_1-p1258510, and Bn scaff-22728_1-p1018084), which relatively mature locus analysis methods in biology have confirmed to be highly correlated with the thousand-kernel weight of rapeseed. |
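
The three stages summarized above (preprocessing, refining the feature subsets from several evaluation models, and fusing model predictions) can be illustrated with a minimal Python sketch. It is not the author's implementation. The genotype coding in `GENOTYPE_CODES`, the mode-based imputation, the vote threshold `min_votes`, and the plain averaging in `fuse_predictions` are all assumptions made for illustration, since the abstract does not state the actual encoding, selection rules, or fusion scheme.

```python
from collections import Counter

# Hypothetical genotype coding; the source does not specify its encoding.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2}


def encode_locus(calls):
    """Map genotype calls to integers, filling unknown calls with the mode.

    Mode imputation is an assumed stand-in for the missing-value step.
    """
    known = [GENOTYPE_CODES[c] for c in calls if c in GENOTYPE_CODES]
    if not known:
        raise ValueError("locus has no usable genotype calls")
    mode = Counter(known).most_common(1)[0][0]
    return [GENOTYPE_CODES.get(c, mode) for c in calls]


def consensus_subset(subsets, min_votes):
    """Keep loci selected by at least `min_votes` evaluation models.

    A simple voting rule, assumed here in place of the author's rules.
    """
    votes = Counter(locus for subset in subsets for locus in set(subset))
    return sorted(locus for locus, n in votes.items() if n >= min_votes)


def fuse_predictions(predictions):
    """Average per-sample predictions from several regressors (assumed fusion)."""
    return [sum(values) / len(values) for values in zip(*predictions)]
```

In this sketch each evaluation model contributes one list of selected loci to `consensus_subset`, and each base learner (for example GBDT, AdaBoost, XGBoost, or LightGBM) contributes one list of predicted thousand-kernel weights to `fuse_predictions`.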