Study on Feature Selection Methods for High-Dimensional Small Sample Data on the Material Basis of Chinese Medicine

Posted on: 2022-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: Z Q Li
GTID: 2491306521497234  Subject: Computer Science and Technology
Abstract/Summary:
To investigate the material basis of Shenfu injection in the treatment of cardiogenic shock, experimental data were obtained by HPLC. These data have many dimensions but few samples, which makes them typical high-dimensional small sample data, with the attendant problems of the curse of dimensionality, overfitting, and a large share of irrelevant and redundant features. To address these problems, accurately filter out irrelevant and redundant features, obtain effective feature subsets, and improve the accuracy and stability of the model, this thesis carries out the following research: (1) A new two-stage hybrid feature selection algorithm (TS-HFSA) is proposed to address the curse of dimensionality, overfitting, and other high-dimensional small sample problems caused by the many irrelevant and redundant features in the experimental data on the material basis of Shenfu injection in the treatment of cardiogenic shock. The first stage removes irrelevant features by combining a Filter method with a Wrapper method. The second stage removes redundancy: because deleting redundant features with the approximate Markov blanket alone tends to lose information and can still leave redundancy in the feature subset, a de-redundancy algorithm that fuses the approximate Markov blanket with an L1 regularization term (DA²MBL1) is proposed to delete redundant features. The experimental results show that TS-HFSA is highly effective at eliminating irrelevant and redundant features, and that the resulting feature subsets are small, high in quality, and stable. The method can serve as a research method for the material basis of Chinese medicine and provide technical support for research in this area. (2) Because the black widow optimization algorithm cannot be applied directly to feature selection, five optimization
strategies are proposed: a binary strategy, an "OR" strategy, a population restriction strategy, a rapid reproduction strategy, and a fitness priority strategy. The first three strategies extend the black widow optimization algorithm, which cannot be applied directly in a discrete search space, to feature selection in a discrete search space, yielding the black widow optimization feature selection algorithm (BWOFS). Integrating all five strategies to improve performance yields the procreation controlled black widow optimization feature selection algorithm (PCBWOFS). Experimental results show that both methods are effective at searching for the optimal feature subset, that the feature subsets they obtain give higher prediction accuracy, and that their results are competitive and promising. Compared with BWOFS, PCBWOFS requires less computation and less time, and performs better. (3) Because PCBWOFS cannot search a high-dimensional feature space directly, a new three-stage high-dimensional small sample feature selection algorithm (Ths-HDSSFS) is proposed that combines it with the strengths of TS-HFSA. The method progresses hierarchically through three stages: removing irrelevant features, removing redundant features, and searching for the optimal feature subset. In the first stage, Ths-HDSSFS combines a Filter method with a Wrapper method to adaptively eliminate irrelevant features. In the second stage, it removes redundant features with DA²MBL1 (proposed in Chapter 3). In the third stage, it uses PCBWOFS (proposed in Chapter 4) to search for the optimal feature subset. The experimental results show that Ths-HDSSFS further reduces the feature subset size relative to TS-HFSA and improves the accuracy of the prediction model, making it a better feature selection model than TS-HFSA for high-dimensional small sample data on the material basis of Chinese medicine. In addition, based on the feature
subsets obtained by Ths-HDSSFS, the important substances in the experimental data on the material basis of Shenfu injection in the treatment of cardiogenic shock were identified, and a regression equation for the exogenous substances was constructed. (4) A multifunctional Chinese medicine data analysis system was designed, developed, and implemented in Python. The system integrates the three feature selection algorithms proposed in this thesis, 11 partial least squares optimization methods, and AMB. It aims for a simple interface and simple operation that are easy to learn and use, so as to assist researchers in Chinese medicine with modeling and analysis and to provide them with data analysis tools.
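The redundancy stage of TS-HFSA starts from the approximate Markov blanket. The sketch below shows the classic approximate-Markov-blanket rule in the FCBF style of Yu and Liu. It is not DA²MBL1: the L1-regularization fusion described in the abstract is not shown. As an assumption, absolute Pearson correlation stands in for symmetrical uncertainty so that a continuous response can be used directly. The function name `amb_redundancy_filter` and the toy data are hypothetical.

```python
import numpy as np

def amb_redundancy_filter(X, y):
    """Drop features covered by an approximate Markov blanket (AMB).

    Assumption: |Pearson r| replaces symmetrical uncertainty so that a
    continuous response can be used; features are assumed non-constant.
    Feature i is an AMB for feature j when rel[i] >= rel[j] and |r(i, j)| >= rel[j].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Relevance of each feature to the response.
    rel = np.array([abs(np.corrcoef(X[:, k], y)[0, 1]) for k in range(n_features)])
    order = [int(k) for k in np.argsort(-rel)]  # most relevant first
    kept = []
    while order:
        i = order.pop(0)  # the most relevant remaining feature is always kept
        kept.append(i)
        survivors = []
        for j in order:  # rel[i] >= rel[j] holds because of the ordering
            redundancy = abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
            if redundancy < rel[j]:
                survivors.append(j)  # i is not an AMB for j, so j stays a candidate
        order = survivors
    return sorted(kept)

# Toy data: x1 is an exact linear copy of x0, and x2 is uncorrelated with both.
x0 = np.arange(1.0, 9.0)
x1 = 2.0 * x0 + 1.0
x2 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
y = x0 + x2
X = np.column_stack([x0, x1, x2])
result = amb_redundancy_filter(X, y)
print(result)  # one of the two collinear columns, plus column 2
```

Because x0 and x1 tie on relevance, the sketch keeps whichever comes first in the sort order and treats the other as redundant. The weakly relevant but non-redundant x2 survives.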
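The abstract states that a binary strategy lets black widow optimization work in the discrete feature-selection space, but it does not give the mapping. The sketch below shows one common way to do this, offered only as an illustration. The procreation step follows the complementary convex-combination equations of the original continuous BWO. An S-shaped (sigmoid) transfer function then turns each offspring position into a 0/1 feature mask. A wrapper fitness trades prediction error against subset size. The sigmoid transfer, the weight `0.99`, and all function names are assumptions, not the thesis's BWOFS/PCBWOFS. The "OR", population restriction, rapid reproduction, and fitness priority strategies are not modeled.

```python
import numpy as np

def procreate(parent1, parent2, rng):
    """Continuous BWO procreation: two offspring as complementary convex combinations."""
    alpha = rng.random(parent1.shape)  # one random weight per dimension
    child1 = alpha * parent1 + (1 - alpha) * parent2
    child2 = alpha * parent2 + (1 - alpha) * parent1
    return child1, child2

def sigmoid_binarize(position, rng):
    """Assumed binary strategy: bit k is 1 with probability sigmoid(position[k])."""
    prob = 1.0 / (1.0 + np.exp(-position))
    mask = (rng.random(position.shape) < prob).astype(int)
    if mask.sum() == 0:
        mask[int(np.argmax(prob))] = 1  # never return an empty feature subset
    return mask

def fitness(mask, error_rate, weight=0.99):
    """Wrapper fitness to minimize: weighted error plus a small subset-size penalty."""
    return weight * error_rate + (1 - weight) * mask.sum() / mask.size

rng = np.random.default_rng(42)
p1, p2 = rng.normal(size=6), rng.normal(size=6)
c1, c2 = procreate(p1, p2, rng)
m1, m2 = sigmoid_binarize(c1, rng), sigmoid_binarize(c2, rng)
print(m1, m2)
print(fitness(np.array([1, 0, 0, 0]), error_rate=0.1))  # 0.99*0.1 + 0.01*0.25
```

In a full wrapper, `error_rate` would come from cross-validating a model on the columns selected by each mask. Keeping at least one feature per mask avoids fitting a model on an empty subset.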
Keywords/Search Tags: Material basis of Chinese medicine, High-dimensional small sample, Feature selection, Approximate Markov blanket, Black widow optimization algorithm