Stroke is an acute cerebrovascular disease that seriously affects people’s physical and mental health. Its harm can be minimized only through early screening, early detection, active intervention, and early treatment. Epidemiological investigation, Transcranial Doppler (TCD), and Electroencephalogram (EEG) are all effective methods for clinical stroke screening, but the analysis and interpretation of screening data are currently performed mainly by hand, which makes the results sensitive to the clinical experience of medical staff and to subjective factors. The use of artificial intelligence to assist the diagnosis and treatment of stroke has therefore become a research hotspot. Stroke screening data are often imbalanced, which can cause traditional machine learning methods to fail, so constructing a classification model that performs well on imbalanced stroke data has important research significance and social value. Rotation Forest is a classic ensemble learning algorithm: by introducing Principal Component Analysis (PCA) for feature mapping, it preserves the accuracy of the base classifiers while increasing their diversity, thereby improving the performance of the ensemble. This paper addresses imbalanced stroke data in three different modalities and studies Rotation Forest models for stroke data at both the algorithm level and the data preprocessing level. The detailed research work is as follows:

Feature engineering was performed on three different stroke screening datasets to prepare the data for subsequent classification. For the epidemiological data, data cleaning was carried out using domain knowledge of stroke risk factors. For the TCD data, new combined features were constructed from the existing blood flow features to mine the information in the TCD data more deeply. For the EEG data, the signals are weak, nonlinear, and non-stationary, so common
EEG features were extracted as input to the classification model.

A Cost-sensitive Rotation Forest model was constructed at the algorithm level. In practice, the classification and diagnosis of stroke is example-dependent cost-sensitive: the cost of misclassification differs not only from class to class but also between instances of the same class. First, a stroke cost matrix was designed according to the characteristics of the disease, and cost factors were introduced into the classification model. Then, an Example-dependent Cost-sensitive Rotation Forest based on PCA (ECSROF_PCA) was constructed using the Example-dependent Cost-sensitive Decision Tree as the base classifier. The results show that ECSROF_PCA saves more cost than the existing algorithm, which verifies the effectiveness of the Rotation Forest algorithm.

A Rotation Balanced Forest model was constructed at the data preprocessing level. Compared with the cost-sensitive model, data resampling does not consider differences in misclassification cost, so it is simpler, easier to understand, and more broadly applicable. First, based on comparative experiments, random subsampling was selected as the resampling method and combined with Rotation Forest to construct the classification model. The traditional Balanced Rotation Forest (Ban_RoF) balances the data by subsampling before feature mapping, which prevents the model from using all of the data to find a more appropriate mapping space. Rotation Balanced Forest (RoBF) is therefore proposed, which maps first and then samples. The results show that RoBF further improves classification performance over Ban_RoF, with a shorter test time and better real-time performance.

To further improve the performance of the Rotation Forest model, the supervised Linear Discriminant Analysis (LDA) is introduced to replace
the original unsupervised PCA in constructing the Rotation Forest model. Both the cost-sensitive model and the Rotation Balanced Forest model verify the effectiveness of the LDA mapping. The experimental results show that the PCA-based Rotation Forest has a large gap between Recall and Specificity, whereas the LDA-based Rotation Forest balances the two metrics better. Then, based on three feature spaces (the space reconstructed by PCA, the space reconstructed by LDA, and the original feature space), the Combined Rotation Forest model was proposed to further increase the diversity of the base classifiers. Its effectiveness was verified on both the cost-sensitive model and the Rotation Balanced Forest model. The experimental results show that the proposed Combined Rotation Forest model keeps model complexity under control while further improving classification performance.
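
As an illustration of what "example-dependent cost-sensitive" means in practice, the following is a minimal sketch rather than the method used in this work. The abstract does not give the stroke cost matrix, so the `ExampleCost` fields and every numeric value below are hypothetical, and correct predictions are assumed to cost nothing. The `savings` function follows one common definition from the example-dependent cost-sensitive literature (cost reduction relative to the cheaper of the two trivial classifiers); whether this matches the "saved cost" reported here is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExampleCost:
    """Hypothetical per-patient misclassification costs (arbitrary units)."""
    false_negative: float  # cost of missing a patient who is truly at risk
    false_positive: float  # cost of flagging a patient who is not at risk


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               costs: Sequence[ExampleCost]) -> float:
    """Sum the cost of every wrong prediction; correct ones are assumed free."""
    total = 0.0
    for truth, pred, cost in zip(y_true, y_pred, costs):
        if truth == 1 and pred == 0:
            total += cost.false_negative
        elif truth == 0 and pred == 1:
            total += cost.false_positive
    return total


def savings(y_true: Sequence[int], y_pred: Sequence[int],
            costs: Sequence[ExampleCost]) -> float:
    """Relative cost reduction versus the cheaper trivial classifier."""
    n = len(y_true)
    baseline = min(total_cost(y_true, [0] * n, costs),
                   total_cost(y_true, [1] * n, costs))
    if baseline == 0:
        raise ValueError("baseline cost is zero; savings is undefined")
    return 1.0 - total_cost(y_true, y_pred, costs) / baseline
```

Because the costs are attached to individual examples, two predictions with the same number of errors can have very different total costs: missing a patient with a high `false_negative` cost is penalized more than missing one with a low cost, which a class-level cost matrix cannot express.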