
Imbalance Learning And Its Application In High Risk Prediction Of Prenatal Screening

Posted on: 2018-12-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Xing
GTID: 1318330539485917
Subject: Management Science and Engineering
Abstract/Summary:
Traditional machine learning methods classify well on balanced datasets. However, because their learning mechanism is based on error minimization, their results on imbalanced datasets tend to be biased towards the majority class, and the minority class is often ignored because of its small proportion. Imbalanced data are common in the real world, and the analysis and learning of the minority class has become a focus of academic research. The difficulty lies in improving overall classification performance and minority-class performance at the same time.

Processing massive data poses further challenges. Traditional data reduction methods delete redundant samples and assemble a consistent subset that replaces the original dataset, but sample selection methods based on distance calculation are time-consuming and computationally expensive.

Prenatal screening data are typical structured imbalanced data, yet there has been little research on applying machine learning to prenatal screening. The traditional screening method calculates the risk that a pregnant woman is carrying a fetus with Down syndrome, Edwards syndrome or an open neural tube defect from the concentrations of alpha-fetoprotein (AFP), human chorionic gonadotropin (hCG) and unconjugated estriol (uE3) in maternal serum, together with the woman's weight, age and other factors. Because prenatal screening only estimates the risk that the fetus is affected, it cannot confirm the condition. Women with high risk values need an amniocentesis to check whether the unborn baby is affected by certain disorders, but amniocentesis can cause miscarriage. In addition, the high-risk calculation software used in prenatal screening is designed by foreign companies, and the risk calculation method is a trade secret. Furthermore, the traditional screening methods have a missed-diagnosis rate of 30%; that is, they cannot detect all affected fetuses.

This thesis addresses four problems: the poor classification performance on the minority class of imbalanced data; the large time cost of data compression and processing; the simulation of the high-risk computing system for prenatal screening; and improving the true positive rate of prenatal screening while reducing the missed diagnoses of traditional methods. It makes four corresponding contributions to imbalanced data learning and its application in prenatal screening decision making: a sample selection method based on unstable cuts, an ELM ensemble learning method based on resampling, an improved weighted ELM method based on adjustable factors, and a high-risk prediction and auxiliary decision model for prenatal screening.

(1) The concept of unstable cuts is introduced into sample selection, and an unstable cuts based sample selection method (UCBSS) is proposed. It is proved that the cuts which partition the sample space at the minimum value of the discriminant function must be unstable cuts. For each attribute, the samples adjacent to the unstable cuts are marked. The samples that carry more of this marking information are retained and form the unstable subset. It is proved that a classifier learned on the unstable subset can correctly classify the samples that carry no unstable-cut information. Experimental results on artificial datasets and UCI datasets show that the method is suitable for compressing large datasets with a high imbalance ratio. Compared with the traditional condensed nearest neighbor (CNN) method, the proposed method has a clear advantage in running time and overall classification performance. A classifier whose discriminant function is not convex achieves the same effect. When the discriminant function of the classifier is convex, the method is also suitable for compressing large datasets with high noise.

(2) The ELM ensemble learning method based on resampling combines a resampling technique with ensemble learning. It takes into account the relationship between the subsets and the original dataset during random under-sampling. Multiple ELM classifiers are trained on multiple subsets assembled by random under-sampling, and the final result is determined by voting. The experiments are run on several modified UCI datasets. The results show that, on imbalanced datasets, the proposed method outperforms both random under-sampling combined with ELM and the CNN method combined with ELM. In addition, when the imbalance ratio is high, the UCBSS method can first be used to reduce the imbalance ratio before the ELM ensemble learning method based on resampling is applied. The experimental results show that this not only improves efficiency but also maintains or improves overall classification performance.

(3) In the weighted ELM method, the sample weights are set according to the numbers of majority-class and minority-class samples respectively. These weights consider only the size of each sample's own class and are fixed manually, so a better weight may be missed. The experimental results also show that other weights can give better results. This thesis therefore proposes a weighted ELM method based on adjustable factors (WELMAF), in which the weight setting takes into account the relationship between the classes. The proposed method uses 1 as the initial weight of the majority class, and the ratio of the number of majority-class samples to the number of minority-class samples as the initial weight of the minority class. It then applies one of two schemes: the adjustment factor is added either to the weight of the minority class or to the weight of the majority class, while the weight of the other class remains unchanged. Because the two schemes adjust different weights, the range and step size of the adjustable factor also differ between them. The experimental results show that the proposed method achieves better classification performance than the original weighted ELM.

(4) The prenatal screening data are first preprocessed to remove redundant features and noisy samples. Five predictive models are then used to simulate the high-risk computing system for prenatal screening: decision tree, ELM, the ELM ensemble learning method based on resampling, weighted ELM, and the improved weighted ELM, each combined with UCBSS. The experimental results show that UCBSS combined with the decision tree model gives the best prediction performance, comparable to the results of the existing high-risk computing system; in particular, the prediction accuracies for trisomy 18 syndrome and open neural tube defects are close to 100%. Another experiment was carried out on simulated diagnosis data from Down syndrome screening. A prenatal screening decision model that combines UCBSS with the improved weighted ELM was developed to identify all cases of Down syndrome. It reduces the rate of missed diagnosis while keeping the false positive rate within an acceptable range, and can therefore help avoid the birth of more affected fetuses.
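The weighting scheme described in item (3) can be illustrated with a minimal sketch. The initial weights (1 for the majority class, the majority-to-minority count ratio for the minority class) and the two adjustment schemes come from the abstract. Everything else is an assumption made for illustration: the function names, the binary {0, 1} labels, the sigmoid hidden layer, the regularization constant `C`, and the closed-form weighted least-squares solution commonly used for weighted ELM. The abstract does not give the range or step size of the adjustable factor, so the values in the demo loop are placeholders, not the thesis's settings.

```python
import numpy as np


def adjusted_class_weights(y, factor=0.0, adjust="minority"):
    """Per-sample weights for binary labels in {0, 1} (hypothetical helper).

    Initial weights follow the abstract: 1 for the majority class and
    n_majority / n_minority for the minority class. `factor` is added to
    the weight of one class only; the other class keeps its initial weight.
    """
    y = np.asarray(y)
    n_pos, n_neg = int(np.sum(y == 1)), int(np.sum(y == 0))
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    minority = 1 if n_pos <= n_neg else 0
    w_major = 1.0
    w_minor = max(n_pos, n_neg) / min(n_pos, n_neg)
    if adjust == "minority":
        w_minor += factor
    elif adjust == "majority":
        w_major += factor
    else:
        raise ValueError("adjust must be 'minority' or 'majority'")
    return np.where(y == minority, w_minor, w_major)


def fit_weighted_elm(X, y, sample_weight, n_hidden=20, C=1.0, seed=0):
    """Weighted ELM sketch: random sigmoid hidden layer, weighted ridge solve.

    Solves beta = (I / C + H^T W H)^{-1} H^T W t, with W = diag(sample_weight).
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))       # hidden-layer output matrix
    t = np.where(np.asarray(y) == 1, 1.0, -1.0)  # targets encoded as +1 / -1
    HtW = H.T * sample_weight                    # H^T W without building diag(W)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ t)
    return A, b, beta


def predict_weighted_elm(model, X):
    """Return 0/1 predictions from the sign of the ELM output."""
    A, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return (H @ beta > 0).astype(int)


if __name__ == "__main__":
    # Synthetic 9:1 imbalanced data, for illustration only.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(2, 1, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    for factor in (0.0, 1.0, 2.0):  # placeholder grid of adjustable factors
        w = adjusted_class_weights(y, factor, adjust="minority")
        pred = predict_weighted_elm(fit_weighted_elm(X, y, w), X)
        print(factor, "minority recall:", pred[y == 1].mean())
```

In practice the factor would be chosen by searching such a grid on validation data, with a separate grid for each scheme, since the abstract notes that the two schemes need different ranges and step sizes.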
Keywords/Search Tags: Imbalance Learning, Sample Selection, Cost Sensitive Learning, Resampling, Extreme Learning Machine, Prenatal Screening, High Risk Prediction, Assistant Decision-making