Active Learning Algorithm And Its Application In The Diagnosis Of Cardiovascular Disease

Posted on: 2011-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Yang
GTID: 1118330362955284
Subject: Computer application technology
Abstract/Summary:
Medical diagnosis generates large volumes of unlabeled data every day. A medical decision-support system that relies only on the limited labeled data can hardly generalize well. We therefore propose to exploit unlabeled samples or attributes by incorporating active learning into the medical decision-support system, which in turn reduces its dependence on labeled data.

In real-world settings such as medical diagnosis, however, active learning faces several challenges:
(1) Most selection strategies are based on the decision boundary: they choose uncertain samples close to the boundary, certain samples far from it, or a combination of the two. None of them considers the actual distribution of the whole dataset, so the final solution may be only a local optimum.
(2) Current active feature selection algorithms pursue the single goal of reducing classification error. Real applications, however, must also account for other costs, such as misclassification cost and sampling cost.
(3) To handle imbalanced data, many active learning algorithms try to obtain equal numbers of samples from each class near the decision boundary, which further reduces the total number of labeled samples available for training the learner.

To avoid local optima, a hierarchical clustering algorithm is used to discover the structure of the dataset step by step in a top-down manner. In addition, a new method is proposed that adaptively searches for the decision boundary by combining exploration and exploitation strategies.

Because real-world tasks involve various kinds of cost, the proposed cost-sensitive active feature selection algorithm takes both misclassification cost and sampling cost into account.
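To make the trade-off between misclassification cost and sampling cost concrete, here is a minimal sketch of a generic expected-cost rule for deciding whether to acquire one more feature for a patient. It assumes a binary diagnosis, a known cost matrix, and discretized feature outcomes with already-estimated posteriors. This is a standard value-of-information calculation, not the dissertation's forecasting-error or maximum-expected-change criterion; `CandidateFeature`, `net_benefit`, `choose_feature` and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CandidateFeature:
    """A feature that could still be measured for one patient (hypothetical structure)."""
    name: str
    sampling_cost: float
    # Each outcome: (probability of observing this value, P(disease | this value)).
    outcomes: list[tuple[float, float]]


def min_expected_cost(p_pos: float, cost_fn: float, cost_fp: float) -> float:
    """Expected misclassification cost of the cheaper decision, given P(disease)."""
    cost_if_predict_negative = p_pos * cost_fn          # a sick patient is missed
    cost_if_predict_positive = (1.0 - p_pos) * cost_fp  # a healthy patient is flagged
    return min(cost_if_predict_negative, cost_if_predict_positive)


def net_benefit(p_pos: float, feature: CandidateFeature, cost_fn: float, cost_fp: float) -> float:
    """Expected reduction in misclassification cost minus the cost of sampling the feature."""
    before = min_expected_cost(p_pos, cost_fn, cost_fp)
    after = sum(w * min_expected_cost(p_v, cost_fn, cost_fp) for w, p_v in feature.outcomes)
    return before - after - feature.sampling_cost


def choose_feature(p_pos, candidates, cost_fn, cost_fp):
    """Return the candidate with the largest positive net benefit, or None to stop sampling."""
    scored = [(net_benefit(p_pos, f, cost_fn, cost_fp), f) for f in candidates]
    positive = [(b, f) for b, f in scored if b > 0.0]
    if not positive:
        return None
    return max(positive, key=lambda bf: bf[0])[1]


if __name__ == "__main__":
    # Assumption: missing a disease case costs 10x a false alarm.
    candidates = [
        CandidateFeature("blood_pressure", 0.1, [(0.5, 0.55), (0.5, 0.05)]),
        CandidateFeature("cholesterol", 0.5, [(0.5, 0.40), (0.5, 0.20)]),
    ]
    best = choose_feature(0.3, candidates, cost_fn=10.0, cost_fp=1.0)
    print(best.name if best else None)  # -> blood_pressure
```

In this toy setting each feature's outcome posteriors average to the current P(disease) = 0.3. A low blood-pressure reading makes "healthy" the cheaper call, so that feature can change the decision and earns more than its sampling cost; neither cholesterol outcome changes the decision, so its expected cost reduction is zero and it is never worth buying.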
In the incremental sampling based on the forecasting-error algorithm, the uncertainty of each feature is evaluated by its maximum expected change, with the goal of balancing classification accuracy against total cost. Furthermore, features are selected for labeling according to their interactions with other features, which keeps irrelevant features out.

For imbalanced data, classification accuracy is affected by the complexity of the concept and by the size of the training set. To address this, random subspace sampling is used to reduce concept complexity, and artificial data are generated to provide more training samples. In addition, a larger misclassification cost is assigned to the minority class to improve performance on it.

The proposed active learning algorithms are also applied to the diagnosis of cardiovascular disease. The experiments use real hypertension data collected by the Hubei Provincial Center for Disease Control and Prevention in China, as well as disease datasets from the UCI Machine Learning Repository. In the experiments, the hierarchical clustering algorithm quickly locates the decision boundary, and the active feature selection algorithm not only finds the relevant features but also achieves higher classification accuracy. On imbalanced medical data, the proposed method also achieves higher prediction accuracy.
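The imbalanced-data strategy described above combines three ingredients: random feature subspaces, artificial minority samples, and a larger misclassification cost for the minority class. The sketch below wires them together under assumptions of our own: synthetic samples come from SMOTE-style interpolation between minority neighbours, each subspace member is a cost-weighted k-nearest-neighbour voter, and the members combine by majority vote. It covers only the training and prediction side (the active query loop is omitted), and `smote_like`, `fit`, `predict` and every parameter value are hypothetical, not the dissertation's implementation.

```python
import math
import random
from collections import Counter


def smote_like(minority, n_new, rng, k=3):
    """Create n_new synthetic minority samples by interpolating between a random
    minority sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        a = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbours)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


def fit(X, y, minority_label, n_synthetic, n_models, subspace_dim, minority_cost, seed=0):
    """Augment the minority class, weight samples by class cost, and draw one random
    feature subspace per ensemble member."""
    rng = random.Random(seed)
    minority = [x for x, c in zip(X, y) if c == minority_label]
    synthetic = smote_like(minority, n_synthetic, rng)
    X_aug = [list(x) for x in X] + synthetic
    y_aug = list(y) + [minority_label] * len(synthetic)
    # Misclassifying a minority sample is costlier, so minority neighbours vote with more weight.
    weights = [minority_cost if c == minority_label else 1.0 for c in y_aug]
    n_features = len(X[0])
    subspaces = [sorted(rng.sample(range(n_features), subspace_dim)) for _ in range(n_models)]
    return {"X": X_aug, "y": y_aug, "w": weights, "subspaces": subspaces}


def _project(v, dims):
    """Keep only the coordinates listed in dims."""
    return [v[d] for d in dims]


def predict(model, x, k=3):
    """Each subspace member runs a cost-weighted k-NN vote; members then vote by majority."""
    votes = Counter()
    for dims in model["subspaces"]:
        qx = _project(x, dims)
        order = sorted(range(len(model["X"])),
                       key=lambda i: math.dist(qx, _project(model["X"][i], dims)))
        score = Counter()
        for i in order[:k]:
            score[model["y"][i]] += model["w"][i]
        votes[score.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: 8 majority samples near the origin, 3 minority samples near (5, 5, 5).
    majority = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
    minority = [[5, 5, 5], [6, 5, 5], [5, 6, 5]]
    model = fit(majority + minority, [0] * 8 + [1] * 3, minority_label=1,
                n_synthetic=5, n_models=5, subspace_dim=2, minority_cost=3.0)
    print(predict(model, [0.2, 0.1, 0.3]))  # -> 0
    print(predict(model, [5.5, 5.4, 5.2]))  # -> 1
```

Oversampling and cost weighting address different symptoms: the synthetic samples give the minority class more support in feature space, while the cost weight tilts close votes toward the minority class, which is the class whose errors matter most in diagnosis.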
Keywords/Search Tags: Computer-Aided Diagnosis, Active Learning, Sampling Bias, Feature Selection, Imbalanced Dataset