Research On Multi-label Classification And Its Application In Traditional Chinese Medicine For Parkinson

Posted on: 2016-03-13  Degree: Master  Type: Thesis
Country: China  Candidate: M Fang  Full Text: PDF
GTID: 2284330461956815  Subject: Computer technology
Abstract/Summary:
Parkinson’s disease is a chronic, degenerative disease of the central nervous system that commonly occurs in the elderly. Traditional Chinese Medicine (TCM) diagnoses Parkinson’s disease through syndrome differentiation and recognizes five Chinese Syndromes (CSs) for the disease. To collect and analyze Parkinson data, the Chinese Scale for Parkinson (CSP) has been proposed. The CSP lists the clinical symptoms associated with Parkinson’s disease and helps doctors standardize the diagnostic procedure: when diagnosing a patient, the doctor marks the observed symptoms on the CSP. However, doctors have not reached a consensus on the relationship between the CSs and the CSP, and instead rely on personal experience.

In this paper, we apply multi-label classification to build a model from doctors’ experience, which we hope will improve TCM diagnosis of Parkinson’s disease. The idea is to treat the CSP items as attributes and the CSs as labels, so that the relationship between the CSs and the CSP can be learned by a multi-label algorithm.

Earlier, TCM held that each Parkinson patient has two CSs at the same time. As the field developed, TCM came to distinguish a principal syndrome from a secondary syndrome. The Parkinson data can therefore be divided into two parts: the Parkinson dataset and the Parkinson dataset-updated.

1) For the Parkinson dataset, we propose a new algorithm, ETCC (EnTropy Classifier Chains), based on the Classifier Chains model. ETCC optimizes the chain order from a global perspective and examines the relationships among the CSs. Following the principle of attribute selection, ETCC holds that the higher a label’s contribution, the higher its rank in the chain order. Using information entropy, ETCC computes a matrix in which each element represents the contribution between two labels. The PageRank algorithm then converts these local contributions into global ones.
Finally, we determine the chain order from the global contribution values and build the ETCC model.

2) To keep the complete information in the Parkinson dataset-updated, which includes both principal and secondary syndromes, we split the five CSs into ten labels: principal CSs and secondary CSs. However, the prediction results are not ideal, because the secondary CSs occur too rarely to build models on them. To solve this problem, we propose a novel multi-label algorithm, DEML, which handles imbalance by combining labels. First, DEML defines a standard for multi-label dataset imbalance and develops a method for determining whether a dataset is imbalanced. Second, DEML uses a randomized strategy to build even subsets, meaning that each subset has roughly the same number of classes after binary encoding. Finally, DEML builds a Label Powerset model for each subset and obtains the result by ensembling these models.

Extensive experiments show that ETCC and DEML are highly competitive on the Parkinson dataset and on public data, in both computation and effectiveness.
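
The chain-ordering step of ETCC described in 1) can be illustrated with a minimal sketch. The abstract does not give the exact contribution measure or PageRank settings, so the code below rests on assumptions: pairwise contribution is taken to be the mutual information between two label columns, PageRank is run by plain power iteration with a damping factor of 0.85, and all function names (`contribution_matrix`, `pagerank`, `chain_order`) are hypothetical. It is not the thesis implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B); assumed here as the pairwise 'contribution'."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def contribution_matrix(Y):
    """Y is a list of rows, each a list of 0/1 label values.

    Returns W where W[i][j] is the (assumed) contribution between labels i and j.
    """
    cols = [list(col) for col in zip(*Y)]
    k = len(cols)
    return [
        [0.0 if i == j else max(0.0, mutual_information(cols[i], cols[j]))
         for j in range(k)]
        for i in range(k)
    ]


def pagerank(W, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted graph with edge weights W[i][j]."""
    k = len(W)
    rank = [1.0 / k] * k
    for _ in range(iters):
        new = [(1.0 - damping) / k] * k
        for i in range(k):
            out = sum(W[i])
            for j in range(k):
                # A label with no outgoing weight spreads its rank evenly.
                share = W[i][j] / out if out > 0 else 1.0 / k
                new[j] += damping * rank[i] * share
        rank = new
    return rank


def chain_order(Y):
    """Label indices sorted by descending global contribution."""
    scores = pagerank(contribution_matrix(Y))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


# Toy data: labels 0 and 1 always agree; label 2 is independent of both,
# so it receives the lowest global score and is placed last in the chain.
toy = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
print(chain_order(toy))
```

The resulting order could then be passed to any Classifier Chains implementation that accepts a user-specified label order. Because mutual information is symmetric, a directed measure may be closer to what the thesis uses; the abstract does not say.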
Keywords/Search Tags:multi-label classification, multi-label dataset imbalance