Research For The Application Of Feature Selection On TCM Data Mining

Posted on: 2009-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Li
Full Text: PDF
GTID: 2178360242989307
Subject: Control theory and control engineering
Abstract/Summary:
In the field of Apoplexy, "syndrome" and "four diagnostic methods" are two common terms. A syndrome is the pathological generalization of a group of closely related symptoms at a given stage in the course of disease development; the four diagnostic methods are inspection, auscultation and olfaction, inquiry, and palpation. The four diagnostic methods comprise several hundred items, such as cough, twitch and headache. They are the prerequisites for correct differentiation and effective treatment in Apoplexy. Syndrome differentiation is the process in which doctors analyze and summarize the information gathered through the four diagnostic methods, including all symptoms and signs, judge the properties of the syndromes on the basis of this information, and probe into the essence of the disease. The diagnostic criteria for Apoplexy used in Traditional Chinese Medicine (TCM) were established in the mid-1990s on the basis of expert experience. However, many clinical experiments over the past decade have shown that these diagnostic criteria have limitations. Researchers therefore want to apply Data Mining (DM) methods to the gathered clinical cases in order to obtain more objective and accurate diagnostic criteria grounded in mathematical methods.
This thesis is based on the national key fundamental research development plan project ("973 Project") "The Study on the Evaluation Criteria of the Diagnosis and Therapeutic Effects with the integration of Disease and Syndrome of Ischemic Stroke". The project uses clinical case data gathered by a management information system that was developed at an early stage of the project.
This thesis applies Feature Selection to establish new syndrome differentiation diagnostic criteria for Apoplexy. In essence, establishing such criteria means selecting, for each syndrome of Apoplexy, the relevant items from the four diagnostic methods. This procedure resembles feature selection, which is why Feature Selection algorithms were chosen. As an important research field of Pattern Recognition and Machine Learning, Feature Selection is widely applied to dimensionality reduction of high-dimensional data and to the handling of massive datasets. Applied to large-scale data processing, it can improve the accuracy and efficiency of Classification and Clustering, and it can also identify information-rich feature subsets and reduce data redundancy.
Because of the particular nature of medical data mining, the Feature Selection algorithms need some improvements. This thesis designs four main algorithms on the basis of the Feature Selection algorithm framework. Several fundamental algorithms are involved, such as the Genetic Algorithm, Association analysis and the KNN Algorithm. The four main algorithms are: (1) feature selection combining the Genetic Algorithm with Association analysis; (2) an adaptive boosting algorithm based on the Genetic Algorithm; (3) feature selection combining the Genetic Algorithm with the KNN Algorithm; (4) Feature Selection based on Feature Weight. The algorithms are related hierarchically; their efficiency and effectiveness are compared, and improvements are then made. In the end, satisfactory efficiency and effectiveness are obtained.
The improvement of the KNN Algorithm is one of the innovations of this thesis. By analyzing the algorithmic complexity of the KNN Algorithm, we identify the key factors and then propose an improved algorithm based on data objects. After the improvement, not only is the time complexity reduced, but the effectiveness of the algorithm is also improved.
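To illustrate the general idea behind combining a Genetic Algorithm with the KNN Algorithm for feature selection, the following is a minimal sketch of a generic wrapper approach. It is not the algorithm designed in the thesis, whose details the abstract does not give. The function names (`knn_loo_accuracy`, `ga_select`, `make_toy_data`), the parameters, the subset-size penalty and the synthetic data are all assumptions made for illustration; no TCM clinical data is used. Each candidate feature subset is a bit mask, and its fitness is the leave-one-out accuracy of a k-nearest-neighbor classifier restricted to the selected features.

```python
import random


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on the selected features.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=15,
              mutation_rate=0.1, penalty=0.01, seed=0):
    """Search for a feature bit mask with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        # Assumed fitness: reward accuracy, lightly penalise large subsets.
        return knn_loo_accuracy(X, y, mask) - penalty * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability mutation_rate per bit.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


def make_toy_data(n_samples=40, n_features=6, seed=1):
    """Synthetic data: only feature 0 carries class signal, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.random() for _ in range(n_features)]
        row[0] = label + rng.gauss(0, 0.1)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
best = ga_select(X, y)
print("selected feature indices:", [j for j, bit in enumerate(best) if bit])
```

In this sketch the classifier is re-evaluated for every candidate mask, so the cost of the distance computations dominates the run time. That is one reason why reducing the complexity of KNN, as the abstract describes, matters in a wrapper setting.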
Keywords/Search Tags:Apoplexy, Syndrome, Four Diagnostic Methods, Feature Selection, Genetic Algorithm, Association analysis, KNN Algorithm