Research On Association Classification Algorithm And Its Application In Medical Data

Posted on: 2019-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S J Guo
Full Text: PDF
GTID: 2394330548967873
Subject: Software engineering
Abstract/Summary:
In recent years, hospitals have invested in medical informatization and the state has promoted universal health insurance, so the number of patients visiting hospitals has gradually increased and more and more medical data is being generated. In addition, some hospitals have purchased large-scale, high-tech medical equipment that produces huge amounts of medical data when widely used. Working from this massive medical data, this thesis uses data mining to extract potentially valuable information and to understand the risk factors of a disease, so that the disease can be prevented or diagnosed early and its incidence reduced. Researchers in China and abroad have made some progress in medical data mining. However, current research focuses on improving and applying traditional classification algorithms such as random forests, neural networks, and SVMs. Although their classification accuracy is high, they cannot identify some of the features that influence the occurrence of a disease. Association classification algorithms can mine the features related to a given disease, and they are an important research direction in data mining.

Expert systems in medical applications benefit from if-then rules, because such rules make results interpretable. To mine knowledge effectively from data, various rule induction algorithms have been proposed, and these can be combined with classification methods to form rule-based classification algorithms. However, most rule-based classification algorithms cannot handle numerical data directly. Discretization, as a data preprocessing step, converts numerical data into a categorical format. Existing discretization algorithms do not take into account the distribution of the numerical variables in the dataset, which may reduce the performance of rule-based classifiers. To address the problem that existing discretization algorithms cannot preserve the distribution of the original data, the thesis proposes a discretization algorithm based on the Gaussian Mixture Model (DAGMM), which preserves the most frequent patterns of the raw data by taking the distribution of the numerical variables into account. The effectiveness of DAGMM is verified on four publicly available medical datasets. The experimental results show that DAGMM is superior to six other static discretization methods in terms of the number of generated rules and the classification accuracy of the association classification algorithm. DAGMM can therefore be applied in clinical expert systems to improve the performance of rule-based classifiers.

Many scholars use association classification techniques to help physicians predict breast cancer accurately, and association rules can strengthen the classification process. In this thesis, DAGMM is combined with common association classification algorithms and applied to breast cancer prediction. However, most association classification algorithms are affected by the estimation method used in the rule evaluation process and by the prioritization technique used at the attribute level. The thesis therefore proposes a feature weighted association classification algorithm based on the statistical harmonic mean (FWAC), which generates more accurate association rules by pruning with a statistical measure. FWAC is compared with five well-known association classification algorithms on two breast cancer datasets from the UCI Machine Learning Repository. The experimental results show that FWAC is superior to the other association classification algorithms in this case study and that it generates more accurate rules. The research results of this thesis are of value for the prevention and diagnosis of breast cancer.
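The idea of discretizing a numerical variable according to its distribution can be illustrated with a small sketch. This is not the thesis's DAGMM, whose details the abstract does not give. It is a minimal pure-Python example that assumes a one-dimensional Gaussian mixture fitted by plain EM, with each value assigned to the bin of its most probable component. The function names `fit_gmm_1d` and `discretize`, the component count `k`, and the sample readings are all hypothetical.

```python
import math


def _pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, k=2, iters=50):
    """Fit a k-component 1D Gaussian mixture with plain EM (illustrative only)."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): means spread evenly over the range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    spread = (hi - lo) / k or 1.0
    variances = [spread ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * _pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed variance
    return weights, means, variances


def discretize(values, params):
    """Map each value to the index of its most probable component.

    Bin indices are ordered by component mean, so bin 0 is the lowest group.
    """
    weights, means, variances = params
    order = sorted(range(len(means)), key=lambda j: means[j])
    rank = {j: r for r, j in enumerate(order)}
    bins = []
    for x in values:
        dens = [w * _pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        best = max(range(len(dens)), key=lambda j: dens[j])
        bins.append(rank[best])
    return bins


# Hypothetical readings forming two groups, one low and one high.
readings = [4.8, 5.1, 5.0, 4.9, 5.2, 9.8, 10.1, 10.0, 9.9, 10.2]
params = fit_gmm_1d(readings, k=2)
print(discretize(readings, params))
```

In this sketch the bin boundaries follow the fitted components rather than fixed-width or fixed-frequency intervals, so values that cluster together in the raw data end up in the same category. A rule-based classifier would then consume the bin indices as categorical attribute values.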
Keywords/Search Tags: Discretization Algorithm, Medical Data, Association Classification, Statistical Harmonic Mean, Gaussian Mixture Model