
Study On Association Rules And Their Application In The Analysis Of The Liver Cancer Patients Data

Posted on: 2006-10-19  Degree: Master  Type: Thesis
Country: China  Candidate: J H Wu  Full Text: PDF
GTID: 2144360155950763  Subject: Epidemiology and Health Statistics
Abstract/Summary:
Clinical databases commonly share several characteristics: (1) they contain a large number of cases, many variables, and a great deal of information; (2) they inevitably contain some inaccurate information introduced during data collection; (3) missing values are often unavoidable; and (4) the variables are related in complex ways, so it is difficult to require that the variables be normally distributed or that the covariates be independent. Such data are therefore difficult to analyze with traditional statistical methods.

Association rule discovery is a data mining technique whose purpose is to find association patterns among variables, or combinations of variables, in the data. An association rule is an expression of the form A ⇒ B, where A and B are sets of items. Intuitively, such a rule means that records in the database that contain A tend to also contain B. Rules are derived from the frequency of items and item combinations in the data, and the results are easy to interpret. The main aims of this study are to introduce association rules into the analysis of medical data, to complement statistical methods, and to extract as much information as possible.

However, the literature contains many different measures for evaluating association rules, and comparative studies of these measures are rare. Building on a study of association rule theory and implementation methods, we carried out a simulation study on theoretical data, examining how to obtain interesting association rules, and an application study on real data.

The simulation study indicated that: (1) pruning multi-item rules based on the improvement in confidence is necessary, with an increase in confidence of 0.05 taken as the threshold; (2) not all measures are appropriate for analyzing association rules in medical data, and some of them are misleading; (3) Fisher's exact test is a good measure regardless of the number of cases, whereas measures such as lift are suitable only for large databases, and their threshold values must be specified carefully. Finally, integrating the study results with the literature, we summarized an analysis procedure for association rules in medical data.

The application study indicated that: (3) using Fisher's exact test as the measure of...
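
To make the measures named in the abstract concrete, the following is a minimal sketch, not the thesis's own implementation, of how support, confidence, lift, and a one-sided Fisher's exact test could be computed for a single rule A ⇒ B. The function names, the item labels `hbv` and `cirrhosis`, and the ten toy records are all hypothetical and chosen only for illustration; the choice of a one-sided test is likewise an assumption.

```python
from math import comb


def rule_measures(records, antecedent, consequent):
    """Support, confidence, lift and the 2x2 table for antecedent => consequent.

    `records` is a list of item sets; names here are illustrative only.
    """
    a_set, b_set = set(antecedent), set(consequent)
    n = len(records)
    n_a = sum(1 for r in records if a_set <= set(r))
    n_b = sum(1 for r in records if b_set <= set(r))
    n_ab = sum(1 for r in records if (a_set | b_set) <= set(r))
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ab / n                 # P(A and B)
    confidence = n_ab / n_a            # P(B | A)
    lift = confidence / (n_b / n)      # P(B | A) / P(B)
    # 2x2 table: (A&B, A&notB, notA&B, notA&notB)
    table = (n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab)
    return support, confidence, lift, table


def fisher_exact_greater(table):
    """One-sided Fisher's exact p-value (positive association) for a 2x2 table."""
    a, b, c, d = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1)) / denom


# Hypothetical toy data: ten patient records with made-up item labels.
records = (
    [{"hbv", "cirrhosis"}] * 4
    + [{"hbv"}, {"cirrhosis"}]
    + [set()] * 4
)

support, confidence, lift, table = rule_measures(records, {"hbv"}, {"cirrhosis"})
p_value = fisher_exact_greater(table)
print(f"support={support:.2f} confidence={confidence:.2f} "
      f"lift={lift:.2f} p={p_value:.4f}")
```

With only ten records the rule has a lift well above 1 yet a p-value that is not small, which illustrates in a toy setting why a frequency-ratio measure and an exact test can disagree on small samples.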
Keywords/Search Tags: data mining, association rules, liver neoplasm, interestingness measure