
Data Mining In The Chinese Medical Applications

Posted on: 2009-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cui
Full Text: PDF
GTID: 2208360245961120
Subject: Computer application technology
Abstract/Summary:
Data mining is a decision-support process that uncovers hidden patterns in large sets of facts or observed data; it is also called knowledge discovery. Categorizing text databases is one of its important tasks. Electronic medical case records are a class of text dataset, so mining and categorizing them is of considerable significance. In this study, 200 real Chinese medical records were collected from the affiliated hospital of Chengdu Medical College. Because the real dataset contains imperfect, noisy, and discontinuous data, data cleansing was performed first. The next step was to quantify the data and extract features. For this we employed a novel feature extraction method, the phrase-based feature extraction method. The 200 Chinese medical records, covering 4 different diseases, were combined into 3 different groups. In summary, the matching phrase algorithm can be stated as follows:
1) Obtain the set of matching phrases for each pair of documents in the dataset.
2) Construct a single set containing all the sets from step 1, removing repeated matching phrases.
3) Represent each document by a vector.
4) Use a SOM (self-organizing map) to construct a classifier on the vector set.
5) Visualize the result and find the feature phrases associated with each class.
Finally, we categorize the data with the proposed method, which achieves good categorization performance. Data mining can automatically extract the major features of different diseases from a dataset of medical case records. Categorizing the dataset will help medical staff diagnose diseases and investigate features of different diseases that may not have been discovered before. Although the work presented here is aimed at categorizing medical case records, it could easily be adapted to other document types.
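Steps 1 to 4 of the matching phrase algorithm can be illustrated with a minimal Python sketch. The abstract does not define a "matching phrase", the vector weighting, or the SOM configuration, so everything here is an assumption for illustration: a matching phrase is taken to be a maximal run of at least `min_len` consecutive tokens shared by two documents, vectors are binary phrase indicators, and the SOM is a tiny pure-Python grid. The function names and the toy English documents are hypothetical; real Chinese records would first need word segmentation.

```python
import math
import random
from itertools import combinations


def matching_phrases(a, b, min_len=2):
    """Step 1 (assumed definition): maximal contiguous token runs shared by a and b."""
    # run[i][j] = length of the common run ending at a[i-1] and b[j-1]
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
    phrases = set()
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            n = run[i][j]
            # keep a run only if it cannot be extended one token to the right
            extends = i < len(a) and j < len(b) and a[i] == b[j]
            if n >= min_len and not extends:
                phrases.add(tuple(a[i - n:i]))
    return phrases


def contains(tokens, phrase):
    """True if phrase occurs as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase
               for i in range(len(tokens) - n + 1))


def build_vectors(docs, min_len=2):
    """Steps 2 and 3: deduplicated phrase set, then one binary vector per document."""
    vocab = set()
    for a, b in combinations(docs, 2):
        vocab |= matching_phrases(a, b, min_len)
    vocab = sorted(vocab)
    vectors = [[1.0 if contains(d, p) else 0.0 for p in vocab] for d in docs]
    return vocab, vectors


def best_unit(weights, v):
    """Grid node whose weight vector is closest to v (squared Euclidean distance)."""
    return min(weights,
               key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], v)))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Step 4: train a small SOM; grid size and decay schedule are assumptions."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighborhood radius shrinks
        for v in vectors:
            bmu = best_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


# Toy, made-up documents used only to exercise the functions.
docs = [s.split() for s in [
    "patient reports dry cough and fever",
    "dry cough and fever for three days",
    "chest pain on exertion",
    "severe chest pain on exertion today",
]]
vocab, vectors = build_vectors(docs)
som = train_som(vectors)
clusters = [best_unit(som, v) for v in vectors]  # grid node assigned to each document
```

Documents that land on the same grid node form a candidate class, and the phrases with high weight at that node are candidates for the feature phrases mentioned in step 5. Comparing every pair of documents is quadratic in the number of records, which is manageable for 200 records but would need indexing for much larger collections.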
Keywords/Search Tags: data mining, neural network, Chinese medical records, text categorization, feature extraction