Research And Application Of Preprocess Technology In Health Data

Posted on: 2018-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: T T You
Full Text: PDF
GTID: 2348330512483263
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, human society has entered an information revolution era built on the creation and mining of data. Information technology is increasingly applied in telecommunications, finance, education, e-commerce, and even government decision-making. Amid the national push to build medical information systems, the application of Big Data technology to the medical and health field, which is closely tied to people's livelihood, has become a hot spot in social development. Because medical data are massive, high-dimensional, and non-standardized, preprocessing of health data is an indispensable step before data mining. Data preprocessing not only improves the quality of data mining but can also improve mining efficiency to some extent. Building on existing technology, we first analyze the key techniques in data preprocessing and make some technical improvements, and then apply them to two practical medical health datasets. The main contents of this thesis are:

1. Research and improvement of preprocessing technology for the "population death" dataset. We analyze the characteristics of the "population death" dataset and then apply suitable preprocessing methods. We study random forest in particular and use it to fill the missing values of the "death" attribute in the dataset. When filling missing values with random forest, the class imbalance of the dataset has a large impact on the result, so we use an oversampling technique, SMOTE, to rebalance the dataset. In addition, we propose an improvement that targets existing defects of the SMOTE algorithm. Experiments show that missing-value filling performs better when the improved SMOTE is applied before random forest.

2. Research and improvement of preprocessing technology for the "epileptic EEG" dataset. We analyze preprocessing methods relevant to EEG and focus on the locally linear embedding algorithm for dimension reduction of frequency-domain signals. To address the algorithm's weakness in neighborhood selection, we propose an adaptive selection method based on K-Means and the mean for the neighborhood selection step. Experimental comparison and analysis show that the improved locally linear embedding algorithm performs better and is easy to adopt more widely.

3. Design and implementation of data preprocessing for the two health datasets. Taking the characteristics of each dataset into account, we apply appropriate preprocessing methods, with some improvements, to the "population death" dataset and the "epileptic EEG" dataset to obtain reasonable and effective datasets for the subsequent data mining step. The experimental results confirm that mining quality and efficiency are better after data preprocessing.
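The oversampling step mentioned in item 1 can be illustrated with a minimal sketch of the standard SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is the textbook algorithm, not the improved variant proposed in the thesis, whose details the abstract does not give. The function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples built by standard SMOTE interpolation.

    minority: list of minority-class feature vectors (lists of floats).
    k: number of nearest minority neighbours considered for each base point.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

In a pipeline like the one the abstract describes, the synthetic samples would be appended to the training data before fitting the random forest used for imputation, so that the rarer class is not overwhelmed by the majority class.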
Keywords/Search Tags:data mining, medical information, preprocess technology, random forest, locally linear embedding