Research And Application Of Preprocess Technology In Health Data

Posted on:2018-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:T T You

GTID:2348330512483263

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information technology, human society has entered an era of information revolution founded on the creation and mining of information. Information technology is gradually being applied in telecommunications, finance, education, e-commerce, and even government decision-making. Amid the national push to build medical information systems, applying Big Data technology to the medical and health field, which is closely tied to people's livelihood, has become a hot spot in social development. Because medical data are large in volume, high-dimensional, and non-standardized, preprocessing of health data is an indispensable step before data mining. Data preprocessing not only improves the quality of data mining but also, to some extent, improves mining efficiency. Building on existing techniques, we first analyze the key technologies in data preprocessing and make some technical improvements, and then apply them to two practical medical health datasets. The main contents of this thesis are:

1. The research and improvement of preprocessing technology for the "population death" dataset. We analyze the characteristics of the "population death" dataset and then apply suitable preprocessing methods. We study random forest in particular and use it to fill the missing values of the "death" attribute in the dataset. When filling missing values with random forest, the imbalance of the dataset has a large impact on the result, so we use SMOTE, an oversampling technique, to rebalance the dataset. In addition, we propose an improvement that addresses existing defects of the SMOTE algorithm. Experiments show that missing-value filling works better when the improved SMOTE is applied before random forest.

2. The research and improvement of preprocessing technology for the "epileptic EEG" dataset. We analyze preprocessing methods related to EEG and study in particular the locally linear embedding algorithm for dimension reduction of frequency-domain signals. To address the algorithm's weakness in neighborhood selection, we propose an adaptive selection method, based on K-Means and the mean, for the neighborhood selection step. Experimental comparison and analysis show that the improved locally linear embedding algorithm performs better and is easy to apply more widely.

3. The design and implementation of data preprocessing for the two health datasets. Taking their respective characteristics into account, we apply appropriate preprocessing methods, with some improvements, to the "population death" dataset and the "epileptic EEG" dataset, so as to obtain reasonable and effective datasets for the subsequent data mining step. The experimental results confirm that mining quality and efficiency are better after data preprocessing.
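
As a rough illustration of the oversampling step mentioned in item 1, the sketch below shows the interpolation idea behind standard SMOTE: each synthetic minority sample is placed on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the improved variant proposed in the thesis, whose details the abstract does not give. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority samples
    by standard SMOTE-style interpolation (not the thesis's variant)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base and neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class (assumed data): the four corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(minority, n_new=6, k=2)
```

In a pipeline like the one the abstract describes, such synthetic rows would be added to the minority class before the random forest is trained to fill the missing attribute values.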

Keywords/Search Tags:

data mining, medical information, preprocess technology, random forest, locally linear embedding

Related items

1	The Manifold Learning Theory And Its Application In Spatial Information Processing
2	The Improvement And Research Of Locally Linear Embedding Algorithm
3	The Research Of Dimension Reduction Algorithm Based On Locally Linear Embedding And Its Applications In Precision Agriculture
4	Improvement Of Locally Linear Embedding Algorithm And Its Application In Face Recognition
5	Research On Spectrum Sensing Based On Random Forest In Cognitive Networks
6	Ear Recognition Method Based On Locally Linear Embedding And Its Improved Algorithm
7	Research On Feature Selection And Classification Method Based On Random Forest For Medical Datasets
8	Study Of Locally Linear Embedding To Outlier Detection In High Dimensional Space
9	Application Of Locally Linear Embedding In Text Classification
10	Image Hashing Algorithms Based On Locally Linear Embedding And Locality Preserving Projection