Research And Application On Data Preprocessing Algorithms

Posted on:2007-09-24

Degree:Master

Type:Thesis

Country:China

Candidate:X F Li

GTID:2178360182495826

Subject:Computer application technology

Abstract/Summary:

With the arrival of the information age, people in many fields face an ever-growing volume of data and information, and this volume is growing at a surprising rate. To improve work efficiency and quality of life, people must extract the valuable information hidden in these data, and this need motivated research on mining knowledge from databases. However, databases are well known to suffer from many problems, such as redundant, missing, uncertain, and inconsistent data, and these problems are barriers to knowledge discovery. It is therefore important to preprocess data before discovering knowledge from databases.

This paper focuses on data preprocessing in data mining, especially data cleaning, and also implements data preprocessing functions in the Data Mining Laboratory Platform (DMLab). First, data preprocessing is described both in general and in detail, and the research background, concepts, and current research status of the main preprocessing techniques are introduced. Then, the existing preprocessing techniques, covering data cleaning, data sampling, data transformation, and data reduction, are analyzed in depth. The paper places strong emphasis on missing data imputation: many imputation algorithms are studied and discussed in detail, and an imputation algorithm based on clustering is proposed. Finally, the data preprocessing module of DMLab is implemented on the basis of the techniques discussed earlier; the module provides data cleaning, data sampling, data transformation, and data reduction functions.

The paper introduces the basic knowledge and algorithms of data preprocessing, especially missing data cleaning, and objectively discusses the merits and drawbacks of missing data cleaning techniques.
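
As a rough illustration of the four kinds of preprocessing named above (cleaning, sampling, transformation, reduction), the following Python sketch implements one simple, commonly used technique for each. It is not DMLab's implementation: the function names, the list-of-lists record format with `None` marking a missing value, and the particular choices (column-mean imputation, simple random sampling, min-max normalization, dropping constant attributes) are all assumptions made for illustration.

```python
import random
from statistics import mean

# Hypothetical record format: each record is a list of numeric attribute
# values, and None marks a missing value.

def clean_mean_impute(data):
    """Data cleaning: replace each missing value with its column's observed mean."""
    n_cols = len(data[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in data]

def sample(data, k, seed=0):
    """Data sampling: simple random sampling of k records without replacement."""
    rng = random.Random(seed)
    return rng.sample(data, k)

def min_max_normalize(data):
    """Data transformation: rescale each column into [0, 1] (expects no None)."""
    n_cols = len(data[0])
    lows = [min(row[j] for row in data) for j in range(n_cols)]
    highs = [max(row[j] for row in data) for j in range(n_cols)]
    return [[0.0 if highs[j] == lows[j]
             else (row[j] - lows[j]) / (highs[j] - lows[j])
             for j in range(n_cols)] for row in data]

def drop_constant_columns(data):
    """Data reduction: remove attributes that take a single value in every record."""
    n_cols = len(data[0])
    keep = [j for j in range(n_cols) if len({row[j] for row in data}) > 1]
    return [[row[j] for j in keep] for row in data]
```

For example, cleaning `[[1.0, 5.0, 7.0], [None, 5.0, 9.0], [3.0, 5.0, None]]` fills the gaps with the column means 2.0 and 8.0, after which `drop_constant_columns` removes the second attribute because every record has the value 5.0. Column-mean imputation is the simplest baseline; it ignores relationships between records, which is the weakness that more elaborate imputation methods try to address.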
Many data preprocessing techniques in wide use today are studied, and on the basis of these studies the data preprocessing module of the DMLab system is designed and implemented. Besides implementing the basic preprocessing algorithms required by the system, the paper proposes a new method that applies a clustering algorithm to imputation, and finally presents test results and conclusions. The main innovation is the proposed clustering-based algorithm for imputing missing data.
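
The abstract does not give the details of the proposed clustering-based imputation algorithm, so the sketch below shows only one generic way the idea can work, not the thesis's method: k-means is run on the complete records, each incomplete record is assigned to the nearest centroid using a distance over its observed attributes only, and each missing value is copied from that centroid. The function names, the deterministic maximin seeding, the fixed iteration count, and the `None` convention for missing values are all assumptions.

```python
import math

def init_centroids(points, k):
    """Deterministic maximin seeding: start from the first record, then repeatedly
    add the record farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on complete records."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def partial_dist(record, centroid):
    """Euclidean distance computed over the attributes observed in `record` only."""
    return math.sqrt(sum((x - c) ** 2
                         for x, c in zip(record, centroid) if x is not None))

def cluster_impute(data, k, iters=20):
    """Fill each missing value with the matching attribute of the nearest centroid."""
    complete = [row for row in data if None not in row]
    if len(complete) < k:
        raise ValueError("need at least k complete records to build the clusters")
    centroids = kmeans(complete, k, iters)
    result = []
    for row in data:
        if None in row:
            best = min(centroids, key=lambda c: partial_dist(row, c))
            row = [best[j] if x is None else x for j, x in enumerate(row)]
        result.append(list(row))
    return result
```

With two well-separated groups of complete records near (1, 1) and (10, 10), the record `[1.1, None]` receives a second value of about 1.0, while `[None, 9.9]` receives a first value of about 10.0. Filling in the global column means instead would give about 6.1 and 4.9 respectively, values typical of neither group; drawing the value from the nearest cluster keeps it consistent with similar records.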

Keywords/Search Tags:

data mining, data preprocessing, data cleansing, missing data, imputation

Related items

1	Research And System Construction Of Data Preprocessing Mechanism
2	Studies On Missing Data Imputation
3	Research On Bayesian Network Based Missing Value Imputation Model For Incomplete Credit Data
4	Research Of Data Preprocessing For Data-driven Modeling
5	Application Of Data Pre-processing Method In The Mobile Telecommunication Industry
6	Research On Passenger Transport Data Quality Detection And Missing Data Imputation
7	Research On Key Technologies Of Missing Data Imputation In Wireless Sensor Networks
8	Research On Missing Data Imputation Method Based On Generative Adversarial Network
9	Research On Data Collection And Data Imputation Based On CrowdSensing
10	Research On Data Imputation Methods Oriented Specific Domains