
The Study On Incorporating Domain Knowledge Into Data Preprocessing

Posted on: 2009-05-07 | Degree: Master | Type: Thesis
Country: China | Candidate: W L Zhang | Full Text: PDF
GTID: 2178360242989065 | Subject: Computer application technology
Abstract/Summary:
Databases commonly contain redundant, missing, uncertain, and inconsistent data, which are a major barrier to knowledge discovery in databases (KDD). Data preprocessing is therefore a key step in the data mining process. Using domain knowledge in data preprocessing can effectively improve the quality of data sets and reduce the number of samples, and so enhance the speed and quality of data mining.

This paper focuses on incorporating domain knowledge into data preprocessing. Several improved data preprocessing algorithms are given, and a data preprocessing system that incorporates domain knowledge is designed and implemented. The main contents are as follows.

1. The concept and significance of data preprocessing are described, both in general and in detail. The main data preprocessing techniques (data cleaning, data integration, data transformation, and data reduction) and their defects are introduced.
2. The concept and research status of domain knowledge are introduced, along with the significance of applying domain knowledge at every stage of data mining and the main representations of domain knowledge.
3. Strong emphasis is placed on the classification and representation of domain knowledge for data preprocessing, such as Range Knowledge, Hierarchy Knowledge, Rule Knowledge, and Statistic Knowledge. A two-layer storage structure and algorithm based on a data dictionary and XML files is designed, together with preprocessing algorithms that use domain knowledge.
4. Two data preprocessing tasks, missing data cleaning and data discretization, are studied in depth. The work covers a method that applies a clustering algorithm to missing data cleaning, an improved ROUSTIDA algorithm based on a valued similarity relation, and an attribute-class difference discretization method improved in three aspects: the initial cut-point, the greatest tolerance interval, and the cut-point of every attribute.
5. The framework, design method, and working process of a data preprocessing system that applies domain knowledge are given, and the system is implemented.
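
To make the idea of Range Knowledge and Hierarchy Knowledge more concrete, the following is a minimal Python sketch of how such knowledge might be applied during preprocessing. It is not the thesis's implementation: the attribute names, the value ranges, the city-to-region mapping, and the function names are all hypothetical, and the knowledge is held in plain dictionaries rather than in the data dictionary and XML files the abstract describes.

```python
# Hypothetical domain knowledge; none of these names or values come from the thesis.
# Range Knowledge: the closed interval of valid values for an attribute.
RANGE_KNOWLEDGE = {
    "age": (0, 120),
    "temperature_c": (30.0, 45.0),
}

# Hierarchy Knowledge: maps a detailed value to its parent concept.
HIERARCHY_KNOWLEDGE = {
    "city": {
        "Beijing": "North China",
        "Tianjin": "North China",
        "Guangzhou": "South China",
    },
}


def apply_range_knowledge(record, ranges):
    """Return a copy of the record with out-of-range values marked missing (None)."""
    cleaned = dict(record)
    for attr, (low, high) in ranges.items():
        value = cleaned.get(attr)
        if value is not None and not (low <= value <= high):
            cleaned[attr] = None
    return cleaned


def apply_hierarchy_knowledge(record, hierarchies):
    """Return a copy of the record with values generalized to their parent concept.

    Values that have no entry in the hierarchy are left unchanged.
    """
    generalized = dict(record)
    for attr, parent_of in hierarchies.items():
        if attr in generalized:
            value = generalized[attr]
            generalized[attr] = parent_of.get(value, value)
    return generalized


def preprocess(records):
    """Apply range checking first, then hierarchy-based generalization."""
    return [
        apply_hierarchy_knowledge(
            apply_range_knowledge(record, RANGE_KNOWLEDGE), HIERARCHY_KNOWLEDGE
        )
        for record in records
    ]


records = [
    {"age": 34, "temperature_c": 36.8, "city": "Beijing"},
    {"age": 250, "temperature_c": 37.1, "city": "Guangzhou"},
]
result = preprocess(records)
```

In this sketch, a value that violates the range knowledge is marked as missing rather than discarded, so a later missing data cleaning step could fill it in. Generalizing through the hierarchy reduces the number of distinct values per attribute, which is one way domain knowledge could support data reduction.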
Keywords/Search Tags:Data Preprocessing, Domain knowledge, Data Mining