
Data Quality Control: Research, Design, And Implementation In Data Preprocessing

Posted on: 2005-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Luan
Full Text: PDF
GTID: 2168360152955528
Subject: Computer applications
Abstract/Summary:
Data mining is the process of discovering knowledge from large volumes of data. At present, most papers concentrate on data mining algorithms while neglecting data preprocessing, which produces data that is complete, low in redundancy, and has meaningful relationships between attributes for further analysis. Large amounts of insignificant data can reduce mining efficiency, and outliers can decrease the precision of algorithms. Data preprocessing has therefore become the crux of data mining system implementation. The contribution of this paper is twofold: data warehouse quality control (Extract-Transform-Load, ETL) and a quality control framework for Web site text. The key items are:

(1) Analyzing the characteristics and difficulties of ETL and presenting an ETL architecture.
(2) Investigating data problems that appear in single and multiple DBMS data sources, at both the schema level and the instance level.
(3) Implementing an enterprise data ETL with shell scripts on an RS/6000 system running AIX.
(4) Designing a scalable framework for text preprocessing: two models (the Vector Space Model, VSM, and the Language Model, LM) and a three-phase quality control algorithm.
(5) Analyzing the main modules of the framework, such as word segmentation, language analysis, modeling, and feature selection.

In view of the characteristics of text streams, we put forward two ad hoc strategies for the framework: (a) a high-speed matching strategy based on similarity; (b) an incremental Support Vector Machine (SVM) training strategy. Extensive experimental results show substantial improvements over the existing incremental SVM training method.
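The similarity-based matching strategy in (a) presumably compares incoming text against stored templates under the Vector Space Model. The abstract does not specify the weighting scheme, tokenizer, or threshold, so all of those (and the function names) below are assumptions; this is only a minimal sketch of template matching by cosine similarity over term-frequency vectors:

```python
import math
from collections import Counter

def vectorize(text):
    """Build a term-frequency (bag-of-words) vector for one document.
    Whitespace tokenization is a placeholder; real text streams would
    need proper word segmentation, as item (5) of the framework notes."""
    return Counter(text.lower().split())

def cosine_similarity(v1, v2):
    """Cosine similarity between two sparse term-frequency vectors."""
    common = set(v1) & set(v2)
    dot = sum(v1[t] * v2[t] for t in common)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

def best_match(query, templates, threshold=0.5):
    """Return the template most similar to the query,
    or None if nothing clears the (assumed) threshold."""
    qv = vectorize(query)
    scored = [(cosine_similarity(qv, vectorize(t)), t) for t in templates]
    score, template = max(scored)
    return template if score >= threshold else None
```

A speed-oriented variant, as the "high-speed" label suggests, would likely precompute the template vectors and norms rather than rebuilding them per query.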
Keywords/Search Tags:Data mining, Data Preprocessing, Data warehouse, ETL, Template matching, SVM