
Research On Data Warehouse-based Classification Data Mining For Large Scale Dataset

Posted on: 2003-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: L B Kong
GTID: 2168360062496543
Subject: Computer software and theory
Abstract/Summary:
Data Warehousing became a hot research area in the 1990s. Its main purpose is to give decision-makers a powerful tool: it gathers data into a clean, consistent, and relevant form and makes that data available for management, analysis, and data mining. With such a tool, a decision-maker can understand and grasp the state of the business from different perspectives and forecast its future. When a Data Warehouse is in use, its processing speed determines its practicability and processing capability. The previously implemented HDC (Highway Decision Center) system solved several key problems for intermediate-scale data, concentrating mainly on Data Warehouse performance. When HDC was applied to large-scale data, however, it ran into processing-speed problems, and solving them became a major research focus. Building on these earlier results, the present task is to construct a well-designed Data Warehouse architecture and its associated algorithms, and then to adapt the system, together with its Data Mining functions, to large-scale datasets. This paper is one part of that research. To build such a system, a key problem is coping with the processing-speed and storage-space problems caused by large-scale, massive datasets; this is also a core issue in current Data Mining research. The aim of this paper is to design and implement a Decision-Tree Classifier for large-scale datasets within the Data Warehouse system.
In this paper, based on an understanding of the current state of research, we mainly discuss how to adapt commonly used Decision Tree algorithms to large-scale datasets. The paper first describes several typical algorithms for large datasets and then presents a general processing flow. Second, for datasets with a large number of records and many attributes, we revise the method for computing attribute statistics, yielding an improved algorithm.

This thesis consists of five chapters. Chapter one presents the background knowledge, shows where data mining stands among related concepts, and gives a categorization of data mining. Chapter two describes the idea behind classification data mining and presents construction and pruning algorithms for decision tree classifiers. Chapter three discusses the problems of adapting data mining techniques to large-scale datasets, describes some feasible processing steps, and touches on the combination with an R-DBMS-based data warehouse. Chapter four presents the program design and some results. Chapter five gives remarks, the conclusion, and the plan for future research...
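The abstract does not spell out the improved method for computing attribute statistics. For reference only, the following minimal sketch shows one standard way such statistics are gathered for large datasets. A single scan builds, for every attribute, a table of (value, class) counts (the AVC-sets of the RainForest framework). The Gini index of each candidate split is then computed from those tables without revisiting the raw records. This is a swapped-in, generic technique, not the thesis's own algorithm. The names `build_avc_sets`, `gini`, `split_gini`, and `best_split` are hypothetical, and only categorical attributes with multiway splits are handled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A record is (attribute -> value mapping, class label). Hypothetical layout.
Record = Tuple[Dict[str, Hashable], Hashable]


def build_avc_sets(records: Iterable[Record], attributes: List[str]):
    """One scan over the data: count (value, class) pairs per attribute.

    The resulting tables are usually far smaller than the data itself,
    so split selection can run on them in memory.
    """
    avc = {a: defaultdict(lambda: defaultdict(int)) for a in attributes}
    for values, label in records:
        for a in attributes:
            avc[a][values[a]][label] += 1
    return avc


def gini(class_counts: Dict[Hashable, int]) -> float:
    """Gini impurity 1 - sum(p_c^2) of a class-count distribution."""
    n = sum(class_counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in class_counts.values())


def split_gini(avc_for_attr) -> float:
    """Size-weighted Gini of a multiway split on one categorical attribute."""
    total = sum(sum(cc.values()) for cc in avc_for_attr.values())
    return sum(sum(cc.values()) / total * gini(cc) for cc in avc_for_attr.values())


def best_split(records: Iterable[Record], attributes: List[str]):
    """Pick the attribute whose split has the lowest weighted Gini."""
    avc = build_avc_sets(records, attributes)
    scores = {a: split_gini(avc[a]) for a in attributes}
    best = min(scores, key=scores.get)
    return best, scores
```

In a data warehouse setting, the same counts could in principle come from a GROUP BY over (attribute, class) instead of a Python loop. This sketch does not show that variant.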
Keywords/Search Tags:Data Warehouse, Data mining, Classification, Decision Tree Classifier, Multi-dimension Architecture, MDX