
Research On Data Preprocess And Interactive Visualization For Data Mining

Posted on: 2008-02-20 | Degree: Master | Type: Thesis
Country: China | Candidate: H Lan
GTID: 2178360242469543 | Subject: Computer application technology
Abstract/Summary:
Data mining is the technology of discovering rules in large amounts of data, and it is regarded as one of the most important research topics in computer science. A data mining platform is a tool that bridges data mining research and its applications. How to combine data mining algorithms with application domains, and so build a platform that is friendly to many kinds of users, is an urgent problem in data mining research.

Compared with other data mining platforms such as MS SQL Server 2005 and Oracle 10g, WEKA has clear advantages in its support for machine learning: it provides almost every kind of machine learning algorithm, implemented in Java. However, WEKA has many shortcomings as a data mining tool, so it is important to turn the WEKA machine learning platform into a data mining research and application platform.

"Research and Implement Interactive & Exploratory JAVA Data Mining Platform" is a key project funded by the Science and Technology Ministry of Jiangxi Province. The project aims to build a data mining research and application platform based on WEKA. This thesis is part of that project, and its work includes:

1. Designing and implementing a preprocessing system called the Concept Hierarchy Tree-based Visual Preprocess System (CHTBVPS), modeled on the WEKA preprocessing platform. CHTBVPS integrates ETL (Extract-Transform-Load), data mining algorithms, and visualization technology. Its main preprocessing functions are data extraction, data cleaning, data reduction, and file output. CHTBVPS lets users build concept hierarchy trees interactively under their own control, which helps them understand the data mining process. The system has been tested against a large relational database, and the experimental results show that it is efficient and effective for data mining preprocessing.

2. Improving and optimizing the concept hierarchy tree algorithms.
Concept hierarchies play a fundamental role in data mining. Generating them automatically makes mining more efficient and allows knowledge to be discovered at different levels of abstraction. This thesis designs and implements four concept hierarchy tree algorithms. Two of them, the attribute-based concept hierarchy tree algorithm (ABCHT) and the attribute value-based concept hierarchy tree algorithm (AVBCHT), are user-interactive algorithms that draw on domain expertise. The other two, the interval-based auto-generating algorithm (TAGA) and the distribution-based auto-generating algorithm (DAGA), generate hierarchies automatically. The domain-based interactive algorithms let users bring their domain knowledge and experience to bear on the data mining process, while TAGA and DAGA improve on the equal-frequency auto-generation algorithm. Each of the four algorithms has its own strengths, and together they widen the range of applications for concept hierarchy trees. Data reduced with these hierarchies is more meaningful for the subsequent steps of data mining.
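The abstract does not describe how ABCHT orders attributes. As a hedged illustration of one well-known heuristic for suggesting a schema-level hierarchy, the sketch below places attributes with fewer distinct values higher in the hierarchy and then lets a user move an attribute by hand. The heuristic, the function names `suggest_attribute_hierarchy` and `move_attribute`, and the sample data are assumptions, not the thesis's method.

```python
from typing import Dict, List, Sequence


def suggest_attribute_hierarchy(
    rows: Sequence[Dict[str, str]], attributes: List[str]
) -> List[str]:
    """Suggest a schema-level hierarchy, most general attribute first.

    Assumed heuristic (not taken from the thesis): an attribute with fewer
    distinct values is placed higher in the hierarchy.
    """
    distinct = {a: len({row[a] for row in rows}) for a in attributes}
    # sorted() is stable, so attributes with equal counts keep the caller's order
    return sorted(attributes, key=lambda a: distinct[a])


def move_attribute(order: List[str], attribute: str, position: int) -> List[str]:
    """Hypothetical interactive correction: return a copy with `attribute` at `position`."""
    result = [a for a in order if a != attribute]
    result.insert(position, attribute)
    return result
```

For rows with one country, three provinces and five cities, the suggestion is `["country", "province", "city"]`. The heuristic can mislead, for example a weekday attribute has only 7 distinct values yet is not more general than month, which is why a user-interactive correction step is useful.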
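At the level of attribute values, a concept hierarchy can be viewed as a child-to-parent mapping (for example city → province → country), and data reduction can replace detailed values with higher-level concepts and merge the tuples that become identical. The following is a minimal sketch of that generalization step, assuming a hand-written mapping such as a domain expert might supply. It is not the AVBCHT algorithm itself, and the names `PARENT`, `ancestor` and `generalize` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical value hierarchy for a location attribute: child -> parent
PARENT: Dict[str, str] = {
    "Nanchang": "Jiangxi",
    "Jiujiang": "Jiangxi",
    "Wuhan": "Hubei",
    "Yichang": "Hubei",
    "Jiangxi": "China",
    "Hubei": "China",
}


def ancestor(value: str, parent: Dict[str, str], levels: int) -> str:
    """Climb `levels` steps up the hierarchy, stopping early at a root."""
    for _ in range(levels):
        if value not in parent:
            break  # already at the top of the hierarchy
        value = parent[value]
    return value


def generalize(
    tuples: Iterable[Tuple[str, ...]], column: int, parent: Dict[str, str], levels: int
) -> "Counter[Tuple[str, ...]]":
    """Replace one column by its ancestor and merge tuples that become identical.

    The Counter records how many original tuples each generalized tuple stands for.
    """
    reduced: "Counter[Tuple[str, ...]]" = Counter()
    for t in tuples:
        row = list(t)
        row[column] = ancestor(row[column], parent, levels)
        reduced[tuple(row)] += 1
    return reduced
```

With five (city, occupation) tuples such as `("Nanchang", "student")` and `("Wuhan", "teacher")`, generalizing one level to provinces leaves three distinct tuples, and two levels (to the country) leaves two, each with a count of the tuples it replaces.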
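The abstract does not detail the equal-frequency auto-generation algorithm that TAGA and DAGA improve on. A minimal sketch of the general idea, assuming a numeric attribute that is split recursively into a fixed number of groups per level, follows. The `equal_width` option is a generic interval-based alternative shown for comparison, not a reconstruction of TAGA or DAGA, and the names `build_hierarchy`, `fanout` and `depth` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One concept in the hierarchy: the closed range of values it covers."""
    low: float
    high: float
    count: int
    children: List["Node"] = field(default_factory=list)

    def label(self) -> str:
        return f"[{self.low:g}, {self.high:g}]"


def split(values: List[float], fanout: int, strategy: str) -> List[List[float]]:
    """Partition sorted `values` into at most `fanout` non-empty groups."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if strategy == "equal_frequency":
        size = -(-len(values) // fanout)  # ceiling division: values per group
        # Boundaries are not adjusted for ties, so a repeated value may land in two groups
        return [values[i:i + size] for i in range(0, len(values), size)]
    if strategy == "equal_width":
        lo, hi = values[0], values[-1]
        if lo == hi:
            return [values]
        width = (hi - lo) / fanout
        groups: List[List[float]] = [[] for _ in range(fanout)]
        for v in values:
            # the maximum maps to index `fanout`; clamp it into the last bucket
            groups[min(int((v - lo) / width), fanout - 1)].append(v)
        return [g for g in groups if g]
    raise ValueError(f"unknown strategy: {strategy}")


def build_hierarchy(
    values: Sequence[float], fanout: int = 3, depth: int = 2, strategy: str = "equal_frequency"
) -> Node:
    """Recursively split a non-empty numeric attribute into at most `depth` levels below the root."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    node = Node(data[0], data[-1], len(data))
    if depth > 0 and data[0] != data[-1]:
        for group in split(data, fanout, strategy):
            node.children.append(build_hierarchy(group, fanout, depth - 1, strategy))
    return node


def render(node: Node, indent: int = 0) -> List[str]:
    """Return the tree as indented lines of the form '[low, high] (count)'."""
    lines = ["  " * indent + f"{node.label()} ({node.count})"]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines
```

For the ages `[18, 20, 22, 25, 30, 35, 41, 50, 62]` with `fanout=3, depth=1`, equal frequency yields the concepts [18, 22], [25, 35] and [41, 62] with three values each, while `strategy="equal_width"` yields [18, 30], [35, 41] and [50, 62] with 5, 2 and 2 values. Equal frequency balances node sizes; equal width keeps boundaries evenly spaced at the cost of uneven nodes.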
Keywords/Search Tags: concept hierarchy tree, data reduction, concept hierarchy database, visualization, data mining