
Some Key Problems In The KDD

Posted on: 2004-10-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 1118360122480033
Subject: Circuits and Systems
Abstract/Summary:
Knowledge discovery in databases (KDD) is a rapidly growing field whose development is driven by strong research interest as well as urgent practical, social, and economic needs. KDD is a suite of scientific methods, algorithms, software tools, and environments that can be used to exploit information resources. It is a multidisciplinary research field that draws on statistics, artificial intelligence, pattern recognition, parallel computing, machine learning, database technology, and so on. KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. We also adopt the commonly used definition of data mining as the extraction of patterns or models from observed data. Generally, KDD consists of interactive steps: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. KDD has three important parts: data preprocessing, data mining, and visualization.

Based on the state of the art and recent advances in international research on data mining, and by analyzing concepts and theories from evolution in the life sciences, computational intelligence, relational algebra, and Petri nets, a series of methods is proposed. These include data preprocessing algorithms based on self-adapted clustering; hybrid algorithms that combine self-adaptation with relational algebra theory to generate association rules and generalized association rules; document classification and dimension reduction techniques; an overview of visualization theory and technology for data mining results; a robust and uniform Petri-net visualization framework; and an implemented example of data mining.

KDD involves an integration of techniques from multiple disciplines. Researchers in different fields have done a great deal of investigation into data mining. Research on KDD and data mining is now at a stage where research results are being translated into algorithms and adapted into application software.
However, relatively little has been published about the theoretical mechanisms and framework. A possible theoretical system and framework for KDD and data mining is discussed.

(1) Data preprocessing is one of the key factors for success in data mining. A KDD model is put forward that combines KDD with a database or data warehouse. Using a self-adapted parallel optimization algorithm together with database theory, a data preprocessing algorithm is developed. Simulation shows that this algorithm is valid. It can be used for data preprocessing in classification, clustering, and association rule mining, and can be generalized to different data types.

(2) On the basis of the preprocessed data, algorithms that combine relational theory, database theory, and KDD theory are proposed for mining association rules and generalized association rules. Both theoretical analysis and simulation show that the new algorithm is effective and feasible. Compared with Apriori, it needs to scan the database only once, parallelizes and scales well, and can also be extended to mine fuzzy association rules.

(3) Classification is one of the primary applications of KDD. The formal definition and basic mechanism of classification are given. A basic framework for Web document classification is proposed. Dimension reduction methods are then analyzed and applied to document classification. Finally, the ideas of principal component analysis and support vector machines are also applied to document classification, and the simulation results are discussed.

(4) Visualization, which is applied to both data mining results and the data mining procedure, is also an important component. By analyzing existing visualization theory and presentation methods, a uniform and robust Petri-net framework is presented. An example shows that Petri nets can be used active rule...

Key Lab for Radar Signal Processing
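As an illustration of the association rule terminology used in point (2), the following is a minimal Python sketch that computes support and confidence for rules between pairs of items while reading the transaction list only once. It is not the dissertation's algorithm, which the abstract does not specify; the function name `pair_rules`, the thresholds, and the sample transactions are all hypothetical, and the sketch is limited to two-item rules for brevity.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items.

    Illustrative only: counts items and item pairs in a single pass over
    the transactions, then derives the rules from the counts.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:  # the only scan of the data
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence of lhs -> rhs: P(rhs in transaction | lhs in transaction)
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical sample data, made up for this sketch.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting every pair in one pass trades memory for fewer scans, whereas Apriori rescans the data once per itemset size; whether the dissertation makes the same trade-off is not stated in the abstract.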
Keywords/Search Tags: KDD, data mining, self-adapted cluster, immune genetic algorithm, preprocess data, association rule, generalized association rule, document classification, Petri-nets, principal component analysis, visualization, virtual database