
Clustering-Based Data Mining Tools for the Data Warehouse

Posted on: 2003-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: G Li
GTID: 2208360062996458
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important part of a complete data mining system. Clustering is the process of grouping data into classes, or clusters, so that objects within the same cluster are highly similar to one another but very dissimilar to objects in other clusters. Dissimilarity is assessed based on the attribute values that describe the objects. Clustering has its roots in many areas, including data mining, statistics, biology, and machine learning.

Clustering is always carried out without prior knowledge of the classes, so the main research task is how to obtain a good clustering result under this premise. As data mining has developed, a number of clustering algorithms have been proposed. In general, the major clustering methods fall into the following categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. In addition, some clustering algorithms integrate the ideas of several of these methods. Although all of these methods have achieved considerable success in different fields, they all run into difficulties when processing very large databases. A main aim of this thesis is therefore to analyze the causes of this problem and to give a detailed solution. The following problems are discussed:

1. The accuracy of clustering algorithms. The accuracy of a clustering method refers to how accurately the original data set is partitioned and its objects are assigned to clusters. Existing algorithms handle data sets with regular partitioning characteristics easily, but have difficulty with irregular data sets and with very large data sets. Both issues are discussed in this thesis.

2. Comparison of existing clustering algorithms. It is sometimes difficult to classify a given algorithm as belonging to only one category of clustering method, so a detailed comparison and analysis is given in this thesis.

3. High time and space complexity. Because the original data set is very large and highly complex, a practical algorithm must reduce its time complexity. This problem is discussed in this thesis.

4. Improvement of the partition-based method. The partition-based method is a practical way to cluster a data set, but its efficiency depends strongly on prior knowledge; in particular, the number of clusters must be given in advance. A new method for dealing with this problem is given in this thesis.

5. The overtraining problem of neural networks. The Kohonen network is an important model-based algorithm. Its main features are self-mapping and self-organization, which make it easy to discover the profile of the original data set without prior knowledge. On the other hand, its disadvantages limit its field of application: its computational complexity is high, and a large original data set tends to overtrain the network. This thesis discusses how to modify its structure to make it run faster.

6. A data mining solution based on a data warehouse system. Data must be stored in a highly regular and highly consistent form, and a data warehouse provides the conditions to achieve this. This thesis therefore discusses a solution for building a distributed data mining system on a data warehouse.
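
The dependence of partition-based methods on a cluster count supplied in advance can be illustrated with a minimal sketch of plain k-means (Lloyd's algorithm), a standard representative of this category. This is not the improved method proposed in the thesis, which the abstract does not describe; the function name `kmeans`, its parameters, and the sample data are assumptions made only for illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Illustrative sketch only: the caller must supply k, the number of
    clusters, before any structure in the data is known.
    """
    rng = random.Random(seed)
    # Pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignment


# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

centroids2, labels2 = kmeans(points, k=2)
centroids3, labels3 = kmeans(points, k=3)
print(labels2)
print(len(centroids3))
```

With `k=2` the two groups are recovered, but with `k=3` the algorithm still returns three centroids even though the data contain only two natural groups. The choice of `k` is an input rather than something the method discovers, which is the limitation item 4 refers to.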
Keywords/Search Tags: Data Mining, Data Warehouse, Clustering Algorithm, Neural Network