
Data Clustering And Visualization Technology

Posted on: 2009-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: J X Liu  Full Text: PDF
GTID: 2208360245461585  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer hardware and software, and especially of Internet technology, the volume of accumulated data is growing very quickly. The data are now so voluminous that it is difficult to discover the knowledge hidden in them. Researchers have studied this problem for a long time, and data mining is one approach to solving it. As an important topic in data mining, cluster analysis has attracted increasing attention from researchers. Most clustering algorithms handle small datasets well, but on a large-scale database they may produce degraded results and consume a great deal of memory. A scalable clustering framework or algorithm is therefore required. To let users inspect data mining results intuitively, visualization is applied and plays an important role in data mining. Data mining visualization combines the strengths of human visual perception with the user's subjective knowledge, makes the data mining process intuitive and interactive, and thus yields more valuable and understandable information.

Based on the MinerOnWeb data mining system, this thesis focuses on a visualization technique with good human-computer interaction and on a clustering method for large datasets. MinerOnWeb is a system designed to provide data mining services online. On top of this system, the thesis implements the parallel coordinates technique and the clustering method for large datasets:

(1) Parallel coordinates technique: unlike traditional data visualization methods, which can show only two or three attributes and their relationships at a time, this method maps all data dimensions onto a single plane, so users do not have to rotate axes and can see all attributes and the relationships among them in one view. In MinerOnWeb, the method is used to display clustering results, with different clusters marked in different colors, so that users can clearly see the relationships among clusters as well as among the data. In addition, to make the clustering results easier to understand, action listeners are attached to the axes of the parallel coordinates plot.

(2) Clustering method for large datasets: most traditional clustering methods load the whole dataset into memory for analysis. For a large dataset this is very difficult and demands powerful hardware. Clustering algorithms based on iterative computation require the user to adjust the algorithm's parameters until a satisfactory result is obtained, and even algorithms without iterative computation require repeated parameter tuning to obtain good results. Applied to a large dataset, this process becomes laborious and consumes considerable computing resources and time. This thesis introduces a sampling method to address the problem: samples are first drawn at random from the large dataset and clustered to obtain a cluster model, and the remaining data are then labeled according to that model, yielding an efficient clustering of the large dataset.
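
The parallel coordinates idea can be illustrated with a short sketch. It is not taken from MinerOnWeb: the function names `axis_ranges`, `to_polyline`, `cluster_colors`, and `brush`, as well as the color palette, are hypothetical, and the brush is only one plausible reading of the axis action listeners mentioned in the abstract. Each record becomes a polyline with one vertex per attribute axis, and each polyline is colored by its cluster label.

```python
# Hypothetical sketch of parallel coordinates geometry (not MinerOnWeb code).

PALETTE = ["red", "green", "blue", "orange"]  # assumed palette, one color per cluster


def axis_ranges(rows):
    """Return the (min, max) of every attribute, one pair per axis."""
    return [(min(col), max(col)) for col in zip(*rows)]


def to_polyline(row, ranges, width=1.0):
    """Map one record to polyline vertices: evenly spaced x, normalized y in [0, 1]."""
    n = len(row)
    step = width / (n - 1) if n > 1 else 0.0
    points = []
    for i, (value, (lo, hi)) in enumerate(zip(row, ranges)):
        # A constant attribute has no spread, so place it mid-axis.
        y = 0.5 if hi == lo else (value - lo) / (hi - lo)
        points.append((i * step, y))
    return points


def cluster_colors(labels):
    """Pick a color for each record from its cluster label."""
    return [PALETTE[label % len(PALETTE)] for label in labels]


def brush(rows, axis, low, high):
    """Return indices of records whose value on one axis lies in [low, high]."""
    return [i for i, row in enumerate(rows) if low <= row[axis] <= high]
```

A drawing layer would render every polyline across the same set of vertical axes, which is why all attributes are visible in one view without rotating anything.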
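
The sampling approach in (2) can be sketched as follows. The abstract does not say which clustering algorithm builds the model from the sample, so this sketch assumes plain k-means and assigns each remaining record to its nearest centroid; the names `kmeans` and `cluster_by_sampling` and all parameters are hypothetical.

```python
# Hypothetical sketch of sample-then-label clustering (k-means is an assumption).
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, rng=None):
    """Plain k-means run on the (small) sample only."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the group mean; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def cluster_by_sampling(data, k, sample_size, seed=0):
    """Cluster a random sample, then label every record with the sample's model."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    centroids = kmeans(sample, k, rng=rng)
    # Labeling is a single pass over the data, so it could also be streamed from disk.
    labels = [nearest(p, centroids) for p in data]
    return centroids, labels
```

Only the sample goes through the iterative step, so parameter tuning can be repeated cheaply before the full dataset is labeled once.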
Keywords/Search Tags: Parallel coordinate technology, Clustering method in large datasets, MinerOnWeb, Brush, Dimension Constraints