Data Clustering And Visualization Technology

Posted on:2009-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:J X Liu

GTID:2208360245461585

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of computer hardware and software, and especially the great advances in Internet technology, the volume of accumulated data is growing very quickly. The total volume is now so large that it is difficult to discover the knowledge hidden in it. Researchers have studied this problem for a long time, and data mining is one of the approaches to solving it. As an important topic in data mining, cluster analysis has attracted increasing attention from researchers. Most clustering algorithms handle small datasets very well, but on a large-scale database they may produce degraded results and consume a great deal of memory. A scalable clustering framework or a scalable clustering algorithm is therefore required. To let users observe data mining results intuitively, visualization is applied and plays an important role in data mining. Data mining visualization combines the strengths of human vision with the user's subjective knowledge, makes the data mining process intuitive and interactive, and thus yields more valuable and understandable information.

Based on the MinerOnWeb data mining system, this thesis focuses on a visualization technique with good human-computer interaction and on a clustering method for large datasets. MinerOnWeb is a data mining system designed to provide data mining services online. On top of this system, this thesis implements parallel coordinate technology and a clustering method for large datasets:

(1) Parallel coordinate technology: unlike traditional data visualization methods, this method maps all data dimensions onto the same plane, so users do not have to rotate axes and can see all data attributes and the relationships among them in a single plot. A traditional data visualization method, by contrast, can show only two or three attributes and their relationships at a time. In the MinerOnWeb system this method is used to display clustering results, with different clusters marked in different colors. Users can thus clearly see the relationships among the clusters as well as the relationships among the data. In addition, to make the clustering results easier to understand, action listeners are attached to the axes of the parallel coordinate plot.

(2) Clustering method for large datasets: most traditional clustering methods load the whole dataset into memory for analysis. For a large dataset this is very difficult and demands powerful hardware. Clustering algorithms based on iterative calculation require the user to adjust the algorithm's parameters until a satisfactory result is obtained. Even clustering algorithms without iterative calculation require users to adjust the relevant parameters repeatedly to obtain good results. On a large dataset this becomes a laborious process that requires a lot of computing resources and computing time. This thesis introduces a sampling method to address the problem of clustering large datasets. First, samples are chosen randomly from the large dataset; these samples are clustered to obtain a cluster model; the remaining data are then labeled according to the model built from the samples. In this way, large datasets are clustered efficiently.
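The parallel coordinate mapping in (1) can be illustrated with a small sketch. It is not the MinerOnWeb implementation: the function names `to_polylines` and `brush` are hypothetical, min-max scaling per axis is an assumed normalization, and the brush helper is only a guess at the kind of selection the axis listeners might support.

```python
def to_polylines(rows):
    """Map each record to a polyline with one vertex per attribute axis.

    Vertex i is (axis index, value scaled to [0, 1] on that axis).
    Min-max scaling is an assumption made for this illustration.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            # A constant attribute has no spread, so place it mid-axis.
            y = (value - lows[axis]) / span if span else 0.5
            line.append((axis, y))
        polylines.append(line)
    return polylines


def brush(rows, axis, low, high):
    """Return indices of records whose value on `axis` lies in [low, high]."""
    return [i for i, row in enumerate(rows) if low <= row[axis] <= high]


rows = [(1.0, 10.0, 5.0), (2.0, 20.0, 5.0), (3.0, 30.0, 5.0)]
lines = to_polylines(rows)
selected = brush(rows, axis=1, low=15.0, high=30.0)
```

Because every record becomes a polyline across all axes, any number of attributes fits in one plane; drawing each polyline in the color of its cluster label would give the colored display described above.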
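The sampling approach in (2) can be sketched as follows. The abstract does not say which clustering algorithm is run on the sample, so plain k-means is swapped in here as an assumption; the names `kmeans` and `cluster_by_sampling`, and the parameters `k`, `sample_size`, and `iters`, are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=20, rng=None):
    """Plain k-means (assumed here; the thesis may use another algorithm)."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def cluster_by_sampling(data, k, sample_size, seed=0):
    """Cluster a random sample, then label every record with the sample model."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)      # step 1: random sample
    centroids = kmeans(sample, k, rng=rng)      # step 2: model from the sample
    labels = [nearest(p, centroids) for p in data]  # step 3: label all records
    return centroids, labels
```

Only the sample goes through the iterative step, so parameter tuning is repeated on a small set, and the full dataset needs just one nearest-centroid pass. In this sketch `data` is an in-memory list for brevity; the labeling pass could equally read records one at a time.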

Keywords/Search Tags:

Parallel coordinate technology, Clustering method in large datasets, MinerOnWeb, Brush, Dimension Constraints

Related items

1	Application And Research On Clustering Algorithm In Large Scale Datasets
2	Research And Implementation Of Parallel Clustering Algorithm Based On Approximate Spectrum Hadoop MapReduce
3	Efficient visualization of large datasets using parallel processing and visibility computation
4	Large-scale Data Clustering Technology Research And To Achieve
5	Geometric methods for mining large and possibly private datasets
6	Research Of Parallel Sparse Subspace Clustering Methods Based On Coordinate Descent
7	Research And Application Of Multi-dimension Data Visualization
8	Research On Isosurface Extraction Of Large Scale Datasets With Out-of-Core Method
9	Research And Application Of Clustering Algorithm On The High Dimensional Datasets
10	Research On Spectral Clustering Methods For Large Scale Datasets