Research And Application Of Visualization Techniques In Data Mining

Posted on: 2010-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2178360272996230
Subject: Computer software and theory
Abstract/Summary:
The amount of data stored in computer files and databases is growing at an incredible speed, and people expect to obtain potentially useful information from these data. However, the volume is so large that it cannot be analyzed with traditional tools and techniques, which led to the emergence of data mining technology. Data mining is a process that finds latent, useful information or knowledge, not known in advance, in massive, incomplete, noisy, fuzzy, and random data. To make the process and results of knowledge discovery easy to understand, to enable human-computer interaction during the process, and to reveal the relationships and development trends in the data, people have turned to visualization technology. Visualization has in fact become an integral part of data mining, and the combination is vividly named Visual Data Mining. Combining the two technologies can express multi-dimensional data and data mining results efficiently, demonstrate the mining process dynamically, guide the entire data mining process through interactive feedback, support the computation and programming involved, greatly accelerate data processing, and provide a powerful tool for discovering and understanding the laws of science.

Adding visualization tools to the data mining process is a development trend as well as a basic requirement of a data mining system. This article gives an overview of the Intelligent Data Mining platform developed by the laboratory and focuses on the implementation of its visualization module.

Through a study of the concepts of data mining and visualization technologies, we cover the definition of data mining, the main steps of the process, and its main methods, such as association analysis, classification and prediction, and clustering analysis, and we trace the development of visualization techniques from "Visualization in Scientific Computing" to "Information Visualization".
After 20 years of development, many new computer graphics and analysis technologies have been proposed, and the types of data to be visualized have become increasingly diverse. By the data involved, visualization can be divided into one-dimensional, two-dimensional, and multi-dimensional data visualization and the visualization of complex data types, including time-series, text, and multimedia data. By the visualization technology involved, it can be divided into geometric projection-based, pixel-based, icon-based, and stacking-based techniques. By the analysis technology involved, it can be divided into deformation techniques, virtual reality, and interactive techniques. Visualization technology combines with data mining in three respects: data visualization, data mining process visualization, and mining results visualization. These run through the entire data mining process and provide an important means of gathering data and of showing complex data mining results.

DBIN Miner, developed in our laboratory, is the underlying framework and is closely tied to this work, so the platform's implementation, core technologies, and features are also described. To solve the problems that data mining algorithms cannot be added or removed and that graphs are lacking, the system adopts a plug-in design to organize data mining algorithms and visualization modules, allowing users to conveniently add new algorithms and visualization programs to the platform.
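The plug-in idea can be illustrated with a small registry in which algorithms are added and removed by name at run time. This is only a sketch under stated assumptions: `PluginRegistry`, its method names, and the two sample plug-ins are hypothetical and are not taken from DBIN Miner.

```python
from typing import Callable, Dict, List

# Hypothetical plug-in signature: a plug-in takes a list of numbers
# and returns some result. DBIN Miner's real interface is not known.
Plugin = Callable[[List[float]], object]


class PluginRegistry:
    """Keeps algorithms or visualizations addressable by name."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # Refuse silent overwrites so two plug-ins cannot share a name.
        if name in self._plugins:
            raise ValueError(f"plug-in already registered: {name}")
        self._plugins[name] = plugin

    def unregister(self, name: str) -> None:
        # Raises KeyError if the plug-in was never registered.
        del self._plugins[name]

    def names(self) -> List[str]:
        return sorted(self._plugins)

    def run(self, name: str, data: List[float]) -> object:
        return self._plugins[name](data)


registry = PluginRegistry()
registry.register("mean", lambda xs: sum(xs) / len(xs))
registry.register("range", lambda xs: max(xs) - min(xs))
```

The platform core only depends on the registry, so a new algorithm or chart type can be added without changing existing code.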
At the same time, to exchange and share data mining models with other data mining tools and to strengthen the openness and scalability of the system, the system adopts the PMML model, an XML-based standard maintained by the Data Mining Group and widely used, as its storage structure.

The thesis focuses on the implementation of the visualization module. The module's workflow comprises data source type selection, data conversion, visualization and graph type selection, and interaction selection. To support diverse visual graphs while keeping them independent of the data sets, this paper introduces the concept of a data object and sets up separate data objects for the visual graphs and for the target data set. Each kind of graph has its own definition method and data structures, according to which the necessary data objects are generated.

Following the respects in which visualization techniques and data mining combine, this paper describes in detail the implementation of data visualization and of data mining results visualization.
The data visualization sub-module implements the line chart, multi-pie chart, scatter chart, box chart, parallel coordinates, and circle segments, with a focus on the implementation of parallel coordinates and circle segments.

Parallel coordinates is a visualization method that maps multi-dimensional data points to two-dimensional space. The graph supports many interactive operations, such as brushing, dimension control, dimension scaling, axis exchange, and correlation analysis, which give users a better understanding of the details of the graph.

Circle segments divides a circle into a number of sectors, each of which represents one dimension of the data. Within a sector, each attribute value is represented by a single colored pixel. Clicking a sector displays a line chart below it, and when the user chooses to order by one dimension, the data set is reordered by that dimension and the circle changes accordingly, so that users can further compare the characteristics of the data.

In the data mining results visualization sub-module, graphs are divided by data mining method into three categories: association analysis, classification and prediction, and clustering analysis. Association analysis implements the visualization of one-dimensional and multi-dimensional association rules, with subset display and detail information display as interactions. Classification analysis implements a decision tree and a 3D scatter graph for classification; the 3D scatter graph shows the data set after classification. The decision tree implements the classification tree and defines interactive operations such as sub-tree display and the display of leaf node path information.
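The parallel-coordinates mapping described above can be sketched as follows: each dimension gets a vertical axis at x = 0, 1, 2, ..., each value is min-max scaled onto its axis, and each record becomes one polyline. The function names `to_polylines` and `brush` are illustrative assumptions, not the thesis's implementation, and the brush shown is the simplest range filter on a single axis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_polylines(rows: Sequence[Sequence[float]]) -> List[List[Point]]:
    """Map n-dimensional records to 2-D polylines, one vertex per axis."""
    if not rows:
        return []
    n_dims = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(n_dims)]
    highs = [max(r[d] for r in rows) for d in range(n_dims)]
    polylines = []
    for r in rows:
        line = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            # A constant dimension has no spread; draw it at mid-axis.
            y = (r[d] - lows[d]) / span if span else 0.5
            line.append((float(d), y))
        polylines.append(line)
    return polylines


def brush(rows: Sequence[Sequence[float]], dim: int,
          lo: float, hi: float) -> List[int]:
    """Return indices of records whose value on axis `dim` is in [lo, hi]."""
    return [i for i, r in enumerate(rows) if lo <= r[dim] <= hi]


rows = [[1.0, 10.0, 5.0], [3.0, 20.0, 5.0], [2.0, 30.0, 5.0]]
polylines = to_polylines(rows)
selected = brush(rows, dim=1, lo=15.0, hi=30.0)
```

A renderer would draw each polyline and highlight those whose indices are in `selected`; axis exchange amounts to permuting the columns before calling `to_polylines`.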
Clustering analysis implements the dendrogram. The dendrogram is a non-traditional representation in which the result of each step of hierarchical clustering is placed on the two nodes of a tree branch.

The visualization module changes the situation in which the original data could only be stored in tables and the mining results could only be stored in XML files, letting people analyze the original data in visual form and assess models effectively.

This article concludes with an outlook on the next step of the work, which is to implement the data mining process visualization module. Because of the author's limited ability, this article still has some inadequacies, so I hope to strengthen my basic skills and to define and implement better and more complete graphical utilities.
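The way a dendrogram records each step of hierarchical clustering as a branch with two child nodes can be sketched with a small agglomerative procedure. The sketch assumes one-dimensional points and single-linkage distance purely for brevity; the abstract does not say which linkage or distance the thesis uses, and `single_linkage` and `leaves` are hypothetical names.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (the index of a point) or a branch
# (left subtree, right subtree, distance at which they were merged).
Tree = Union[int, Tuple["Tree", "Tree", float]]


def leaves(tree: Tree) -> List[int]:
    """Collect the point indices under a tree node."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def single_linkage(points: List[float]) -> Tree:
    """Merge the two closest clusters repeatedly until one tree remains."""
    if not points:
        raise ValueError("need at least one point")
    clusters: List[Tree] = list(range(len(points)))
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(points[a] - points[b])
                        for a in leaves(clusters[i])
                        for b in leaves(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Each merge step becomes one branch with exactly two children.
        merged = (clusters[i], clusters[j], d)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


tree = single_linkage([1.0, 2.0, 10.0])
```

Drawing the dendrogram then means placing each branch at the height given by its merge distance, with its two children beneath it.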
Keywords/Search Tags: Data mining, Information Visualization, Interactions, Parallel coordinates