| With the rapid development of science and technology, especially the information industry, society has entered a new information era. Means of data collection have become increasingly diverse, and storage technology continues to improve. The continuous development of data acquisition and storage devices has ushered in the era of big data. Faced with large volumes of data and complex information, the most urgent and important issue is how to extract valuable information and present it in a form that users can easily understand. Data mining alone may produce results that are hard to interpret or incorrect, so it is not sufficient to solve this problem. This thesis therefore studies visual data mining, a technology that combines data mining with data visualization. At present, visualization technology and data mining algorithms are only loosely coupled. To address this, the main research question of this thesis is how to integrate data mining algorithms and visualization technology more closely and efficiently. Clustering analysis is chosen as the entry point for studying the visualization of data, of the mining process, and of results. Applications to social networks and scientific research are presented. The main research contents are as follows: (1) A hierarchical clustering method based on MASI distance and incorporating random sampling is proposed, improving on the hierarchical clustering algorithm. The algorithm is applied to a professional network data set and the results are visualized. Adopting random sampling in the hierarchical clustering algorithm effectively reduces its time complexity. The clustering results are visualized as different tree diagrams that can be read at a glance. (2) A visual model based on SOM clustering is put forward.
The model is applied to an atmospheric temperature data set to visualize the clustering process. Two interest measures, one based on clustering and one based on neighbor metrics, are proposed to rank attributes and thereby optimize the visualization of the data mining results. The interactive visual design of this application is also worth noting: it uses techniques such as color mapping and scaling to let users observe and analyze the data more easily. (3) Parallel coordinates visualization is combined with the K-Means algorithm, and the visualization is used to improve the efficiency of the algorithm. In the experiment, both the data and the results are visualized. The improved K-Means algorithm is tested on the Iris data set. The experiment shows that the improved K-Means is more efficient and more accurate than the traditional K-Means algorithm. |
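
As an illustration of contribution (1), the following minimal sketch shows how MASI distance between set-valued items might be computed on a random sample before hierarchical clustering. It is not the thesis implementation: it assumes each record is a set of labels, and the names `masi_distance` and `sampled_distances`, the `sample_size` and `seed` parameters, and the example profiles are hypothetical. The monotonicity weights (1, 2/3, 1/3, 0) follow the usual definition of MASI.

```python
import random
from itertools import combinations


def masi_distance(a, b):
    """MASI distance between two sets: 1 - Jaccard * monotonicity weight."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    inter = a & b
    jaccard = len(inter) / len(union)
    if a == b:
        weight = 1.0        # identical sets
    elif a <= b or b <= a:
        weight = 2 / 3      # one set contains the other
    elif inter:
        weight = 1 / 3      # overlap, but neither contains the other
    else:
        weight = 0.0        # disjoint sets
    return 1.0 - jaccard * weight


def sampled_distances(items, sample_size, seed=0):
    """Draw a random sample and return it with its pairwise MASI distances."""
    rng = random.Random(seed)
    sample = rng.sample(list(items), min(sample_size, len(items)))
    distances = {
        (i, j): masi_distance(sample[i], sample[j])
        for i, j in combinations(range(len(sample)), 2)
    }
    return sample, distances


# Hypothetical profiles: each one is a set of skill labels.
profiles = [{"sql", "python"}, {"python"}, {"java", "sql"}, {"design"}]
sample, distances = sampled_distances(profiles, sample_size=3)
```

Computing distances on a sample of size s needs s(s-1)/2 comparisons instead of n(n-1)/2 for the full data set, which is the source of the saving. The resulting distances could then be passed to any agglomerative clustering routine to build the tree diagram.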