Research on Visual Analysis Method of Multi-dimensional Stream Data

Posted on: 2018-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Lu
GTID: 2428330596454797
Subject: Software engineering
Abstract:
With the rapid development of Internet of Things technology, multi-dimensional stream data is being collected ever more widely, and it needs to be mined and visualized effectively. Compared with traditional static data, multi-dimensional stream data is unbounded in volume, arrives quickly, and arrives in no fixed order. Such data is more complex and its structure is harder to discover, so how to analyze and mine large-scale multi-dimensional stream data effectively, and uncover the information and knowledge hidden in it, has long been a difficult problem for the research community. People obtain information mainly through the visual system, and visual presentation lets users gain more intuitive and efficient insight into the patterns inherent in the data and understand its underlying rules more deeply. Because of the curse of dimensionality, however, visualizing multi-dimensional stream data directly produces heavy visual clutter, which hinders the discovery of its internal structure. Data mining techniques therefore need to be applied to preprocess the stream data before it is visualized.

Approaching the problem from the perspective of data analysis and with visual analysis as the goal, this thesis builds on the multi-dimensional and time-varying characteristics of stream data to study, respectively, a stream data clustering method for visualization and a stream data dimension reduction method for visual analysis. The research results are then applied in the design of a prototype system for visual analysis of multi-dimensional stream data. The main work and results are summarized as follows:

1) To analyze the internal structure of multi-dimensional stream data, a stream data clustering method for visualization is proposed. The thesis analyzes the shortcomings of the CluStream algorithm and introduces a decay time factor to reduce the influence of historical data on the clustering results. Because stream data is high-dimensional and its volume grows rapidly, which sharply reduces the algorithm's efficiency, an improved algorithm, DPCluStream, is proposed and parallelized on Spark Streaming. Finally, the KDD-CUP99 data set and the USDA food nutrient data set are used to simulate stream data, and the experiments verify the clustering quality and timeliness of the algorithm.

2) To display the clustering results better on parallel coordinates and to reduce the visual clutter caused by lines sharing points on the axes and by overlapping polylines, the thesis studies a curve aggregation method for multi-dimensional stream data on parallel coordinates that replaces the original polylines with Bezier curves. A curve aggregation model is established that draws the curves together at the cluster centers, so that users can observe the clustering results more efficiently.

3) To reduce the visual clutter that arises in visual analysis of multi-dimensional stream data, a parallel dimension reduction algorithm is proposed. First, a maximum likelihood intrinsic dimension estimation method for multi-dimensional stream data is designed to dynamically determine the target dimension of the reduction. Then, after analyzing the strengths and weaknesses of the PCA and LLE algorithms, a two-step combined dimension reduction algorithm is designed and implemented on the Spark platform to meet the real-time requirements of visual analysis of stream data. Finally, stream data is simulated from the MNIST handwritten digit data set, and a radial coordinate visualization method with a mapping optimization algorithm is used to verify that the combined dimension reduction algorithm is effective and feasible.

4) Building on the above results and on the characteristics of interactive visual analysis, a prototype system for visualization of multi-dimensional stream data is designed. The thesis first describes the system's requirements analysis, then introduces its overall architecture and environment configuration and designs its functional structure. The stream data clustering analysis module, the stream data dimension reduction analysis module, the parallel coordinates visual analysis module, the radial coordinates visual analysis module, and the multi-view collaborative analysis module are implemented. Finally, the USDA food nutrient data set is used to simulate stream data, and functional tests show that the prototype system fully integrates the computing power of the computer with the domain knowledge of the user, and provides a rich set of human-computer interaction interfaces that help users uncover hidden information in multi-dimensional stream data more quickly and intuitively.
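The decay time factor added to CluStream can be illustrated with a micro-cluster whose cluster feature (point count N, linear sum LS, squared sum SS) fades exponentially with age, so older points weigh less in the centroid. This is a minimal single-machine sketch, assuming the fading function 2^(-lam * dt) known from DenStream; the abstract does not state the thesis's actual decay formula, and the class name `DecayedMicroCluster` and the rate `lam` are hypothetical. The Spark Streaming parallelization of DPCluStream is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DecayedMicroCluster:
    """CluStream-style cluster feature (N, LS, SS) with exponential time decay.

    Hypothetical sketch: the fading function f(dt) = 2 ** (-lam * dt) is an
    assumption borrowed from DenStream, not the thesis's published formula.
    """
    dim: int
    lam: float = 0.1                               # hypothetical decay rate
    n: float = 0.0                                 # decayed point count N
    ls: List[float] = field(default_factory=list)  # decayed linear sum LS
    ss: List[float] = field(default_factory=list)  # decayed squared sum SS
    last_t: float = 0.0                            # time of the last update

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def _fade(self, t: float) -> None:
        # Scale every statistic by the decay factor for the elapsed time.
        f = 2.0 ** (-self.lam * (t - self.last_t))
        self.n *= f
        self.ls = [v * f for v in self.ls]
        self.ss = [v * f for v in self.ss]
        self.last_t = t

    def insert(self, x: Sequence[float], t: float) -> None:
        # Age the existing summary first, then absorb the new point at full weight.
        self._fade(t)
        self.n += 1.0
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # RMS deviation from the centroid, computed from N, LS and SS only.
        var = sum(s / self.n - (l / self.n) ** 2 for l, s in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))
```

Inserting (0, 0) at t = 0 and (10, 10) at t = 10 with lam = 0.1 halves the weight of the first point, so the centroid moves to about (6.67, 6.67) instead of the undecayed mean (5, 5). Because N, LS and SS are additive, two such summaries can still be merged after both are faded to the same timestamp, which is what makes this representation convenient to partition across workers.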
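The idea of replacing parallel-coordinates polylines with Bezier curves that converge at the cluster center can be sketched per axis segment: each segment becomes a quadratic Bezier curve forced to pass, halfway between the two axes, through a height blended between the record's own midpoint and the cluster centroid's midpoint. This is a hypothetical aggregation model for illustration only; the functions `quad_bezier` and `aggregated_curve` and the blending weight `beta` are assumptions, not the thesis's curve aggregation model.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quad_bezier(p0: Point, p1: Point, p2: Point, steps: int = 10) -> List[Point]:
    """Sample B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2 at steps + 1 points."""
    pts: List[Point] = []
    for k in range(steps + 1):
        t = k / steps
        a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
        pts.append((a * p0[0] + b * p1[0] + c * p2[0],
                    a * p0[1] + b * p1[1] + c * p2[1]))
    return pts


def aggregated_curve(record: Sequence[float], centroid: Sequence[float],
                     beta: float = 0.6, steps: int = 10) -> List[Point]:
    """Draw one record on parallel coordinates, bundled toward its cluster centroid.

    Axis i sits at x = i. Hypothetical model: between axes i and i + 1 the curve
    passes through x = i + 0.5 at a height pulled toward the centroid by beta.
    """
    out: List[Point] = []
    for i in range(len(record) - 1):
        p0: Point = (float(i), record[i])
        p2: Point = (float(i + 1), record[i + 1])
        mid_rec = (record[i] + record[i + 1]) / 2
        mid_cen = (centroid[i] + centroid[i + 1]) / 2
        target_y = (1 - beta) * mid_rec + beta * mid_cen
        # B(0.5) = 0.25*P0 + 0.5*P1 + 0.25*P2, so solve for P1 to hit the target.
        p1: Point = (i + 0.5, 2 * target_y - (p0[1] + p2[1]) / 2)
        seg = quad_bezier(p0, p1, p2, steps)
        # Consecutive segments share an axis point; keep it only once.
        out.extend(seg if i == 0 else seg[1:])
    return out
```

With beta = 0 the control point lies on the chord, so each segment degenerates into the original straight polyline; with beta = 1 every member of a cluster passes through the same point between each pair of axes, which produces the bundling effect.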
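A maximum likelihood intrinsic dimension estimate can be computed from nearest-neighbour distances: for each point with sorted neighbour distances T_1 ≤ … ≤ T_k, the local estimate is m_k(x) = [(1/(k-1)) Σ_{j<k} log(T_k / T_j)]^(-1), and the local estimates are averaged. The sketch below uses this Levina-Bickel estimator on one batch of points; it is an assumption that the thesis's stream-oriented method resembles it, and the function name `mle_intrinsic_dim` and the default `k` are hypothetical.

```python
import numpy as np


def mle_intrinsic_dim(X: np.ndarray, k: int = 10) -> float:
    """Levina-Bickel maximum likelihood estimate of intrinsic dimension.

    X is an (n, D) array of distinct points; k is the neighbourhood size (k >= 3).
    """
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b, clipped at zero.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    T = np.sort(d, axis=1)[:, :k]        # T_1 .. T_k for every point
    # m_k(x) = [ 1/(k-1) * sum_{j<k} log(T_k / T_j) ]^(-1)
    logs = np.log(T[:, k - 1:k] / T[:, :k - 1])
    m = (k - 1) / np.sum(logs, axis=1)
    return float(np.mean(m))
```

For points sampled from a 2-D square and linearly embedded in 10 dimensions, the estimate lands near 2. Rounding it, for example with `max(1, round(est))`, gives a target dimension for the subsequent reduction step; recomputing it over a sliding window of recent stream records is one plausible way to make the guidance dynamic, but that windowing scheme is an assumption.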
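A two-step combination of PCA and LLE can be sketched by running linear PCA first, which cheaply discards noisy or redundant directions, and then running LLE on the PCA output to preserve local nonlinear structure in the final low-dimensional embedding. The ordering (PCA, then LLE), the function names `pca`, `lle` and `two_step_reduce`, and the parameters `d_mid`, `k` and `reg` are assumptions for illustration; this single-machine NumPy sketch does not reproduce the thesis's Spark implementation.

```python
import numpy as np


def pca(X: np.ndarray, d: int) -> np.ndarray:
    """Project centred data onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def lle(X: np.ndarray, d: int, k: int = 10, reg: float = 1e-3) -> np.ndarray:
    """Standard locally linear embedding into d dimensions."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)                 # exclude each point itself
    nbrs = np.argsort(d2, axis=1)[:, :k]         # k nearest neighbours per point
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                    # neighbours centred on x_i
        C = Z @ Z.T                              # local Gram matrix (k x k)
        C += np.eye(k) * reg * np.trace(C)       # regularise: C is singular when k > input dim
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()              # reconstruction weights sum to one
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                      # skip the constant bottom eigenvector


def two_step_reduce(X: np.ndarray, d_mid: int, d_out: int, k: int = 10) -> np.ndarray:
    """Hypothetical two-step reduction: PCA to d_mid dimensions, then LLE to d_out."""
    return lle(pca(X, d_mid), d_out, k)
```

Running PCA first also shrinks the input to LLE's neighbour search, whose cost grows with the input dimension; the dense n × n eigenproblem still limits this sketch to modest batch sizes.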
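Radial coordinate visualization (RadViz) places one anchor per dimension on the unit circle and maps each record to the weighted average of the anchors, using the record's min-max normalized values as weights. The sketch below shows only this basic mapping; the mapping optimization mentioned in the abstract (for example, choosing the anchor order) is not reproduced, and the function name `radviz` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def radviz(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Basic RadViz: map each m-dimensional row to a point inside the unit circle."""
    m = len(rows[0])
    # Anchor j sits at angle 2*pi*j/m on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    lo = [min(r[j] for r in rows) for j in range(m)]
    hi = [max(r[j] for r in rows) for j in range(m)]
    out: List[Tuple[float, float]] = []
    for r in rows:
        # Min-max normalise each dimension to [0, 1]; constant columns become 0.
        x = [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(m)]
        s = sum(x)
        if s == 0:
            out.append((0.0, 0.0))  # an all-zero row is placed at the centre
            continue
        out.append((sum(x[j] * anchors[j][0] for j in range(m)) / s,
                    sum(x[j] * anchors[j][1] for j in range(m)) / s))
    return out
```

A row dominated by one dimension lands next to that dimension's anchor, while a row with equal values in every dimension collapses to the centre. Because the result depends on the order of the anchors around the circle, reordering them is a natural target for the kind of mapping optimization the abstract describes.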
Keywords: data clustering, dimension reduction, parallel coordinates, visual analysis