Research On Parallel Knowledge Discovery Of Real-time Process Objects

Posted on: 2019-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z Hua
Full Text: PDF
GTID: 2428330545969224
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer software and hardware, networking, and information technology, data streams are continuously generated in fields such as financial applications, stock analysis, weather monitoring, and network security. To process data streams in time and obtain the status of production equipment, the research group takes the complete industrial production process as its research object and proposes a knowledge discovery model for big data, T-C-A-C/T Flow. The model can be used to discover hidden relationships between the links of the production process and to identify production status and abnormal situations, which has great practical significance.

Building on the research group's original knowledge discovery algorithm flow, this thesis designs and improves data stream clustering and association rule algorithms that take into account the characteristics of data streams: unbounded growth, real-time ordering, and high update frequency. Mining the influence relationships between the links has important theoretical and practical significance, and data stream-oriented clustering analysis and association rule analysis have important industrial applications. To address low clustering purity and the difficulty of reflecting data stream evolution, the DEGDS algorithm (A Data Stream Clustering Algorithm Based on Density and Extended Grid) and the CW-Stream algorithm (A Stream Data Clustering Algorithm Based on Center Weighted) are proposed. Clustering the raw data also makes the division of the sampled data more reasonable, which supports the subsequent association rule analysis. On this foundation, to address the low efficiency of traditional association analysis, the PIMH algorithm (Parallel Incremental Association Rule Mining Based on Hierarchical) is proposed. The research content and innovations fall into the following three aspects.

(1) A Data Stream Clustering Algorithm Based on Density and Extended Grid. To address the quality problems of traditional clustering algorithms, the DEGDS algorithm is designed. It runs on the Spark Streaming parallel computing platform and determines the micro-cluster centers automatically, which avoids the influence of initial values on the clustering result. To prevent the loss of cluster boundaries caused by an improper grid scale, the edges are expanded by extending the grid. In addition, grids are merged by combining density estimates of adjacent grids with grid boundaries. Experiments verify that DEGDS achieves the desired clustering quality, with good speedup and scalability.

(2) A Stream Data Clustering Algorithm Based on Center Weighted. To address the problem that traditional stream clustering does not emphasize the importance of current data, the CW-Stream clustering algorithm is proposed. First, iterative learning of the center weight values yields an accurate description of the data stream that preserves both the characteristics of the historical data and the evolution of the stream. Then, to capture the complete state of each data object, its summary information is stored as a fuzzy membership degree matrix. The experimental results show that the algorithm improves accuracy and efficiency to varying degrees and can deal with data streams effectively.

(3) Parallel Incremental Association Rule Mining Based on Hierarchical. Most traditional data stream association rule algorithms suffer from bottlenecks such as high resource consumption, limited standalone computing power, and overly large candidate sets. To address these, the PIMH algorithm is proposed. Based on the clustering results of the preceding two chapters, the algorithm uses the Spark parallel platform to process the partitioned data in parallel, so that the mining results can be obtained by scanning the original data set only once. In addition, local pruning compresses the candidate sets in parallel, which addresses the problem of candidate sets that are too large to fit in memory. Experiments show that the PIMH algorithm reduces data mining overhead and has high time efficiency.

In summary, the proposed algorithms supplement and improve existing data stream clustering and association rule algorithms. Both the theoretical analysis and the experimental results show that, compared with existing algorithms, they balance result accuracy against mining time, offer better accuracy and adaptability, and can effectively solve the corresponding mining problems.
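
The grid merging mentioned in item (1) can be illustrated with a generic grid-density clustering sketch. This is not the DEGDS algorithm itself: the abstract does not give its details, so the sketch omits grid extension, micro-cluster centers, and Spark Streaming. The function name `grid_density_clusters` and the parameters `cell_size` and `min_points` are hypothetical, chosen only for illustration. It assigns points to grid cells, keeps the cells whose point count reaches a density threshold, and merges adjacent dense cells into one cluster.

```python
from collections import Counter, deque
from itertools import product


def grid_density_clusters(points, cell_size, min_points):
    """Return {cell: cluster_label} for dense cells (illustrative only)."""
    # Map every point to the integer coordinates of the grid cell holding it.
    counts = Counter(tuple(int(x // cell_size) for x in p) for p in points)

    # A cell is "dense" if it holds at least min_points points (assumed rule).
    dense = {cell for cell, n in counts.items() if n >= min_points}

    # Offsets to all neighbouring cells, including diagonal neighbours.
    dims = len(next(iter(dense))) if dense else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]

    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search merges all dense cells reachable from start.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                neighbour = tuple(c + d for c, d in zip(cell, off))
                if neighbour in dense and neighbour not in labels:
                    labels[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1
    return labels


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
              (5.1, 5.2), (5.3, 5.4), (9.0, 9.0)]
    print(grid_density_clusters(sample, cell_size=1.0, min_points=2))
```

In this sketch a sparse cell on the edge of a cluster is simply discarded, which is the kind of boundary loss the abstract says the extended grid is meant to prevent.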
Keywords/Search Tags: Data stream, Parallel algorithm, Clustering, Association rules