| With the rapid development of science and technology, modern industry is becoming increasingly large-scale and intelligent. As industrial digitalization and intelligence advance, the volume of process parameter measurements and state-monitoring data keeps growing. These massive industrial production data give business operators and researchers a wealth of information and make it possible to analyze the state of industrial production. At the same time, data analysis is hampered by the intricate relationships among parameters: without rapid and effective analysis methods, it is difficult to extract useful information from such complicated data. Machine learning, a family of data mining and analysis techniques that has become popular in recent years, has been widely applied in many fields because of its analytical and computational power, and using machine learning and related methods to analyze complex industrial production data is an active research area. This thesis studies process data analysis methods for complex industrial systems in detail, systematically reviews the state of research on cluster analysis and causal analysis methods, and investigates the application of machine learning and information theory to data analysis for complex industrial systems. Drawing on information theory, a systematic machine-learning-based method for analyzing industrial process data is proposed. The research results are as follows: (1) To address the large number of parameters in complex industrial systems, the high data dimensionality, and the presence of time delay and jitter, a fuzzy c-medoids clustering algorithm using constrained dynamic time warping is proposed. Building on fuzzy c-means clustering, the method replaces the traditional Euclidean distance with dynamic time warping, because Euclidean distance evaluates the similarity and dynamics of industrial process time series inaccurately. The warping path is constrained in order to reduce the time complexity of the algorithm. Simulation results show that, compared with the traditional algorithm, the proposed algorithm is noticeably more accurate and significantly faster. Fuzzy c-means clustering requires the number of clusters as input, but in real industrial systems this number is usually not known in advance. For this problem an improved affinity propagation clustering algorithm is proposed, which partitions the data directly without the expected number of clusters being specified beforehand. The algorithm is insensitive to time delays between time series when the dynamic time warping measure is used for message passing between data points, and negatively correlated data can be grouped into one class when mutual information is used. Simulation results show that the proposed clustering algorithm achieves more detailed classification results. (2) A comprehensive model for analyzing the time delay and causality of industrial process data is proposed. To address the time delay between parameters, a sliding-window time delay calculation method based on mutual information is proposed. The method can accurately calculate the time delay between two nonlinearly correlated parameters. Simulation results show that the proposed method is more practical than the traditional time delay calculation based on the Pearson correlation coefficient. To account for trends in industrial production data, a Bayesian causal network construction method based on trend transfer entropy is proposed, which transforms the original time series into trend time series through feature point extraction and piecewise linearization. The causal information in the time series data obtained from parameter measurements is fully exploited to construct the causal network. Simulations show that the algorithm performs better than ordinary transfer entropy on the classical Tennessee-Eastman industrial process. |
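
The idea of constraining the warping path in result (1) can be illustrated with a minimal sketch. It assumes a Sakoe-Chiba band as the constraint and an absolute-difference local cost; the abstract does not state which constraint or local cost the thesis uses, so both choices, as well as the function name `constrained_dtw` and the parameter `window`, are hypothetical.

```python
import math


def constrained_dtw(x, y, window):
    """DTW distance between two numeric sequences under a Sakoe-Chiba band.

    `window` is the largest index offset |i - j| the warping path may use.
    (Assumption: band constraint and absolute-difference local cost.)
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide, or no path can reach the end.
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled, so the work is O(n * w)
        # instead of O(n * m) for unconstrained DTW.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in x only
                                 cost[i][j - 1],      # step in y only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Two copies of the same pulse, shifted by one sample.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(constrained_dtw(a, b, window=0))  # band of width 0: pointwise comparison
print(constrained_dtw(a, b, window=1))  # band of width 1: the shift is absorbed
```

With `window=0` the path is forced onto the diagonal and the measure reduces to a pointwise comparison, which penalizes the one-sample delay. A band of width 1 already lets the path absorb the delay, which is the behavior the abstract attributes to dynamic time warping on delayed process signals.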
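
The mutual-information delay estimate in result (2) can be sketched in the same spirit. The code below is not the thesis's sliding-window procedure, whose details the abstract does not give. It only scans candidate lags over the whole record and picks the lag with the largest histogram-based mutual information. The estimator, the bin count, and the names `mutual_information` and `estimate_delay` are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(a, b, bins=8):
    """Histogram estimate of mutual information (nats) for paired samples."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    da, db = discretize(a), discretize(b)
    n = len(da)
    pa, pb, pab = Counter(da), Counter(db), Counter(zip(da, db))
    # Sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    return sum((c / n) * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def estimate_delay(x, y, max_lag, bins=8):
    """Return (best_lag, scores), pairing x[t] with y[t + lag].

    Assumes x and y have equal length and that y lags x by 0..max_lag samples.
    """
    scores = [mutual_information(x[:len(x) - lag], y[lag:], bins)
              for lag in range(max_lag + 1)]
    best_lag = max(range(max_lag + 1), key=scores.__getitem__)
    return best_lag, scores


# Synthetic example: y is a nonlinear (squared) copy of x delayed by 3 samples.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
y = [0.0] * 3 + [v * v for v in x[:-3]]
best_lag, _ = estimate_delay(x, y, max_lag=10)
print(best_lag)
```

The synthetic signal is chosen so that the dependence is symmetric about zero: the linear correlation between `x[t]` and `x[t] ** 2` is close to zero, while their mutual information is large. This is the kind of nonlinear relationship for which the abstract reports an advantage over the Pearson correlation coefficient.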