
The Research And Implementation Of Knowledge Discovery Based On Big Data Sets Of Power System

Posted on: 2016-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Song
GTID: 2308330464469121
Subject: Computer Science and Technology
Abstract/Summary:
Data mining and knowledge discovery in the process industry is a complicated but highly significant research field. Every link in an industrial process generates and stores a large volume of operating data each day, and a great deal of knowledge hidden behind these data is waiting to be discovered. Because all links run in sequence, the sequence formed by the links of the technological process can be treated as a time series. However, determining the technological sequence of the internal links from the data of the process object is a new and difficult problem. Extracting association rules from process-industry data is not a new topic, but the rules extracted by earlier methods are Boolean rules, and traditional methods cannot extract rules in the form of chains. How the states of the links interact with each other is another important question. Solving these problems would be of considerable value to the process industry.
Against this background, I studied the process object and proposed a knowledge discovery model, Time series-Clustering-Association-Chain/Tree Flow (T-C-A-C/T Flow for short). It is a pipeline of algorithms that aims to uncover the hidden close relationships between different links in the process object, and it ultimately produces state association chains. The first step, data preprocessing, is complicated and is intertwined with the T and C steps. To reduce computing time, sampling is a necessary step: we adopt a difference-based method to select representative sample data as the research object. To find the temporal characteristics of the process object, this thesis presents an extremum-based temporal discovery algorithm. By computing the extreme points, the time distance from each of the other links to a basic link can be obtained, and the technological process of the process object is then determined from these time distances.
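The extremum-based temporal discovery step can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each link is a list of equally spaced readings, an extremum is a strict local maximum or minimum, and the time distance to the basic link is taken as the median delay between each extremum of the basic link and the next extremum of the same kind in the other link. The function names (`local_extrema`, `extremum_lag`, `process_order`, `align`), the `max_lag` parameter, and the toy data are hypothetical. `align` shows one way the estimated lags could be used to shift the data into process order.

```python
from statistics import median


def local_extrema(series):
    """Indices and kinds ('max'/'min') of strict interior local extrema."""
    found = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            found.append((i, "max"))
        elif cur < prev and cur < nxt:
            found.append((i, "min"))
    return found


def extremum_lag(base, other, max_lag):
    """Median delay between each extremum of `base` and the first extremum of the
    same kind in `other` that occurs 0..max_lag steps later; None if nothing pairs."""
    other_ext = local_extrema(other)
    lags = []
    for b, kind in local_extrema(base):
        later = [o - b for o, k in other_ext if k == kind and 0 <= o - b <= max_lag]
        if later:
            lags.append(min(later))
    return round(median(lags)) if lags else None


def process_order(links, base_name, max_lag):
    """Sort links by their estimated time distance from the basic link."""
    distances = {base_name: 0}
    for name, series in links.items():
        if name != base_name:
            lag = extremum_lag(links[base_name], series, max_lag)
            if lag is not None:
                distances[name] = lag
    return sorted(distances.items(), key=lambda kv: kv[1])


def align(links, lags):
    """Drop the first `lag` samples of each link so that index t of every returned
    series corresponds to index t of the basic link (lag 0)."""
    horizon = min(len(links[name]) - lag for name, lag in lags.items())
    return {name: links[name][lag:lag + horizon] for name, lag in lags.items()}


# Hypothetical toy data: B and C repeat the basic link A with fixed delays.
base = [1, 3, 7, 4, 2, 5, 8, 3, 1, 6, 2, 2]
links = {
    "A": base,                   # basic (reference) link
    "B": [1, 1, 1] + base[:-3],  # same pattern, delayed by 3 steps
    "C": [1] + base[:-1],        # same pattern, delayed by 1 step
}
order = process_order(links, "A", max_lag=5)
print(order)  # [('A', 0), ('C', 1), ('B', 3)]
aligned = align(links, dict(order))
print(aligned["B"] == base[:9])  # True
```

Using the median rather than the mean keeps a single spurious pairing (here one extremum of `A` matched to `B` with zero delay) from pulling the estimate away from the true delay of 3.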
The original data and the differential data are then adjusted according to the technological process so that they follow the process order. To separate data belonging to different states and group similar data together, we apply the k-means algorithm to the adjusted data; this step also reduces the computational complexity. To determine the best value of k, the silhouette coefficient, which is based on the degree of cohesion and the degree of separation, is used to evaluate and analyze the clustering results. Association rules between clusters are then extracted from the clustered data with an Apriori-based interdimension association rule mining algorithm. To determine the relationship between links, we propose a method of calculating correlation that combines the interest and the support of the cluster association rules. Based on this correlation, association chains (the strongest association chains and the association trees) are generated from the association rules between links; these chains present the relationships between different links. For each association chain, state association chains are obtained by statistical analysis of the difference data, and they show the relationships between the states of different links. Finally, the model was tested with data from a power plant. The experimental results show that the model can mine the strong relationships between internal links and can efficiently express the influence relationships hidden in the process object. By making full use of the state information, the model can provide relevant auxiliary guidance to the process industry, which is of great practical significance.
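The clustering step and the silhouette-based choice of k can be illustrated with a self-contained k-means. This sketch assumes each data point is a tuple of numbers (for example, one link's aligned value, optionally paired with its difference), uses Euclidean distance, and seeds the centres deterministically by farthest-first traversal; the thesis abstract does not specify the initialisation, and the names `kmeans`, `silhouette`, and `choose_k` are hypothetical. Cohesion `a` is a point's mean distance to the rest of its own cluster, and separation `b` is its mean distance to the nearest other cluster.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's centre in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def mean_dist(p, group):
    return sum(math.dist(p, q) for q in group) / len(group)


def silhouette(points, labels):
    """Mean silhouette s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; scored as worst so it is never chosen
    total = 0.0
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [q for j, (q, l2) in enumerate(zip(points, labels)) if l2 == lab and j != i]
        if not own:
            continue  # a singleton cluster's point contributes 0 by convention
        a = mean_dist(p, own)  # cohesion
        b = min(mean_dist(p, g) for other, g in clusters.items() if other != lab)  # separation
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def choose_k(points, candidates):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)[0]) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical one-dimensional readings with three obvious groups.
points = [(v,) for v in (1.0, 1.5, 0.5, 5.0, 5.5, 4.5, 10.0, 10.5, 9.5)]
best, scores = choose_k(points, [2, 3, 4])
print(best)  # 3
```

In this reading, the resulting cluster labels act as the discrete states of a link, which is what the rule-mining step would consume.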
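An Apriori-style search for interdimension rules between link states, together with one possible link correlation, might look like the sketch below. Assumptions: each transaction is one aligned time step mapping a link name to its cluster label; only rules with a single antecedent and a single consequent taken from different links are mined (the first two Apriori levels); interest is measured by lift; and the correlation between two links is taken as the sum of support × lift over the rules between them. The abstract does not give the exact formula, so `link_correlation` is a hypothetical stand-in, as are `pairwise_rules`, the thresholds, and the toy transactions.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support, min_confidence):
    """Mine rules (link_x, state_x) -> (link_y, state_y) between different links.

    transactions: list of dicts, each mapping link name -> cluster label at one time step.
    Returns tuples (antecedent, consequent, support, confidence, lift).
    """
    n = len(transactions)
    # Apriori level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in t.items())
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    # Apriori level 2: candidate pairs are built only from frequent items.
    pair_counts = Counter()
    for t in transactions:
        items = sorted(item for item in t.items() if item in frequent)
        for a, b in combinations(items, 2):
            if a[0] != b[0]:  # interdimension: the two items come from different links
                pair_counts[(a, b)] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)  # "interest" of the rule
                rules.append((x, y, support, confidence, lift))
    return rules


def link_correlation(rules):
    """Hypothetical link score: sum of support * lift over all rules from one link to another."""
    corr = Counter()
    for (link_x, _), (link_y, _), support, _, lift in rules:
        corr[(link_x, link_y)] += support * lift
    return dict(corr)


transactions = [
    {"A": 0, "B": 1},
    {"A": 0, "B": 1},
    {"A": 1, "B": 0},
    {"A": 1, "B": 1},
]
rules = pairwise_rules(transactions, min_support=0.5, min_confidence=0.6)
for rule in rules:
    print(rule)
print(link_correlation(rules))
```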
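Given link correlations and the process order, a strongest association chain and an association tree could be built greedily, as sketched here. This assumes a chain always moves downstream in process order, a chain step or tree edge is kept only when its correlation exceeds a threshold `min_corr`, and each link in the tree hangs from its most correlated upstream link. These rules, the function names, and the correlation values are illustrative assumptions, not the thesis's definitions.

```python
def strongest_chain(order, corr, min_corr=0.0):
    """Greedy chain: from the first link, repeatedly step to the downstream link
    with the highest correlation to the current one, while it exceeds min_corr."""
    chain = [order[0]]
    i = 0
    while i < len(order) - 1:
        j = max(range(i + 1, len(order)), key=lambda k: corr.get((order[i], order[k]), 0.0))
        if corr.get((order[i], order[j]), 0.0) <= min_corr:
            break
        chain.append(order[j])
        i = j
    return chain


def association_tree(order, corr, min_corr=0.0):
    """Attach every link after the first to its most correlated upstream link."""
    parent = {}
    for j in range(1, len(order)):
        i = max(range(j), key=lambda k: corr.get((order[k], order[j]), 0.0))
        if corr.get((order[i], order[j]), 0.0) > min_corr:
            parent[order[j]] = order[i]
    return parent


# Hypothetical process order and link correlations.
order = ["A", "B", "C", "D"]
corr = {("A", "B"): 0.2, ("A", "C"): 0.9, ("B", "C"): 0.5,
        ("A", "D"): 0.1, ("B", "D"): 0.8, ("C", "D"): 0.7}
print(strongest_chain(order, corr))   # ['A', 'C', 'D']
print(association_tree(order, corr))  # {'B': 'A', 'C': 'A', 'D': 'B'}
```

Because every tree edge points from an earlier link to a later one, the parent map cannot contain a cycle.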
Keywords/Search Tags: Time Series, K-means Algorithm, Interdimension Association Rule Mining Algorithm Based on Apriori, Association Chain, State Association Chain