| Association rule mining is a popular data mining technique. Subject to constraints on given evaluation measures (support, confidence, etc.), it expresses relationships among multiple variables in the data as rules of the form “variable 1, variable 2, ... => variable n”. Association rule mining is suited to discrete data, whereas industrial operating data are continuous, so the industrial data must first be discretized. Experimental results show that continuous data become sparse after discretization. On sparse data, traditional association rule mining algorithms are limited by their own bottlenecks, and their performance drops. In addition, as industrial storage technology improves, the volume of data grows day by day. Traditional serial algorithms are limited by the resources of a single machine and struggle to extract knowledge quickly from large-capacity data. To address these problems, this paper studies association rule mining algorithms for sparse data and constructs a parallel algorithm on the Apache Spark platform. The parallel algorithm is then applied to optimize the operational performance of a coal-fired steam turbine unit. The specific research contents are as follows: 1) A frequent item projection bit matrix structure is proposed, which stores the dataset as bits (“0” and “1”). Based on this structure, two frequent itemset mining algorithms are designed for sparse data: HBM-Growth (Hyperlink Bit Matrix-Growth), which searches by pointer, and BM-Growth (Bit Matrix-Growth), which searches by index. The experimental results show that both algorithms mine sparse data well. 2) A parallel frequent itemset mining algorithm based on Spark, called PHBM-Growth (Parallel Hyperlink Bit Matrix-Growth), is proposed. It parallelizes HBM-Growth and adopts a data grouping strategy based on estimated computational load to achieve load balancing. The experimental results show that PHBM-Growth mines big data well and that the load-balancing data grouping strategy effectively improves the algorithm's performance. 3) PHBM-Growth is applied to optimize the operation of a coal-fired steam turbine unit. Taking a steam turbine unit of a thermal power plant as the optimization object, the heat consumption rate is first chosen as the optimization performance index. PHBM-Growth is then used to mine association rules from the unit's operating data after data processing, in order to determine the multi-condition optimization target values of the heat consumption rate. The results show that, under each working condition, the optimization target value of the heat consumption rate is lower than its actual running average value. |
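
To make the idea of storing a dataset as bits more concrete, the following is a minimal sketch of bit-based support and confidence counting. It is not the HBM-Growth or BM-Growth algorithm from the paper: the function names (`build_bit_columns`, `support_count`, `confidence`), the one-integer-per-item layout, and the toy discretized records are all assumptions made purely for illustration.

```python
def build_bit_columns(transactions):
    """Map each item to an int whose bit t is 1 if transaction t contains the item.

    Hypothetical layout: one Python int per item acts as a bit column.
    """
    columns = {}
    for t, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << t)
    return columns


def support_count(columns, itemset, n_transactions):
    """Count the transactions that contain every item in `itemset`."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= columns.get(item, 0)  # keep only transactions containing item
    return bin(mask).count("1")  # number of set bits = support count


def confidence(columns, antecedent, consequent, n_transactions):
    """Confidence of antecedent => consequent: support(A and C) / support(A)."""
    both = support_count(columns, set(antecedent) | set(consequent), n_transactions)
    ante = support_count(columns, antecedent, n_transactions)
    return both / ante if ante else 0.0


# Toy records standing in for discretized operating data (invented values).
transactions = [
    {"load=high", "pressure=high", "heat_rate=low"},
    {"load=high", "pressure=high", "heat_rate=low"},
    {"load=high", "pressure=low", "heat_rate=high"},
    {"load=low", "pressure=low", "heat_rate=high"},
]
n = len(transactions)
columns = build_bit_columns(transactions)

sup = support_count(columns, ["load=high", "pressure=high"], n)
conf = confidence(columns, ["load=high", "pressure=high"], ["heat_rate=low"], n)
print(sup, conf)
```

In this layout the support of an itemset reduces to a bitwise AND followed by a bit count, which is one reason bit-oriented structures are attractive for frequent itemset mining. How the paper's projection bit matrix is organized and searched (by pointer or by index) is not specified in the abstract and is not reproduced here.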