
Mining Approximate Association Rules from Multiple Time Series Based on the MinHash Technique

Posted on: 2018-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: X D Zhang
Full Text: PDF
GTID: 2428330566998547
Subject: Computer Science and Technology
Abstract/Summary:
As the volume of data generated across all walks of life grows, traditional association rule mining algorithms can no longer be applied directly to large-scale data, because they cannot discover knowledge effectively or in a timely manner. Time series data, a typical kind of stream data, is massive and generated rapidly, and both the data and the knowledge it contains change over time. Efficiency therefore becomes a major problem when mining association rules from time series data. Some researchers have proposed approximate association rule mining methods to address this problem. Although approximate association rules contain some errors relative to the original rules, they greatly reduce the execution time of the mining algorithm, and are therefore more practical than exact mining in many scenarios.

There is already some research on approximate association rule mining. Most existing methods combine a traditional association rule algorithm with data sampling. Sampling is a simple and effective way to obtain approximate rules, but it depends heavily on the data distribution. Different types of data sets differ from one another, a sample set differs from the original data set, and one sample set differs from another. As a result, sampling-based approximate association rule algorithms lack a unified system model. In recent years, some approximate estimation methods have been used to assist association rule mining. These algorithms use estimation to carry out the time-consuming steps of association rule mining.

Based on the MinHash technique and the traditional association rule algorithm Eclat, this paper proposes a new approximate association rule mining algorithm called HashEclat. In this algorithm, MinHash is used to estimate the size of the intersection between sets, which overcomes the inefficiency of Eclat in computing frequent itemsets. The quality of an approximate algorithm is determined mainly by how much speed it gains and whether its error is controllable. Theoretical analysis and experimental verification show that the approximation error of HashEclat can be kept within a reasonable range. Compared with other approximate association rule mining algorithms, HashEclat has a significant advantage in mining speed. To further verify the effectiveness of the approximate association rules, this paper compares the prediction accuracy of the original and approximate association rules on time series data. Experiments show that the approximate rules differ little from the original rules in prediction.

When association rules are used to predict time series directly, some time series features are lost, because association rule mining must first derive a transaction set from the original data. These features, such as sequence and shape, have an important impact on the final prediction results. To solve this problem, this paper proposes SimTSConf, an algorithm based on dynamic time warping. It combines the similarity between time series with the confidence of the association rules, giving a more reasonable evaluation in association rule mining. To validate SimTSConf, we designed an experiment using a real thermal power plant dataset.
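
The intersection-size estimate described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes Eclat's usual vertical layout, where each item has a non-empty set of transaction IDs (a tidset), and it uses simple linear hash functions chosen here for illustration. The function names `make_hash_funcs`, `signature`, and `estimate_intersection` are hypothetical. The sketch estimates the Jaccard similarity J from matching signature positions and then recovers the intersection size from the identity |A ∩ B| = J(|A| + |B|) / (1 + J).

```python
import random

# Assumed Mersenne prime modulus; transaction IDs must be smaller than this.
PRIME = 2_147_483_647


def make_hash_funcs(k, seed=0):
    """Create k linear hash functions h(t) = (a*t + b) mod PRIME (an assumption)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(tidset, hash_funcs):
    """MinHash signature: the minimum hash value of the tidset under each function.

    The tidset must be non-empty, otherwise min() raises ValueError.
    """
    return [min((a * t + b) % PRIME for t in tidset) for a, b in hash_funcs]


def estimate_intersection(sig_a, size_a, sig_b, size_b):
    """Estimate |A & B| from two signatures and the exact set sizes."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    jaccard = matches / len(sig_a)  # fraction of agreeing positions estimates J
    return jaccard * (size_a + size_b) / (1 + jaccard)


# Usage: tidsets of two items; the true intersection has 500 transaction IDs.
funcs = make_hash_funcs(k=128)
tids_x = set(range(0, 1000))
tids_y = set(range(500, 1500))
estimate = estimate_intersection(
    signature(tids_x, funcs), len(tids_x),
    signature(tids_y, funcs), len(tids_y),
)
print(f"estimated support: {estimate:.1f} (exact: {len(tids_x & tids_y)})")
```

Signatures are computed once per tidset and compared in time proportional to the signature length `k`, regardless of tidset size. A larger `k` generally lowers the estimation error at the cost of more hashing work.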
Keywords/Search Tags: association rules, time-series prediction, approximate algorithm, MinHash