
Research On Sequence Pattern Mining And Rule Matching Prediction Based On Improved Apriori Algorithm

Posted on: 2020-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Shen
GTID: 2428330575456391
Subject: Information and Communication Engineering
Abstract/Summary:
Association rule mining is an important research topic in data mining. Using existing cellular network traffic data, this thesis predicts the traffic state of base stations by mining frequent patterns from time sequences. Starting from the classical Apriori algorithm for frequent pattern mining and the basic idea of constructing strong association rules, it analyzes both parts in detail and proposes an optimized frequent pattern mining algorithm that improves the selection of the support threshold and the strategy for selecting matching patterns. The algorithm is used for frequent sequence mining and for predicting base station traffic state. On this basis, by clustering the base stations and analyzing the periodicity of the cellular traffic data, the thesis further improves the mining of periodic frequent patterns from the clustered data matrix, which improves both mining efficiency and prediction accuracy. The main contents and innovations of the thesis are as follows.

Mining frequent patterns of traffic state and generating association rules are prerequisites for accurately predicting the cellular network traffic of base stations. The thesis preprocesses the base station traffic data into discrete states, uses a sliding window model and varies the window size to enumerate all original patterns of different scales, and sets a support threshold to select the frequent patterns. Experiments on training sets of different time granularity reveal the relationship between the support setting and the size of the training set, which provides a basis for setting the support threshold rationally when screening frequent patterns. An improved FP-tree structure is then used to store the frequent sequences. Each node stores bidirectional pointers to its two adjacent layers, which provides path support for the next
phase, in which matching patterns trace back to the parent node to obtain the predicted state. Experiments show that model performance is best when the support threshold for base station sequence mining corresponds to a screening ratio between 20% and 25%.

Obtaining all matching sequences and selecting the most reliable patterns among them to estimate the state is the key to prediction. During pattern matching, the state at the current moment is matched against the penultimate layer of each of the frequent sequence trees of different sizes built in the previous phase. Each time the time state is extended by one moment, the match moves one layer up the pattern tree. This yields the collection of all matching paths. The thesis proposes a reliability calculation that assigns different weights to confidence, prediction rate, and pattern size, and selects the matching pattern with the maximum weighted sum for prediction. Experiments show that models trained on sets of similar granularity reach their best prediction performance with similar weighting parameters. Model performance is best when the rule weight ratio used in model evaluation is 0.3:0.4:0.4. A series of experiments with different backtracking path lengths shows that the accuracy of the algorithm stabilizes once backtracking reaches a certain length. A reasonable sliding window size can therefore reduce running cost without affecting the accuracy of the algorithm, and it gives a clear termination condition for sliding: in this study, the sliding window can be terminated at 9 time slots. Finally, the optimal parameters obtained above are applied to the prediction model, which is compared with an improved Markov algorithm. The comparison shows that although the improvement varies across data sets of different granularity, the matching rate is at least 25% higher than that of the latter, the root mean square error
rate is reduced by at least 38%, and the percentage error rate is reduced by at least 65%.

The earlier correlation analysis of the original base station data shows that the correlation between base stations varies widely. By analyzing the data characteristics and sequence characteristics of the base stations, the similarity of their sequences is measured, and the cellular traffic data of base stations with similar characteristics and strong correlation are grouped into one class by agglomerative hierarchical clustering and used as a training set. A pattern whose occurrences are separated by more than a certain interval is defined as a non-periodic pattern. The periodicity analysis of the traffic data shows that the occurrence interval of the vast majority of frequent patterns does not exceed two days, so a periodicity test can be run first to eliminate the non-periodic patterns. To save storage space for the sequence tree and simplify the tree search during pattern matching, the structure of the sequence tree described above is further improved: different levels represent frequent pattern sets of different scales, all frequent patterns of the data matrix are stored in one smaller sequence tree, and the maximum occurrence interval is taken into account when measuring the reliability of a periodic pattern. To explore how mining temporal frequent patterns on clustered base stations affects the validity and accuracy of the algorithm, the thesis applies these improvements to the mining, storage, and pattern selection stages of the prediction model and re-runs the experiments. The prediction accuracy of the base station traffic state is at least 2.2% higher than without clustering, the root mean square error rate is 3.8% lower, and the percentage error rate is reduced by at least 11.2%. Eliminating the non-periodic patterns also saves time in the frequent pattern mining phase: experiments show that the algorithm's running time is reduced by at least about 15.1%, so the approach improves both reliability and effectiveness.
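
The mining and matching steps summarized above can be illustrated with a minimal Python sketch. It is not the thesis implementation: it keeps pattern counts in a plain dictionary instead of the improved FP-tree, reads the "screening ratio" as the fraction of distinct patterns kept at each window size, and scores a match with only two of the three factors (confidence and pattern size) using hypothetical equal weights. The names `mine_patterns`, `predict_next`, `w_conf`, and `w_size` are invented for this illustration.

```python
from collections import Counter


def mine_patterns(states, max_window, screening_ratio):
    """Slide windows of size 1..max_window over a discrete state sequence.

    Returns (counts, frequent): the occurrence count of every observed
    pattern, and the set of patterns kept as frequent. Assumption: the
    screening ratio is the fraction of distinct patterns retained at each
    window size (at least one pattern is always kept).
    """
    counts = Counter()
    frequent = set()
    for size in range(1, max_window + 1):
        level = Counter(
            tuple(states[i:i + size]) for i in range(len(states) - size + 1)
        )
        counts.update(level)
        keep = max(1, int(len(level) * screening_ratio))
        frequent.update(pattern for pattern, _ in level.most_common(keep))
    return counts, frequent


def predict_next(counts, frequent, recent, max_window, w_conf=0.5, w_size=0.5):
    """Predict the next state from the most recent observed states.

    For each pattern size, the tail of `recent` is matched against the
    prefix of every frequent pattern of that size. Each match is scored by
    a weighted sum of confidence and relative pattern size (hypothetical
    weights), and the last state of the best-scoring pattern is returned.
    Returns None when no frequent pattern matches.
    """
    best_state, best_score = None, -1.0
    for size in range(2, max_window + 1):
        if len(recent) < size - 1:
            break  # not enough history to match a longer pattern
        prefix = tuple(recent[-(size - 1):])
        for pattern in sorted(frequent):
            if len(pattern) != size or pattern[:-1] != prefix:
                continue
            # Confidence of the rule prefix -> last state.
            confidence = counts[pattern] / counts[prefix]
            score = w_conf * confidence + w_size * size / max_window
            if score > best_score:
                best_state, best_score = pattern[-1], score
    return best_state


if __name__ == "__main__":
    # Toy traffic states: L = low, H = high, repeating with period 3.
    history = list("LLHLLHLLHLLH")
    counts, frequent = mine_patterns(history, max_window=3, screening_ratio=0.25)
    print(predict_next(counts, frequent, ["H", "L", "L"], max_window=3))  # prints H
```

In this toy run the length-3 pattern (L, L, H) wins because it has both higher confidence and larger size than the competing length-2 match, which mirrors the idea of preferring longer, more reliable patterns.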
Keywords/Search Tags:Traffic Forecasting, Sequence Mining and Storage, Sequence-matching Strategy, Periodic Frequent Pattern