
The Study Of Mining Algorithm Based On Weighted Multiple Minimum Supports

Posted on: 2015-06-02  Degree: Master  Type: Thesis
Country: China  Candidate: M X Zhan
GTID: 2298330431483885  Subject: Computer application technology
Abstract/Summary:
With the continuous development and application of data streams, data mining has become the main way to extract information from them. Maximal frequent pattern mining in particular has become a research focus, because it supports decision-making and business forecasting and therefore has great practical value. The minimum item support (MIS) assigns each data item in the stream its own support threshold, so that data items can be pruned before mining; maximal frequent pattern (MFP) mining then finds the maximal frequent patterns on the basis of these MIS values. However, existing MFP algorithms, although they achieve a high compression ratio, consider only support as the mining condition and cannot distinguish the weight of each frequent pattern, so their results do not reflect the actual properties of the data. Extending and improving MFP algorithms is therefore worthwhile. Based on an analysis of the advantages and disadvantages of existing MFP algorithms, this thesis carries out the following research:
1. Existing MFP algorithms generate a large number of intermediate sets during frequent pattern mining, which costs considerable time and space, and they do not take multiple minimum supports into account as mining conditions. To address these problems, this thesis constructs a data storage structure, the CPLMS-tree (Compact Preorder Linked Multiple Supports tree), and proposes MSCP-growth (Multiple Support-Conditional Pattern growth), a frequent pattern mining algorithm that satisfies multiple minimum supports.
In the CPLMS-tree, the attribute iflag marks whether a subsequence is frequent and the attribute mps records the minimum MIS value; together, these two attribute values serve as the pruning condition. By setting different support levels for the stored frequent data items during mining, the algorithm greatly reduces the number of candidate frequent pattern sets it generates and quickly obtains the valuable frequent patterns. Finally, experiments comparing MSCP-growth with the traditional PLWAP-Mine algorithm show that MSCP-growth outperforms PLWAP-Mine in execution time, in the number of candidate sets and frequent patterns generated, and in space consumption.
2. In the data stream environment, existing weighted maximal frequent pattern (WMFP) algorithms require multiple database scans, do not fully exploit the combination of weighting factors and minimum support, and generate a large number of worthless candidate maximal frequent patterns. To address these problems, this thesis constructs a new data storage structure, the MWS-tree (Maximal Weight Streams tree), which uses the maximal weight (MW) as the pruning condition and thereby greatly reduces the frequent pattern search. It also builds an array structure containing support index information, the WMFP-array (Weighted Maximal Frequent Patterns array); this index information reduces the number of database scans, and combining single paths with database support reduces traversals of the tree.
3. Based on the MWS-tree, this thesis proposes the maximal weighted data stream algorithm MWS (Maximal Weight Streams), which mines maximal frequent patterns using the weight information (WI) of data items and the minimum support threshold δ, and checks the subsets of frequent patterns after the maximal frequent patterns have been mined. The final result is stored in the WMFP-tree (Weighted Maximal Frequent Patterns tree), a data structure for maximal frequent patterns, which minimizes unnecessary mining operations. Finally, simulation experiments comparing MWS with the traditional IWFP algorithm and its improved variant IWFP* show that MWS outperforms both in execution time and space consumption.
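The multiple-minimum-support condition behind MSCP-growth can be illustrated with a small brute-force sketch. It assumes plain itemsets rather than the sequential patterns the CPLMS-tree handles, expresses thresholds as absolute counts to avoid floating-point comparisons, and follows the common convention that a pattern's threshold is the smallest MIS among its items (the value the abstract says the mps attribute records). It is not MSCP-growth itself; the names `support_count` and `frequent_under_mis` are hypothetical.

```python
from itertools import combinations


def support_count(pattern, transactions):
    """Number of transactions that contain every item of pattern."""
    needed = set(pattern)
    return sum(1 for t in transactions if needed <= t)


def frequent_under_mis(transactions, mis):
    """Brute-force frequent itemsets under multiple minimum supports.

    mis maps each item to its own minimum support count (assumed model).
    A pattern is frequent when its support count reaches the smallest
    MIS value among its items. Exhaustive enumeration: toy data only.
    """
    items = sorted(mis)
    result = {}
    for k in range(1, len(items) + 1):
        for pattern in combinations(items, k):
            threshold = min(mis[i] for i in pattern)
            sup = support_count(pattern, transactions)
            if sup >= threshold:
                result[pattern] = sup
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b", "d"}]
mis = {"a": 3, "b": 3, "c": 2, "d": 3}
print(frequent_under_mis(transactions, mis))
# {('a',): 4, ('b',): 4, ('c',): 2, ('a', 'b'): 3, ('a', 'c'): 2}
```

Items c and d both occur in two transactions, yet c survives (MIS 2) while d is pruned (MIS 3). A single global threshold of 2 would keep d, and one of 3 would drop c, which is the kind of per-item distinction multiple minimum supports are meant to provide.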
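The maximal-weight (MW) pruning and the subset check on mined patterns can be sketched in the same spirit. This sketch assumes a WFIM-style definition that is common in weighted frequent pattern mining (pattern weight = mean item weight; weighted support = support count times weight), takes the largest item weight as MW, and uses exact fractions to avoid rounding. The function and parameter names are hypothetical, and this is neither the MWS-tree nor the MWS algorithm; it only shows why MW is a safe pruning bound and what "maximal" means after the subset check.

```python
from fractions import Fraction


def weighted_maximal_patterns(transactions, weights, delta):
    """Weighted frequent itemsets with MW pruning, then a maximality filter.

    Assumed definitions (not necessarily the thesis's exact ones):
      weight(P)        = mean of the item weights in P
      weighted support = support_count(P) * weight(P)
      P qualifies if weighted support >= delta
    Support never grows when P is extended and weight(P) <= max_w, so
    support_count(P) * max_w < delta rules out P and all its extensions.
    """
    items = sorted(weights)
    max_w = max(weights.values())  # plays the role of MW
    frequent = {}

    def count(pattern):
        needed = set(pattern)
        return sum(1 for t in transactions if needed <= t)

    def extend(prefix, start):
        for i in range(start, len(items)):
            pattern = prefix + (items[i],)
            sup = count(pattern)
            if sup * max_w < delta:
                continue  # MW pruning: skip this pattern and its extensions
            w = Fraction(sum(weights[x] for x in pattern), len(pattern))
            if sup * w >= delta:
                frequent[pattern] = sup * w
            # Weight is not anti-monotone, so extend even if P itself failed.
            extend(pattern, i + 1)

    extend((), 0)
    # Subset check: keep only patterns with no qualifying proper superset.
    return {
        p: ws for p, ws in frequent.items()
        if not any(set(p) < set(q) for q in frequent)
    }


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b", "d"}]
weights = {"a": Fraction(1, 2), "b": Fraction(1), "c": Fraction(1), "d": Fraction(1, 4)}
print(weighted_maximal_patterns(transactions, weights, delta=2))
# {('a', 'b'): Fraction(9, 4), ('c',): Fraction(2, 1)}
```

Here c (support 2, weight 1) qualifies while d (also support 2, weight 1/4) does not, and a and b are removed by the subset check because {a, b} itself qualifies. The `count` helper rescans the transaction list for every candidate; that repeated scanning is the cost the abstract says the WMFP-array's support index information is designed to reduce.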
Keywords: data mining, minimum support, maximal frequent patterns, weighting factor