Font Size: a A A

Research On Methord And Application Of Frequent Pattern Mining With Compressed Position Value For Storage

Posted on:2017-01-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q WangFull Text:PDF
GTID:1108330503982467Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Computer technology has been maturely applied in many fields of the real life. It is able for data collection, storage and simple statistical analysis treatment. Data mining technology can further discovered association rules hidden in the data, and frequent pattern mining is an important process of association rules mining. Frequent pattern mining has a broad application area, and there are different classifications according to different mining objects. In this thesis, the existing frequent pattern mining algorithms are summarized in details. According to the occurrence positon of each item in each transcation, position value or bit is used for dataset compression. Frequent itemsets mining algorithms and frequent sequence mining algorithms which are contained in frequent pattern mining algorithms are studied based on position compression. Efficient algorithms under different mining requirements and applied algorithms for biological sequence mining and customer purchasing behavior mining are designed. Research contents and innovations are as follows:First of all, related definition and classification of frequent pattern mining are introduced, and the typical algorithms under different classification are given. By analyzing the research status, the existing algorithms for mining frequent patterns are summarized and compared. The merits and demerits of the algorithms are further learned for finding out the problems and new challenges. Based on the full understanding of the development process of frequent pattern mining algorithm, the typical applications for frequent patterns mining are enumerated. The development trend of frequent pattern mining is forecasted and analyzed according to the theoretical significance and application value.Secondly, two frequent itemsets mining algorithms under different requirements are proposed. Max Pat_HB algorithm can effectively reduce the size of the frequent itemsets by mining maximum frequent itemsets. It avoids too many candidate generation by generating candidates while testing the frequency. At the same time, the algorithm uses bit vectors and the stack principle together which doing the push and pop operations of a stack by just changing the bit values to achieve high efficiency. FP_Top K algorithm is for mining the top rank k frequent itemsets. It is applicable to expert system or decision support system which requires less frequent itemsets. Node set is drawn from a tree structure and each node contains the preorder tranversal and postorder tranversal orders of the tress, and candidate generation and test can be done by according to the node information. The high quality of the mining results is obtained on the premise of frequency guarantee.Thirdly, three frequent sequential pattern mining algorithms under different mining requirements are designed. CB- PMFS algorithm is a conventional mining algorithm, the location information is used and candidates can be generated bidirectionally by one comparition. It solves the bottleneck problems that the candidate generation always costs too much time during the running process of an algorithm. TDD_MFS algorithm is for mining maximal frequent sequential patterns, it can also effectively reduce the frequent sequential patterns result sets, the idea of delay decomposition is adopted, with a top-down strategy, it always decomposits the longest sequence to avoid the repeated mining of frequent subset. FIIP-BM algorithm further divides the frequent sequential pattern into intra and inter problems. The former refers to items occur in the same transaction, and the latter means items occurs in different transactions but within a fixed interval. When the interval is set to zero, the algorithm is applicable to intra issues, namely the conventional mining tasks; When the interval is not equal to zero, the algorithm is applicable to the intra and inter problems, and frequent intra and inter sequences are mined.Finally, combining with the characteristic of biological sequence and the analysis requirement of customer purchase behavior, two application algorithms are designed. Considering the continuous occurrence features of biological sequences, bological sequence mining algorithm FBSB introduces the element position information to establish quick sort list. By the requirement of the location information value must be adjacent, all candidate sets are ensured to be real ones, and all frequent sequences can be mined which satisfies the accuracy requirement of biological sequence mining. Customer purchasing behavior mining algorithm FP- ICA divides the customer purchasing behavior analysis for product oriented and customer oriented. The result for product oriented can be used for retailers manage their shelves and lead the customers buy more products in one transaction; Customer oriented mining results can be used to recommend the customers more products which may be needed and prompt the customers to buy more products which may not be considered in their later transaction.Experiments are conducted on real datasets and synthetic datasets, the efficiency, scalability and memory usage are analyzed. The efficiency and good scalability are proved on the premise of the high quality guarantee of mining results.
Keywords/Search Tags:Data mining, frequent pattern, position value compression, biological sequence, purchase behavior
PDF Full Text Request
Related items