
Research On Mining Algorithm Of Web Log Frequent Sequential Patterns

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zheng
Full Text: PDF
GTID: 2348330509454000
Subject: Computer software and theory
Abstract/Summary:
Mining frequent sequential patterns from Web logs is an important area of Web log mining; it discovers frequent sequences of interaction between users and websites. These patterns make it easier to analyze users' access behavior and to build models for adjusting websites, which helps in building intelligent websites, improving user experience and user quality, and supporting e-commerce activities.
Many algorithms exist for mining frequent sequential patterns from Web logs, including GSP, Apriori, PSP, G Sequence, graph traversal, FreeSpan, PrefixSpan, Disc-all, MEMISP, MFS, LAPIN-SPAM, WAP-tree, PLWAP-tree and, more recently, NGCWAP-tree. PLWAP-tree has the following disadvantages: 1) it relies on a position code algorithm to judge the relationship between any two nodes; 2) judging the relationships among nodes requires moving pointers many times, which takes too much time; 3) it needs more memory when the PLWAP-tree is very deep or very wide. This thesis addresses these disadvantages. The main content and results are as follows:
The PREWAP-tree algorithm is proposed on the basis of the traditional PLWAP-tree. It uses a tree structure to store Web access sequences and obtains all frequent sequences that share the same prefix sequence.
The PREWAP-tree algorithm builds the tree by sharing the nodes of each common prefix path. While constructing the header table, it marks each node with its preorder serial number and with a pointer to the descendant that has the maximum preorder serial number. These marks are used to judge the relationship between nodes, which saves one traversal of the tree and avoids judging node relationships by position code. The built PREWAP-tree is then mined by traversing the header table.
The BFWAP-tree algorithm is proposed on the basis of PREWAP-tree. It first builds the BFWAP-tree, in which the weight of every node on a path increases by 1 each time that path appears again. It then builds the header table by preorder traversal and records each node's branch at the same time. During mining, the branch is used to judge whether the current node is a first node. An event is frequent if the sum over its set of first nodes is not less than the support threshold. The improved algorithm avoids using position codes to mark each node's position, and memory consumption drops greatly when the amount of data is large. BFWAP-tree improves considerably on PREWAP-tree in both time and storage consumption.
In conclusion, both improved algorithms are correct, reasonable and efficient, and both improve considerably on PLWAP-tree, WAP-tree and NGCWAP-tree in time and space.
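The preorder numbering described for PREWAP-tree can be illustrated with a small sketch. It is not the thesis's implementation: the names `Node`, `build_tree`, `number_preorder` and `is_ancestor` are hypothetical, the maximum descendant number is stored as an integer instead of being reached through a pointer, and the mining step is omitted. Under those assumptions, node `a` is an ancestor of node `b` exactly when `a.pre < b.pre <= a.max_desc`, so no position code is needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Node:
    event: str
    count: int = 0                       # how many sequences pass through this node
    children: Dict[str, "Node"] = field(default_factory=dict)
    pre: int = -1                        # preorder serial number
    max_desc: int = -1                   # largest preorder number in this subtree


def build_tree(sequences: Sequence[Sequence[str]]) -> Node:
    """Insert each access sequence; sequences with a common prefix share nodes."""
    root = Node(event="<root>")
    for seq in sequences:
        node = root
        for event in seq:
            child = node.children.get(event)
            if child is None:
                child = Node(event=event)
                node.children[event] = child
            child.count += 1
            node = child
    return root


def number_preorder(root: Node) -> Dict[str, List[Node]]:
    """Assign preorder numbers and subtree maxima in one pass.

    Returns a header table (assumed layout): event -> nodes in preorder.
    """
    header: Dict[str, List[Node]] = {}
    counter = 0

    def visit(node: Node) -> int:
        nonlocal counter
        node.pre = counter
        counter += 1
        if node.event != "<root>":
            header.setdefault(node.event, []).append(node)
        last = node.pre
        for child in node.children.values():
            last = visit(child)          # the last child visited holds the maximum
        node.max_desc = last
        return last

    visit(root)
    return header


def is_ancestor(a: Node, b: Node) -> bool:
    """True when b lies strictly inside a's subtree."""
    return a.pre < b.pre <= a.max_desc


# Hypothetical access sequences, used only for illustration.
root = build_tree([["a", "b", "c"], ["a", "b", "d"], ["a", "c"], ["b", "c"]])
header = number_preorder(root)
```

With these four sequences, the node for `a` gets preorder number 1 and maximum descendant number 5, so the `c` node numbered 5 is its descendant, while the `c` node numbered 7 (under the top-level `b`) is not. Each relationship check is two integer comparisons.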
Keywords/Search Tags:data mining, Web log mining, frequent sequence patterns, WAP-tree, PLWAP-tree