Research On Sequential Pattern Mining In Web Log

Posted on: 2017-06-17  Degree: Master  Type: Thesis
Country: China  Candidate: Q Q Jiang  Full Text: PDF
GTID: 2348330503468038  Subject: Computer technology
Abstract/Summary:
Web log mining is an important application of data mining and a relatively new technique for processing information in the Internet age. Within Web log mining, sequential pattern mining is an important research direction: it finds frequent, ordered subsequences in a sequence database, which is of practical value for optimizing a Web site's structure and providing better service to users. This thesis studies the mining of users' access sequential patterns from Web logs. By analyzing users' access records in the server log file, sequential pattern mining finds frequently accessed pages or resource patterns, which can help improve the organization of Web pages, the site architecture, and overall site performance. Based on an in-depth study of the relevant theory and prior work on Web mining and sequential pattern mining, we summarize the advantages and disadvantages of current sequential pattern mining algorithms and analyze the WAP-mine algorithm, which is built on the WAP-tree structure. To address the shortcomings of the traditional WAP-mine algorithm, this thesis presents a new algorithm, MNWAP-mine, which is more efficient than the traditional algorithm at mining sequential patterns from Web logs. The main research contents are as follows:

(1) We first improve the WAP-tree structure by adding an auxiliary storage structure based on a hash table to speed up sequence lookup. Second, the traditional WAP-mine algorithm needs two complete scans of the database to construct the WAP-tree; to address this drawback, we use the result of the first scan to obtain sequences that contain only frequent items, so that the second scan processes only frequent items. Experiments on the synthetic dataset T10I4D100K and the real dataset retail show that these two improvements make WAP-tree construction markedly more efficient than in the traditional algorithm.

(2) Based on the improved WAP-tree structure, the algorithm uses a method that merges frequent child nodes. Unlike the traditional WAP-mine algorithm, it does not need to recursively generate large sub-trees when mining access sequential patterns, which greatly reduces mining time and improves mining efficiency. The experimental results show that the improved algorithm is more efficient than the existing algorithm and better suited to mining sequential patterns in Web logs.
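To make the construction step in (1) concrete, the following is a minimal Python sketch of a WAP-tree built from access sequences after non-frequent items have been filtered out, with each node's children kept in a hash table (a Python dict). It is not the thesis's MNWAP-mine implementation: the names `Node`, `frequent_items`, and `build_wap_tree`, the per-node dict, and the `header` index are assumptions made for illustration, and the abstract does not specify the exact layout of the auxiliary hash structure.

```python
from collections import Counter


class Node:
    """One WAP-tree node (hypothetical layout, for illustration only)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # hash table: item -> child Node


def frequent_items(sequences, min_support):
    """First scan: count in how many sequences each item occurs."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))  # each item counted once per sequence
    return {item for item, c in counts.items() if c >= min_support}


def build_wap_tree(sequences, min_support):
    """Second pass: insert only the frequent items of each sequence."""
    frequent = frequent_items(sequences, min_support)
    root = Node(None)
    header = {}  # assumed auxiliary index: item -> nodes labelled with it
    for seq in sequences:
        node = root
        for item in seq:
            if item not in frequent:
                continue  # non-frequent items never enter the tree
            child = node.children.get(item)  # hash lookup, no sibling walk
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


# Illustrative access sequences (one character per page); an item must
# occur in at least 3 of the 4 sequences to count as frequent.
logs = ["abdac", "eaebcac", "babfaec", "afbacfc"]
root, header = build_wap_tree(logs, min_support=3)
```

In this sketch the dict lookup replaces a linear scan over sibling nodes when a sequence is inserted, which is one plausible reading of how a hash-based auxiliary structure saves search time. The mining step in (2), including the merging of frequent child nodes, is not shown.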
Keywords/Search Tags:web log mining, sequential pattern, WAP-tree, WAP-mine algorithm, MNWAP-mine algorithm