Research And Implementation Of Tomcat Access Log Analysis System Based On Data Mining

Posted on: 2012-03-28  Degree: Master  Type: Thesis
Country: China  Candidate: R Chen  Full Text: PDF
GTID: 2248330395487899  Subject: Computer application technology
Abstract/Summary:
China now ranks first in the world in the number of Internet users, and the number of websites is growing explosively. At the same time, problems of site topology and network security are becoming increasingly serious. Web log analysis can help network administrators make decisions that address these problems. Tomcat is a popular web server, and analyzing its access logs has become an active research topic.

As a decision support method, Tomcat access log analysis still faces many challenges. Building on previous studies, this thesis investigates both real-time website monitoring and user access behavior analysis. It develops a distributed Tomcat access log analysis system and applies data mining techniques based on classification rules to Tomcat access log analysis. It also improves the classical serial sequential pattern mining algorithm Apriori-All and proposes a parallel algorithm for mining user access behavior.

A prototype of the Tomcat access log analysis system based on classification rules has been developed. It uses a distributed C/S (client/server) architecture and can monitor Tomcat servers in different locations in a timely manner. The system not only improves efficiency but also reduces management costs. For intrusion detection and early warning, the system first generates classification rules by data mining; a worker thread then continuously scans the logs and raises alarms so that problems can be addressed early.

For user access behavior analysis, the algorithm applied is Forwards Projection Apriori-All based on Grid (GFPA), which improves on Apriori-All with a forward projection strategy. The other algorithm proposed is Data Parallel Sequential Pattern Mining based on Multi-processor Scheduling (MDPSP), which builds on the Data Parallel Sequential Patterns Mining algorithm (DPSP). In experiments on the three algorithms above, MDPSP achieves higher efficiency and speedup and balances the load.
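
As a rough illustration of the kind of sequential pattern counting described above, the following Python sketch parses access log lines and counts ordered page pairs that recur across clients. It is not the thesis's GFPA or MDPSP algorithm. It assumes logs in the "common" access log pattern (`%h %l %u %t "%r" %s %b`), treats each client host as one session, and uses made-up sample lines. The names `sessions_by_host`, `frequent_pairs`, and `SAMPLE_LOG` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed log layout: the "common" access log pattern, %h %l %u %t "%r" %s %b
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

# Made-up sample lines for illustration only
SAMPLE_LOG = [
    '10.0.0.1 - - [28/Mar/2012:10:15:32 +0800] "GET /index.jsp HTTP/1.1" 200 1234',
    '10.0.0.2 - - [28/Mar/2012:10:15:40 +0800] "GET /index.jsp HTTP/1.1" 200 1234',
    '10.0.0.1 - - [28/Mar/2012:10:16:01 +0800] "GET /login.jsp HTTP/1.1" 200 512',
    '10.0.0.3 - - [28/Mar/2012:10:16:05 +0800] "GET /login.jsp HTTP/1.1" 200 512',
    '10.0.0.2 - - [28/Mar/2012:10:16:30 +0800] "GET /cart.jsp HTTP/1.1" 200 2048',
    '10.0.0.1 - - [28/Mar/2012:10:17:12 +0800] "GET /cart.jsp HTTP/1.1" 200 2048',
    '10.0.0.3 - - [28/Mar/2012:10:17:20 +0800] "GET /index.jsp HTTP/1.1" 200 1234',
]


def sessions_by_host(lines):
    """Group requested paths by client host, in log order.

    Using the host as the session key is a simplifying assumption.
    """
    sessions = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip lines that do not fit the assumed pattern
            sessions[match.group("host")].append(match.group("path"))
    return dict(sessions)


def frequent_pairs(sessions, min_support):
    """Return ordered page pairs (a, b), with a requested before b,
    that occur in at least min_support sessions."""
    support = defaultdict(int)
    for pages in sessions.values():
        seen = set()  # count each pair at most once per session
        for i, first in enumerate(pages):
            for later in pages[i + 1:]:
                if first != later:
                    seen.add((first, later))
        for pair in seen:
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    print(frequent_pairs(sessions_by_host(SAMPLE_LOG), min_support=2))
```

This corresponds only to the support-counting step for length-2 sequences. An Apriori-style miner would extend frequent sequences level by level, and a data-parallel variant would partition the sessions across workers before merging the counts.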
Keywords/Search Tags: Tomcat, Visiting log analysis, Data Mining, User behavior, MDPSP