
Research And Application Of Data Stream Clustering Algorithm In The Analysis Of Web Access Log

Posted on: 2022-09-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Zhang
GTID: 2518306494969099 | Subject: Computer technology
Abstract/Summary:
In recent years, Internet technology has made striking advances, and Internet use has become increasingly widespread. Using the Internet inevitably generates large volumes of Web data. How to mine valuable information from this data, and in particular how to mine user access behavior from log data, is therefore a topic worthy of research and attention. Research on this problem helps website administrators discover security risks in time, fix website vulnerabilities, and continuously raise the network security awareness of website operation and maintenance staff. It also helps administrators understand which website content users care about and update that content in time, so that the website better serves its purpose. Addressing the problem of Web access log analysis, this paper studies the concepts related to data streams in depth and develops optimizations that further improve a data stream clustering algorithm. Then, after a thorough study of the basic principles of the Storm framework, it designs a distributed, parallel version of the algorithm based on Storm to raise the processing efficiency of the improved data stream clustering algorithm. Finally, it applies the improved distributed data stream clustering algorithm to the analysis of actually collected Web access logs. The main research contents of this paper are as follows:
(1) A data stream clustering algorithm based on density and grid is taken as the base algorithm. Its shortcomings, including threshold parameter setting and cluster boundary determination, are analyzed in full, and an improved algorithm is designed on top of it to improve clustering performance.
(2) Because real-time, massive data such as Web access logs cannot be processed efficiently in a single-machine environment, a distributed stream processing platform based on Storm is built, and the data stream clustering algorithm is redesigned for parallel, distributed execution and implemented on Storm.
(3) For Web access log analysis, the Storm-based distributed algorithm is applied and a corresponding analysis model is designed. Experiments on actually collected access logs from a campus website indicate that the algorithm achieves a much better clustering effect and that the parallelized computation better fits the characteristics of Web access log data. The algorithm is distributed, real-time, and accurate. The results offer an important reference for website management and can provide a reference scheme for effectively solving related problems.
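The abstract names the base algorithm only as a density-and-grid data stream clustering algorithm and does not give the improved threshold or boundary rules. For orientation, the following is a minimal Python sketch of such a base algorithm, modeled loosely on D-Stream (a well-known density-grid stream clustering method); it is an assumption for illustration, not the author's algorithm. It assumes each Web access log record has already been mapped to a numeric point in [0, 1)^d (for example, a normalized request hour and a normalized per-client request rate, a hypothetical feature choice). The class name `DensityGridClusterer` and the parameters `decay`, `c_m`, `c_l`, and `cells_per_dim` are hypothetical. Each cell keeps an exponentially decayed record count, cells above the dense threshold are joined into clusters when they share a face, and cells below the sparse threshold can be pruned. Transitional cells and the cluster-boundary handling that the thesis sets out to improve are not modeled.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]  # integer grid coordinates, one index per dimension


class DensityGridClusterer:
    """Hypothetical D-Stream-style density-grid stream clusterer (sketch only)."""

    def __init__(self, cells_per_dim: int, dims: int, decay: float = 0.998,
                 c_m: float = 3.0, c_l: float = 0.8) -> None:
        self.cells_per_dim = cells_per_dim
        self.decay = decay  # lambda in (0, 1): each time step scales old weight by lambda
        n_cells = cells_per_dim ** dims
        # D-Stream-style thresholds: the total decayed weight tends to 1 / (1 - lambda),
        # so "dense" and "sparse" are defined relative to the average weight per cell.
        self.dense_threshold = c_m / (n_cells * (1.0 - decay))
        self.sparse_threshold = c_l / (n_cells * (1.0 - decay))
        self.density: Dict[Cell, float] = {}    # decayed weight as of last_update
        self.last_update: Dict[Cell, int] = {}  # time step of the cell's last refresh
        self.t = 0                              # advances by one per arriving record

    def _cell(self, point: Sequence[float]) -> Cell:
        # Map a point with coordinates in [0, 1) to its grid cell, clamping at the edges.
        top = self.cells_per_dim - 1
        return tuple(min(max(int(x * self.cells_per_dim), 0), top) for x in point)

    def _current_density(self, cell: Cell) -> float:
        # Apply the decay owed since the cell was last touched (lazy decay).
        return self.density[cell] * self.decay ** (self.t - self.last_update[cell])

    def insert(self, point: Sequence[float]) -> None:
        self.t += 1
        cell = self._cell(point)
        if cell in self.density:
            self.density[cell] = self._current_density(cell) + 1.0
        else:
            self.density[cell] = 1.0
        self.last_update[cell] = self.t

    def prune(self) -> int:
        # Drop sparse cells; simplified, since D-Stream uses a time-dependent sporadic test.
        sparse = [c for c in self.density
                  if self._current_density(c) < self.sparse_threshold]
        for c in sparse:
            del self.density[c]
            del self.last_update[c]
        return len(sparse)

    def clusters(self) -> List[List[Cell]]:
        dense = {c for c in self.density
                 if self._current_density(c) >= self.dense_threshold}
        seen: set = set()
        result: List[List[Cell]] = []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            component, queue = [], deque([start])
            # Breadth-first search over face-adjacent dense cells.
            while queue:
                cell = queue.popleft()
                component.append(cell)
                for d in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(sorted(component))
        return sorted(result)
```

Decay is applied lazily, only when a cell is read or updated, so each record costs one dictionary update regardless of grid size, and `prune()` and `clusters()` would be called periodically to report the current state. Because each cell's weight depends only on the records that map to it, per-cell updates could in principle be partitioned by cell key across parallel workers (for instance with Storm's fields grouping), while the final cluster-forming step still needs a global view of the dense cells. This is one plausible mapping onto Storm; the abstract does not describe the thesis's actual topology.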
Keywords/Search Tags: Web access log, Data stream, Storm computing framework, Real-time clustering, Density and grid