Research On Log Analysis Technology Based On Time Series Data Mining

Posted on: 2019-05-27   Degree: Master   Type: Thesis
Country: China   Candidate: C Wang   Full Text: PDF
GTID: 2348330542987615   Subject: Computer Science and Technology
Abstract/Summary:
Log data produced by operating systems, applications, and equipment contains abundant information. Mining and analyzing log data can help managers identify and avoid potential risks and find the root cause of security incidents, and it can also uncover further information hidden in the logs. As log data attracts increasing attention, how to mine and analyze it effectively has become a research hotspot. Log data has the nature of a time series, so it can be mined and analyzed more effectively with time series data mining techniques. Sequential pattern mining, an important research topic in time series data mining, can find frequent sequential patterns in log data; these patterns tend to reflect the correlation between log events, which is of significant value. In this thesis, two widely used sequential pattern mining algorithms, GSP and PrefixSpan, are improved. The main research work is as follows:

(1) To address the shortcomings of the GSP algorithm, namely the need to traverse the entire sequence set, the repeated traversal of frequent k-sequences, and the slow traversal of frequent k-sequences, an improved GSP algorithm based on sequence set optimization and an index prefix tree is proposed. The algorithm builds on the sequence set optimization method and the index prefix tree data structure designed in this thesis, which effectively reduce its running time. Comparative experiments show that the running time of the improved GSP algorithm is significantly lower than that of the unimproved GSP algorithm, but that it uses much more space.

(2) To address the large memory footprint of the PrefixSpan algorithm, an improved PrefixSpan algorithm based on a suffix index is proposed. Comparative experiments show that its memory usage at runtime is significantly lower than that of the unimproved PrefixSpan algorithm, while its running time is close to that of the unimproved algorithm.

(3) To verify whether non-frequent items contained in the suffixes recorded by the PrefixSpan algorithm affect its time performance, a PrefixSpan algorithm based on projection database optimization is proposed. Comparative experiments show that suffixes containing non-frequent items do not reduce the time performance of the PrefixSpan algorithm; deleting the non-frequent items from the suffixes instead takes considerable time. This study also shows that using the suffix index does not reduce the time performance of the suffix-index-based PrefixSpan algorithm.

(4) The GSP algorithm, the PrefixSpan algorithm, and their improved versions are compared, the characteristics of the two kinds of algorithms are discussed in depth, and the applicable scope of each algorithm is explained.
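
The abstract does not detail how the suffix index in (2) is built. As a rough illustration only, the sketch below shows one common way to avoid copying suffixes in PrefixSpan (often called pseudo-projection): each projected database stores (sequence index, offset) pairs that point into the original log sequences. The function name `prefixspan`, the `min_support` parameter, and the assumption that every sequence element is a single log event are choices made for this example and may differ from the thesis's actual design.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Mine frequent sequential patterns from sequences of single events.

    Assumption for this sketch: each projected database is a list of
    (sequence_index, start_offset) pairs pointing into the original
    data, so suffixes are referenced by position and never copied.
    """
    results = []

    def mine(prefix, projected):
        # For every event, keep only its first occurrence at or after the
        # offset in each projected sequence, so that support counts
        # sequences rather than individual hits.
        occurrences = defaultdict(list)
        for seq_idx, start in projected:
            seq = sequences[seq_idx]
            seen = set()
            for pos in range(start, len(seq)):
                event = seq[pos]
                if event not in seen:
                    seen.add(event)
                    # The new suffix starts right after the matched event.
                    occurrences[event].append((seq_idx, pos + 1))

        for event in sorted(occurrences):
            pointers = occurrences[event]
            if len(pointers) >= min_support:
                pattern = prefix + [event]
                results.append((pattern, len(pointers)))
                # Recurse on the pointer list instead of on copied suffixes.
                mine(pattern, pointers)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results
```

Calling `prefixspan(log_sequences, min_support)` on a list of event-name lists returns `(pattern, support)` pairs. Because only integer offsets are stored per recursion level, memory grows with the number of matching sequences rather than with total suffix length, which is the kind of saving the abstract attributes to the suffix index.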
Keywords/Search Tags: Log analysis, Time series, Data mining, GSP algorithm, PrefixSpan algorithm