Logs record the interactions between the components of a software system. They reflect the operating status of the system and play an important role in its management and maintenance. As systems continue to grow in scale and the number of logs explodes, manual log inspection alone can hardly meet current requirements, and automated log analysis has come to be regarded as one of the key technologies for system operation and maintenance. Log template extraction is a key part of automated log analysis: it processes unstructured and semi-structured logs and converts them into a structured form for further analysis. Researchers have studied log template extraction from different perspectives, and methods based on deep learning have become mainstream. However, existing work has the following limitations: 1) existing log template extraction methods require a large number of manually labeled logs for model training and cannot fully exploit the performance of the deep network when training data is scarce; 2) the model decides whether each word in a log is a template word and outputs a sequence of word labels, so additional postprocessing is needed to generate log templates.

To address these problems, this thesis focuses on making full use of pre-trained models for deep networks. It proposes a log template extraction method based on pre-trained models and uses end-to-end networks to simplify the extraction process, which improves the learning speed of the models and the accuracy of log template extraction and reduces the time spent on model training and prediction. The main work and contributions of this thesis are as follows:

To address the problem that existing log template extraction methods are limited by the size of the log training set and cannot fully exploit the performance of the deep network, a log template extraction method based on the LUKE pre-trained model is proposed. It uses the LUKE model as the text representation model for parameter initialization and optimizes the conversion of each word in a log into its vector representation. To fully capture the contextual representation between words, a template-aware self-attention mechanism is proposed, and a CRF model is used to identify template words based on the contextual representation. The experimental results show that the proposed LUKE-based method converges better than existing methods, achieves higher log template extraction accuracy on a smaller log training set, and reduces the training and prediction time of the model under the same conditions.

To address the problem that existing log template extraction methods rely on additional postprocessing to generate log templates, an end-to-end log template extraction method based on the BART pre-trained model is proposed. It uses the BART model as the text representation model both to obtain contextual representations and to generate log template predictions, which removes the template word discrimination stage of existing methods and simplifies log template extraction. To handle Out-Of-Vocabulary words when the BART model is applied to log template extraction, an optimization method based on the Pointer-Generator model is proposed to restrict the vocabulary that the BART model uses for log template generation. The experimental results show that the proposed BART-based method is comparable in effectiveness to the LUKE-based method, with a slight loss in learning speed, while further reducing prediction time.

Based on the two proposed log template extraction methods, this thesis designs a log template extraction system based on pre-trained models that provides log collection and storage, log template extraction, and log query and display. The system is applied to a laboratory IoT platform project, which verifies the feasibility of the proposed methods and the usability of the system and contributes to automated log analysis.
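
To make limitation 2) concrete, the sketch below shows the kind of postprocessing that a word-labeling model needs before it yields a template: each word carries a label saying whether it is a template word, and the non-template words have to be replaced afterwards. This is only an illustration under stated assumptions, not the method of this thesis. The helper name `labels_to_template`, the `<*>` wildcard, the merging of adjacent variable words, and the sample log line are all hypothetical.

```python
from typing import List

# Assumed placeholder for variable parts; the abstract does not specify one.
WILDCARD = "<*>"


def labels_to_template(tokens: List[str], is_template_word: List[bool]) -> str:
    """Turn per-word labels into a template string (hypothetical helper)."""
    if len(tokens) != len(is_template_word):
        raise ValueError("tokens and labels must have the same length")
    out: List[str] = []
    for token, keep in zip(tokens, is_template_word):
        if keep:
            # Template words are copied through unchanged.
            out.append(token)
        elif not out or out[-1] != WILDCARD:
            # Assumption: consecutive variable words collapse into one wildcard.
            out.append(WILDCARD)
    return " ".join(out)


# Illustrative log line and labels, not taken from any dataset.
tokens = "Connection from 10.0.0.5 port 22 closed".split()
labels = [True, True, False, True, False, True]
template = labels_to_template(tokens, labels)
print(template)
```

An end-to-end generator such as the BART-based method described above would emit the template string directly, so a separate step of this kind would not be needed.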