| In today’s large-scale computer systems, manual error localization has become almost impractical for handling abnormal situations because of the enormous scale of the systems and the complexity of their operational processes. A more efficient and accurate method for system status detection is therefore needed. Log data, which records system operation processes, has become a mainstream basis for system status detection. The challenge, however, lies in effectively mitigating the interference caused by unstable and increasingly complex logs. The instability of log data is primarily evident in two aspects: (1) Normal states in real environments outnumber abnormal states, so a small amount of abnormal data is easily obscured within a large amount of normal data, leading to class imbalance. (2) Because of system maintenance and upgrades, log template formats change and iterate, so a running detection model is confused by unknown log templates. To address these issues, this dissertation proposes a contrastive learning-based state detection system and a state detection system based on dynamic log word embedding. (1) To tackle the imbalance of data categories in log environments, this dissertation introduces a novel contrastive learning-based system status detection framework, Contra Log. Designed to address the limitations of existing methods in dealing with imbalanced data categories, Contra Log uses an innovative anomaly detection model and a new hybrid loss function that combines cross-entropy loss and contrastive loss. To create positive sample pairs for contrastive learning, the neural network’s inherent dropout is leveraged: the same sample is fed into the model twice, and the two dropout-perturbed outputs form a positive pair. This hybrid loss function enables the classifier to better distinguish between samples of different categories, thereby improving the detection model’s accuracy. Experimental results on the HDFS dataset
demonstrate that Contra Log achieves the best accuracy, recall, and F1-score, with an accuracy of up to 99%. On the BGL dataset, Contra Log also excels in accuracy and F1-score, reaching an accuracy of 99%. Visualization analysis of log data indicates that Contra Log’s hybrid loss function makes different categories easier to distinguish than the cross-entropy loss function alone. (2) To address unfamiliar log templates introduced by system maintenance and upgrades, this dissertation proposes LogEvoBERT, a novel system state detection framework based on dynamic log word embedding that focuses primarily on the vectorization of log data. LogEvoBERT introduces a new log template vectorization method. First, log data is fed into the Embedding module, which comprises a BERT model and a text feature extraction model, for unsupervised contrastive learning. The word vectors generated by BERT are passed to the text feature extraction model for further feature mining, and the extracted feature vectors are then concatenated with the sentence vectors extracted by BERT to form a more comprehensive feature representation. Finally, the fused features are fed into the classification model to determine the category of the log data. On both the HDFS and BGL datasets, LogEvoBERT achieves the best results in accuracy, recall, and F1-score, with an accuracy of up to 99%. The research results of this dissertation broaden the adaptability of log-based system status detection models under adverse log data conditions and provide novel insights for future research on such models, with a wide range of potential applications. |
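
The hybrid loss summarized in the abstract can be illustrated with a minimal sketch. The abstract states only that cross-entropy and contrastive losses are combined and that positive pairs come from passing the same sample through dropout twice. Everything else below is an assumption made for illustration and not necessarily the dissertation's formulation: the InfoNCE form of the contrastive term, cosine similarity, the `temperature` and `weight` parameters, and the toy feature vectors standing in for encoder outputs.

```python
import math
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each element with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def log_softmax_at(scores, index):
    """Numerically stable log-softmax value at position `index`."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_z


def cross_entropy(logits_batch, labels):
    """Mean cross-entropy over a batch of classifier logits."""
    return -sum(log_softmax_at(l, y) for l, y in zip(logits_batch, labels)) / len(labels)


def contrastive_loss(views_a, views_b, temperature=0.1):
    """Assumed InfoNCE form: view i in A should match view i in B;
    the other rows of B act as in-batch negatives."""
    total = 0.0
    for i, a in enumerate(views_a):
        sims = [cosine(a, b) / temperature for b in views_b]
        total -= log_softmax_at(sims, i)
    return total / len(views_a)


def hybrid_loss(logits_batch, labels, views_a, views_b, weight=0.5, temperature=0.1):
    """Cross-entropy plus a weighted contrastive term (the weighting is an assumption)."""
    ce = cross_entropy(logits_batch, labels)
    con = contrastive_loss(views_a, views_b, temperature)
    return ce + weight * con


# Toy stand-ins for encoder features of three log sequences (hypothetical values).
rng = random.Random(0)
features = [[1.0, 0.2, -0.5, 0.8], [-0.7, 1.1, 0.3, -0.2], [0.1, -0.9, 1.2, 0.4]]

# The same samples are passed through dropout twice to form positive pairs.
views_a = [dropout(f, 0.1, rng) for f in features]
views_b = [dropout(f, 0.1, rng) for f in features]

logits = [[2.0, -1.0], [1.5, 0.0], [-0.5, 1.0]]  # hypothetical classifier outputs
labels = [0, 0, 1]                               # 0 = normal, 1 = abnormal

loss = hybrid_loss(logits, labels, views_a, views_b)
print(round(loss, 4))
```

In this sketch the contrastive term is small when the two dropout views of each sample are more similar to each other than to views of other samples, which is the property the abstract credits with making categories easier to separate than cross-entropy alone.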