| In software defect prediction, the severity level assigned to a defect determines, to some extent, how quickly maintenance personnel repair it, so improving the accuracy and efficiency of severity prediction for software defect reports is of great significance. Traditional severity prediction methods suffer from several problems: the number of defects is large, the process is time-consuming, and the classification criteria for defects are inconsistent. With the breakthroughs of deep learning in natural language processing, researchers have begun to apply deep learning methods to a wide range of tasks. However, current severity classification methods for software defect reports produce anisotropic vector representations: most sentences have high similarity to one another and poor uniformity, and the ability to extract features at different semantic levels is insufficient, which leads to poor generalization of the model. Therefore, this paper studies a severity prediction method for software defect reports based on BERT and feature fusion. The main research contents are as follows. To address the excessive anisotropy of sentence vector representations in severity prediction models for software defect reports, a pre-training method based on BERT-cl (BERT-contrast learning) is proposed. First, two masked language models (MLM) are used to obtain two sentence vectors with similar semantics but different encodings. Second, the cross-entropy loss function of contrastive learning is used to pull positive sample pairs closer together and push negative sample pairs further apart, improving the representation ability of the sentence vectors. Experimental results show that the correlation coefficients of BERT-cl on the Mozilla and Eclipse datasets are 48.64% and 35.79%, respectively. Compared with other unsupervised semantic similarity models, the proposed method achieves better results. To address the low classification accuracy and insufficient feature
extraction ability in severity prediction models for software defect reports, the SWF-BERT (Sentence-level and Word-level features Fusion-BERT) method is proposed. First, feature words related to severity are selected and their keyword vectors are fused in the embedding layer. Second, the word-level output vectors of BERT are fed into the MCLSTM (Multi-scale CNN combined with LSTM) model. Finally, the sentence-level and word-level features are combined through a learnable adaptive weighted fusion. Experimental results show that the F1 values of SWF-BERT on the Mozilla and Eclipse datasets are 64.86% and 59.31%, respectively. Compared with other classification algorithms, the performance of SWF-BERT is greatly improved. In addition, a software defect report severity monitoring system is constructed. The system classifies and predicts the severity of data uploaded by users, evaluates the results, and provides the classification labels for users to query and download. It also supports statistical analysis of the predicted data and issues timely alarms for defect reports that exceed the threshold, notifying the administrator to carry out maintenance and making it easier to monitor defect information. |
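
The two core mechanisms summarized above, a contrastive cross-entropy loss over positive and negative sentence pairs and a learnable weighted fusion of sentence-level and word-level features, can be illustrated with a small sketch. This is not the paper's implementation. It assumes an InfoNCE-style loss with in-batch negatives and cosine similarity, and it assumes the fusion weight is a single scalar gate passed through a sigmoid. The function names `cosine`, `contrastive_loss`, and `adaptive_fusion`, the `gate_logit` parameter, and the temperature value 0.05 are all hypothetical choices for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.05):
    """InfoNCE-style cross-entropy with in-batch negatives (assumed form).

    views_a[i] and views_b[i] are two encodings of the same report and
    form a positive pair; views_b[j] with j != i serve as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the positive pair as the target class.
        total += log_denominator - logits[i]
    return total / len(views_a)


def adaptive_fusion(sentence_vec, word_vec, gate_logit):
    """Weighted sum of two feature vectors with weight sigmoid(gate_logit).

    In a trained model gate_logit would be a learnable parameter
    (an assumption here); it is passed in explicitly for illustration.
    """
    weight = 1.0 / (1.0 + math.exp(-gate_logit))
    return [weight * s + (1.0 - weight) * w for s, w in zip(sentence_vec, word_vec)]


if __name__ == "__main__":
    batch_a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.1], [0.1, 1.0]]   # positives match their anchors
    swapped = [[0.1, 1.0], [1.0, 0.1]]   # positives are mismatched
    print(contrastive_loss(batch_a, aligned) < contrastive_loss(batch_a, swapped))
    print(adaptive_fusion([1.0, 2.0], [3.0, 4.0], gate_logit=0.0))
```

The loss is small when each anchor is most similar to its own positive and large otherwise, which is the behavior the abstract describes as narrowing positive pairs and widening negative pairs. A scalar gate is the simplest form of adaptive fusion; a per-dimension gate would be an equally plausible reading of the abstract.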