| Code smells may increase the risk of failure, make software difficult to maintain, and introduce technical debt. Supervised learning models have become increasingly important in code smell detection, and existing work has sought to optimize these approaches in terms of datasets, models, and other aspects. Although accuracy has continuously improved, it remains unclear which model performs best, because studies differ in how they construct datasets and select features. In addition, there is a lack of systematic analysis and classification of code smell detection based on supervised learning. To this end, this thesis collects 78 papers on code smell detection based on supervised learning published from January 2010 to October 2022 and empirically studies eight research questions. RQ1-RQ7 analyze and summarize dataset construction, data preprocessing, and model selection. RQ8 conducts an experimental comparison of the supervised learning models from the perspective of empirical research. Finally, we summarize the problems of code smell detection based on supervised learning and suggest possible research directions. To address the existing problems in current research, this thesis proposes a novel approach named DeleSmell to detect code smells based on a deep learning model. A dataset containing more than 200,000 samples from 24 real-world projects is constructed to support training of the model. To increase the number of positive samples, a refactoring tool is developed to automatically transform non-smelly items into smelly items in real-world projects. DeleSmell extracts structural features with iPlasma and semantic features with latent semantic analysis and Word2vec. It builds a deep learning model comprising a CNN branch and a gated recurrent unit (GRU)-attention branch, and the final classification is conducted by an SVM. The experiments evaluate the effectiveness of the automatic refactoring tool and of the semantic feature extraction technique, as well as the application of the tool to real programs. Moreover, the tool is compared with widely used supervised learning models and MARS methods. The experimental results show that DeleSmell achieves higher accuracy than the existing methods, which verifies the effectiveness of the proposed method. |