
Research On Automatic Program Grading Based On Code Implicit Semantic Feature Representation

Posted on: 2022-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
GTID: 2518306539969419
Subject: Computer Science and Technology
Abstract/Summary:
Automatic program grading means that a computer automatically evaluates the source code written by a learner. It frees teachers from tedious, repetitive code evaluation, speeds up feedback, and provides fair and objective results, which helps learners improve their programming skills quickly. At present, most automatic code evaluation systems are based on dynamic code analysis: the source code is first compiled and linked, several sets of pre-prepared input data are then fed to the running program, and the final score is assigned according to how well the outputs match the expected results. This does not meet the scoring criteria for programming questions, because a program with completely correct logic but a wrong output is given 0 points, while a program with no sound logic but a correct output is given full marks. Static code analysis, by contrast, analyzes the grammatical structure and semantic information of the code, and automatic code evaluation based on it conforms to manual scoring standards and habits. This thesis completes the following work on automatic code evaluation based on static code analysis:

(1) To give the code a form that carries grammatical structure information, an abstract syntax tree of the code is first constructed. To facilitate subsequent model training and avoid the high computational complexity of training directly on the syntax tree, the abstract syntax tree is serialized by pre-order traversal, and the tree nodes are then segmented into words, yielding a word sequence that carries grammatical information.

(2) For the word sequence of the code syntax tree, an automatic code evaluation framework built on an LSTM network is proposed, based on an implicit semantic feature representation of code that combines a sliding window and a hash function. The framework has two stages: hidden feature learning and automatic evaluation. In the hidden feature learning stage, the model applies a sliding window and a hash function. The sliding window follows the principle of program locality, so that, when trained through the LSTM network, local lines of code are mapped to a particular implicit semantic representation. The hash function reduces the dimensionality of the LSTM output vector and yields the final implicit semantic representation of the code as a hash code. In the automatic evaluation stage, the learned hash code vector is fed to a KNN module, which finds the K score-labeled code samples whose hash code vectors are most similar to it and computes the final score of the input code as a weighted average. The scored code is then added to the code base and used as training data to continue training the model. Experiments on a self-collected code data set verify the feasibility of the proposed framework and the rationality of its evaluation results.

(3) To further improve the accuracy and effectiveness of automatic code evaluation, this thesis introduces code quality measurement features and proposes an automatic code evaluation framework that integrates them. There are six code quality measurement features, including the number of feasible paths in the program's control flow graph, the number of infeasible paths, the average number of nodes on a feasible path, and the average number of nodes on an infeasible path. The control flow graph is constructed from the code syntax tree, and the code quality measurement features are obtained with a numerical interval domain update detection algorithm; they are then concatenated with the hash code vector to form the final implicit semantic feature vector representation of the code. Comparative experiments show that introducing code quality measurement features further improves the accuracy and effectiveness of automatic code evaluation.
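
As a rough illustration of two steps the abstract describes, the sketch below serializes a syntax tree by pre-order traversal and scores a new submission with a weighted KNN vote over binary hash codes. It is a minimal sketch under several assumptions, not the thesis's implementation: the abstract does not say which programming language is graded, so Python's standard `ast` module stands in for the parser; the LSTM and hash learning stage is skipped, so the hash codes are hand-written bit tuples; Hamming distance and the inverse-distance weighting are assumed choices, since the abstract only says "weighted average"; and the function names `preorder_tokens`, `hamming`, and `knn_score` are hypothetical.

```python
import ast
from typing import List, Sequence, Tuple


def preorder_tokens(source: str) -> List[str]:
    """Serialize the AST of `source` into node-type names by pre-order traversal."""
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)        # emit the node itself first
        for child in ast.iter_child_nodes(node):  # then its children, in field order
            visit(child)

    visit(ast.parse(source))
    return tokens


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_score(
    query: Sequence[int],
    labeled: Sequence[Tuple[Sequence[int], float]],
    k: int = 3,
) -> float:
    """Weighted average of the scores of the k nearest labeled hash codes.

    Assumption: weights are 1 / (1 + Hamming distance); the source does not
    specify the weighting scheme.
    """
    if not labeled:
        raise ValueError("need at least one labeled sample")
    nearest = sorted(labeled, key=lambda item: hamming(query, item[0]))[:k]
    weights = [1.0 / (1.0 + hamming(query, code)) for code, _ in nearest]
    weighted_sum = sum(w * score for w, (_, score) in zip(weights, nearest))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    print(preorder_tokens("x = 1"))
    code_base = [((0, 0, 0, 0), 100.0), ((0, 0, 0, 1), 80.0), ((1, 1, 1, 1), 0.0)]
    print(knn_score((0, 0, 0, 0), code_base, k=2))
```

In the framework described above, the query and stored vectors would come from the LSTM and hash function rather than being written by hand, and a newly scored submission would be appended to the code base for further training.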
Keywords/Search Tags: Automatic program grading, Abstract syntax tree, Code implicit semantic feature, Sliding window, Learning hash, Code quality measurement feature