
Research On Student Program Algorithm Recognition Based On Program Vector Tree And K-means Clustering

Posted on: 2022-11-24 | Degree: Master | Type: Thesis
Country: China | Candidate: M Wei | Full Text: PDF
GTID: 2518306779475804 | Subject: Theory of Industrial Economy
Abstract/Summary:
Algorithm recognition is the automatic identification of programs that implement semantically equivalent algorithms. It is an effective way to evaluate algorithm behavior and program function, and an important sub-topic of software engineering, with applications in program comprehension, duplicate code inspection, plagiarism detection, and program diagnosis and verification. Both industry and computer programming education urgently need better algorithm recognition theory and technology. A single programming problem typically attracts many similar submissions whose most significant difference is the algorithm logic they use, and identifying these differences can help give students algorithmic inspiration. However, distinguishing them manually across a large number of student programs is difficult. Recognizing them with a model can improve accuracy and save labor cost. The thesis therefore takes "student program algorithm recognition" as its research focus, combines program analysis with clustering, and examines how to extract program features effectively to complete algorithm recognition. The details are as follows:

1. The lexical and grammatical features of programs are extracted using lexical and grammatical analysis techniques. The biggest difference between a programming language and a natural language is that the former has strict grammatical norms, so the primary task is to extract and process the lexical and grammatical features of programs accurately. To obtain lexical information, the program is first converted into a token sequence, key lexical elements are retained, and variable names are standardized. The abstract syntax tree (AST) of the program is then generated with the JDK to obtain syntax structure information, which supports the subsequent feature fusion and vector coding.

2. A program vector tree is constructed to fuse the various program features, and the tree structure is optimized to obtain high-quality program features. Existing research on program comprehension is limited in that it mostly captures program features at a single level or fails to integrate multiple aspects of them. Here, Word2vec is used to generate vector representations that capture lexical semantic information, the AST is used to construct a program vector tree that fuses lexical and syntactic features, and useless nodes are then removed to optimize the unified tree structure.

3. Based on the program vector tree, a recursive autoencoder model performs the vector coding, and k-means clustering is used to recognize programs that implement the same algorithm. To avoid the complex tree-based computation usually required for similar program detection, the recursive autoencoder generates a vector representation for each program, which turns the tedious problem of computing tree similarity into a problem of clustering similar vectors. Programs that use the same algorithm are grouped into the same category according to their similarity, which makes algorithm recognition more efficient.

4. Comparative experiments were carried out to evaluate the algorithm recognition results. To verify the validity of the word vector representation, PCA was used to reduce the vectors to a lower dimension, and the resulting visualization was analyzed. The results show that the Word2vec model can learn lexical semantic information. The thesis compares the clustering performance of four program vector generation methods on the Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI) and Fowlkes-Mallows Index (FMI). The experimental results show that recognition accuracy based on the program vector tree and clustering can reach about 83%.
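The token conversion and variable standardization in step 1 can be illustrated with a minimal sketch. It is not the thesis implementation: the regular-expression lexer, the keyword subset, the `VARn` placeholder scheme and the function name `normalize_tokens` are all assumptions made for illustration, and Python is used only for brevity, although the analyzed programs are Java.

```python
import re

# Assumed subset of Java keywords; a real lexer would use the complete list.
JAVA_KEYWORDS = {
    "int", "long", "double", "boolean", "char", "void", "if", "else", "for",
    "while", "do", "return", "break", "continue", "new", "class", "static",
    "public", "private", "true", "false", "null",
}

# Identifiers, numeric literals, common two-character operators,
# then any remaining single non-space symbol.
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||\+\+|--|[+\-*/]=|[^\s\w]"
)


def normalize_tokens(source):
    """Split source text into tokens and rename identifiers to VAR0, VAR1, ...

    Keywords, literals and operators are kept unchanged. Every non-keyword
    identifier (this simple sketch does not distinguish variables from
    method or class names) is replaced by a placeholder assigned in order
    of first appearance.
    """
    names = {}   # original identifier -> placeholder
    tokens = []
    for tok in TOKEN_RE.findall(source):
        is_identifier = re.match(r"[A-Za-z_]", tok) is not None
        if is_identifier and tok not in JAVA_KEYWORDS:
            if tok not in names:
                names[tok] = f"VAR{len(names)}"
            tokens.append(names[tok])
        else:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    a = "int sum = 0; for (int i = 0; i < n; i++) sum += i;"
    b = "int total = 0; for (int k = 0; k < m; k++) total += k;"
    # Both loops differ only in variable names, so their sequences match.
    print(normalize_tokens(a) == normalize_tokens(b))
```

With this kind of normalization, two submissions that differ only in naming produce identical token sequences, so later stages compare algorithm structure rather than naming choices.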
Keywords/Search Tags: Algorithm Recognition, Student Program, Program Vector Tree, Program Analysis, Clustering