| The rapid development of AI technologies such as ChatGPT is changing human society and the world, and AI has become a key area of development for countries worldwide. Universal programming education and the cultivation of IT talent are crucial to developing AI technology. With the advancement of modern education, programming education has largely reached scale, and online learning platforms and online judging systems continue to emerge. However, because teachers cannot guide or intervene directly, learners face two major problems during the learning process: 1) Untimely feedback: learners cannot get help when they encounter material they do not understand, such as incomprehensible code and unsolvable bugs; 2) Poor targeting: a standardized learning path cannot meet learners’ personalized learning needs. These problems seriously reduce learners’ efficiency and enthusiasm, yet online learning is an indispensable means of programming education. Improving the intelligence and personalization of programming education is therefore an urgent issue. In software engineering, researchers study code representation and its downstream applications, such as code summarization and code fixing, to support the automatic development and maintenance of software. These applications can also help learners understand code and identify bugs, thereby improving the intelligence of programming education; however, enhancing the code representation ability of models remains a very challenging task. Additionally, programming learners’ behavior data contains a wealth of personal characteristics, such as knowledge mastery, learning style, and programming ability. Modeling these behaviors helps to construct programming learner profiles and achieve personalized programming education. However, limited by the quality of early programming learning data, research on behavior modeling is still in its 
infancy. Therefore, to promote the intelligence and personalization of programming education, this dissertation investigates two scientific problems in depth: code representation and behavioral modeling. The main contributions of this dissertation are as follows: (1) The dissertation proposes a novel code representation model based on multimodal information utilization and fusion: Inspired by the characteristics of human code-reading behavior, we analyze the shortcomings of existing code representation models and propose a novel neural network that more effectively utilizes and fuses the multimodal information of code. To address the low connectivity and out-of-vocabulary (OOV) problems in the structural modality, we construct an upgraded structural representation called S-AST and propose a statement-based partitioning algorithm to capture the fine-grained structural information in S-AST. Meanwhile, to fully learn semantics in the context modality, we introduce two external knowledge enhancement strategies to incorporate semantics and syntax. Experiments on code summarization and code clone detection tasks demonstrate that the proposed model has good code representation ability and generalization performance, and extensive ablation experiments verify that the model effectively utilizes and fuses the information of each code modality. (2) The dissertation reveals the commonalities between attention scores learned by Code Pre-Training Models (Code PTMs) and structural distance, and proposes a structural knowledge-based enhancement strategy for Code PTMs: We propose a quantitative metric, namely CAT-probing, to evaluate the relationship between the attention scores learned by Code PTMs and structural distance during fine-tuning. Using this metric, we find that the more strongly the attention scores learned by a Code PTM correlate with structural distance, the stronger the model’s representation ability. Based on this observation, we propose a structural knowledge-based enhancement strategy from the perspective of multi-task fine-tuning to 
improve the representation ability of Code PTMs. Extensive experiments on five Code PTMs and two downstream tasks demonstrate the effectiveness of the proposed strategy. Moreover, the performance improvement is more significant for models whose pre-training includes fewer structural learning tasks and for smaller datasets. (3) The dissertation proposes the Programming Knowledge Tracing (PKT) problem in programming education, collects and annotates a new benchmark dataset, and presents a context-aware double-sequence model to address this problem: To model the mastery of programming knowledge, we propose the PKT problem. Because current programming learning datasets lack knowledge concepts, we collect and manually annotate a new benchmark dataset, BePKT. Unlike knowledge tracing tasks in other educational scenarios, PKT must model learners’ programming ability and knowledge mastery at the same time. We therefore propose a context-aware double-sequence model. In particular, we introduce an exponential decay attention mechanism to simulate learners’ forgetting process and design a two-stage pre-training model called PLCodeBERT to enhance the representation of low-quality code in programming learning. Experiments on the BePKT dataset demonstrate the effectiveness of the double-sequence model for PKT and the strong ability of PLCodeBERT to represent low-quality code. (4) The dissertation proposes the Programming Learning Style Classification (PLSC) problem in programming education and presents a neural network model that simulates the learner’s learning process to solve it: To understand learners’ programming learning styles, we propose the PLSC problem and model it as a sequence recommendation problem. To model programming learning preferences in a fine-grained way, we add position information and compilation results to the programming problem and code representations to enhance vector semantics. We also design a vector difference module to track the learner’s progress in 
continuous submission behavior and construct updatable hidden vectors for programming learning style and ability to simulate the learning process. Experimental results on the BePKT and CodeNet datasets demonstrate that, compared with sequence recommendation models, the proposed model captures continuous and fine-grained programming behaviors and thus achieves better prediction performance. We also explore the correlation between programming learning styles and programming languages. In summary, the dissertation addresses two major research problems in programming education: code representation and behavioral modeling. To tackle these problems, we propose an effective code representation model based on multimodal information utilization and fusion, and a structural knowledge-based enhancement strategy for code pre-training models. Moreover, we propose two behavioral modeling problems (PKT and PLSC) in programming education and present a targeted solution for each. Experimental results demonstrate the effectiveness of these methods, laying the foundation for intelligent and personalized programming education. |
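
The abstract names an exponential decay attention mechanism for simulating forgetting in contribution (3) but does not give its form. The sketch below is only one plausible reading, not the dissertation's model: it assumes each past submission's attention logit is reduced in proportion to the time elapsed since that submission, so that its unnormalized weight shrinks by a factor of exp(-decay_rate * elapsed). The function name `decay_attention` and the parameters `timestamps`, `now`, and `decay_rate` are hypothetical and chosen for illustration.

```python
import math


def decay_attention(scores, timestamps, now, decay_rate=0.1):
    """Softmax attention with a time-based exponential decay (illustrative only).

    scores      -- raw relevance scores of past interactions (hypothetical input)
    timestamps  -- time at which each interaction happened
    now         -- time of the current prediction
    decay_rate  -- assumed forgetting rate; 0 disables the decay
    """
    if len(scores) != len(timestamps):
        raise ValueError("scores and timestamps must have the same length")
    if not scores:
        return []

    # Subtracting decay_rate * elapsed from a logit multiplies its
    # unnormalized softmax weight by exp(-decay_rate * elapsed).
    logits = [s - decay_rate * (now - t) for s, t in zip(scores, timestamps)]

    # Numerically stable softmax over the decayed logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Three equally relevant submissions made at t = 0, 5, and 10.
    weights = decay_attention([1.0, 1.0, 1.0], [0.0, 5.0, 10.0], now=10.0, decay_rate=0.2)
    print(weights)
```

With equal raw scores, the most recent submission receives the largest weight and the oldest the smallest, which is the qualitative forgetting behavior the abstract describes. Setting `decay_rate` to 0 recovers ordinary softmax attention.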