| With the increasing complexity of modern software, the development process is becoming more time-consuming and error-prone. Improving development efficiency and reducing the burden on developers is a central concern in software engineering. Code generation is an important research topic in artificial intelligence and software engineering, and it is expected to address time-consuming and challenging problems in program development. Database operation code is a common type of code in the development process, and many studies use deep learning to generate it. However, these studies mainly focus on generating declarative SQL statements, which target end users rather than developers, so they do little to increase efficiency in real development scenarios. To address this problem, this thesis designs a database-related code generation module for practical, development-oriented scenarios, considering two aspects: data type and generation method. For the data type, this thesis uses programs in a syntax-embedded style as the target code, which contain both declarative SQL statements and the program code external to the SQL. This differs from existing SQL generation research, which targets end-user scenarios and standalone SQL statements; here the target is database operation code in development scenarios, intended to help programmers improve development efficiency. For the generation method, two approaches are adopted: direct generation and indirect derivation. Direct generation produces code directly from the natural language input, whereas indirect derivation automatically modifies existing code to produce new code when the natural language input is updated. For direct generation, programs that embed SQL statements in Python code are used as the target programs to construct Lyra, a code dataset in the syntax-embedded style. For the Lyra
dataset, an encoder-decoder framework based on the Transformer model is selected for the experiments, and BERT-style and GPT-style pre-trained models are used to improve performance. For indirect derivation, the SQL-related XML configuration content of the MyBatis framework is selected as the target program, and the code derivation dataset TwinXSQL is constructed. In the indirect derivation task, GPT-2 and CodeGPT are used as the base models, and the CodeGPT-XESQL model is pre-trained according to the characteristics of TwinXSQL to improve performance. In total, two datasets in the syntax-embedded style are constructed and evaluated. The experimental results show that the datasets constructed in this thesis meet the requirements of the development scenario, and that the designed code generation module can generate the required database operation code. By supporting both direct generation and indirect derivation, code generation can also be adapted to different scenarios. The AST exact match rate for direct code generation reaches 25.5%, and the code derivation task achieves a top-1 accuracy of 6.84% and a top-5 accuracy of 11.47%. |
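To make the syntax-embedded target style and the AST exact match metric more concrete, the following is a minimal sketch rather than the thesis's actual data or evaluation code. The function `get_user_names`, the `users` table, and the helpers `ast_exact_match` and `run_generated` are hypothetical names introduced only for illustration, and the comparison via Python's standard `ast` module is an assumption about how such a metric could be computed.

```python
import ast
import sqlite3

# Hypothetical model output: Python host code with an embedded SQL statement.
GENERATED = '''
def get_user_names(conn, min_age):
    cursor = conn.execute("SELECT name FROM users WHERE age >= ?", (min_age,))
    return [row[0] for row in cursor.fetchall()]
'''

# Hypothetical reference program: same structure, different layout and a comment.
REFERENCE = '''
def get_user_names(conn, min_age):
    # Query users at or above the given age.
    cursor = conn.execute(
        "SELECT name FROM users WHERE age >= ?", (min_age,)
    )
    return [row[0] for row in cursor.fetchall()]
'''


def ast_exact_match(candidate: str, reference: str) -> bool:
    """Return True when both programs parse to identical Python ASTs.

    Assumption: the embedded SQL is compared as an opaque string constant;
    the metric used in the thesis may treat the SQL part differently.
    """
    try:
        # ast.dump omits line and column attributes by default, so
        # whitespace and comments do not affect the comparison.
        return ast.dump(ast.parse(candidate)) == ast.dump(ast.parse(reference))
    except SyntaxError:
        # Code that does not parse cannot match the reference.
        return False


def run_generated(source: str) -> list:
    """Execute the generated function against a small in-memory database."""
    namespace = {}
    exec(source, namespace)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("Ada", 36), ("Tim", 12)])
    try:
        return namespace["get_user_names"](conn, 18)
    finally:
        conn.close()


if __name__ == "__main__":
    print(ast_exact_match(GENERATED, REFERENCE))
    print(run_generated(GENERATED))
```

Under this sketch, a candidate counts as correct only when its whole syntax tree equals that of the reference, which is stricter than checking that the program runs. This strictness is one plausible reason exact match rates on such tasks stay low even when generated code looks reasonable.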