The relational database is a complex technology designed to store data in computer systems. Users extract content from a relational database with Structured Query Language (SQL). Semantic parsing for database question answering aims to transform natural-language questions into the corresponding SQL queries. According to the number of tables in the database and the number of rounds in the dialogue, database question answering can be divided into three subtasks of increasing difficulty: the single-round single-table text-to-SQL task, the single-round multi-table text-to-SQL task, and the multi-round multi-table text-to-SQL task. First, to address data sparsity in Chinese datasets, we designed and developed a data annotation system. With the help of this system, we annotated SeSQL, a multi-table text-to-SQL Chinese dataset that is the largest of its kind and of higher quality. Based on SeSQL and other open-source datasets, this paper conducts in-depth research on the three tasks. Finally, based on all the research and results in this paper, we build a semantic parsing platform for database question answering to meet practical needs. Specifically, the main contents of this paper are as follows:

(1) Data annotation system for multi-round, multi-table Chinese database question answering

Data for database question answering tasks is difficult to obtain, so the task suffers from significant data sparsity. In response, we designed a multi-person collaborative labeling process in which annotators can check earlier labels while labeling, which safeguards the labeling quality of the data. Following a modular design, we built a data annotation system and provided practical functions that support the annotation process. Finally, based on this data annotation system, we built a large-scale, high-quality, multi-table text-to-SQL Chinese
dataset SeSQL.

(2) Semantic parsing of single-round question answering for single-table databases

Single-round single-table text-to-SQL tasks are the most widely needed in current business applications. For this task, we use a semantic parsing model with a sequence-to-set structure as the baseline. We add a value-encoding optimization strategy to the baseline to improve the selection of SQL values, and we also try adding a relation-aware encoding layer to further strengthen the encoding. The experimental results show that the value-encoding strategy helps the model predict the WHERE clause in SQL, whereas the relation-aware encoding layer produces pronounced feature-vector homogeneity, which has a negative impact on the sequence-to-set model.

(3) Semantic parsing of single-round question answering for multi-table databases based on curriculum learning

To speed up the training of the semantic parsing model and improve its training quality, this paper applies curriculum learning to the training process of the LGESQL model. We designed two curriculum learning methods. Experimental analysis shows that the method that divides samples by the structure of their SQL has the greater impact on the training process of the text-to-SQL model.

(4) Reproduction and comparison of multi-round question answering semantic parsing models for multi-table databases

The multi-round multi-table text-to-SQL task has gradually become an active research direction in academia. Semantic parsing models for this task typically follow one of two design ideas. To cover both, this paper selects EditSQL, IGSQL, and an extended RATSQL model for reproduction and analysis. We compared the three models on the Chinese dataset SeSQL. The results show that the extended RATSQL model, which only concatenates the questions of the dialogue, reaches a highly competitive
performance level, while its code structure is relatively simple and efficient. The extended RATSQL model is therefore better suited to industrial production scenarios and has considerable room for optimization.

(5) Construction of a semantic parsing platform for database question answering

With the wide application of computer technology, a semantic parsing platform for database question answering has a wide range of application scenarios. Building on the preceding research on semantic parsing models, the data annotation system, and the annotated dataset SeSQL, we built a semantic parsing platform for database question answering using mainstream development technologies. In testing, the platform showed stable performance and good interactivity, and it can meet ordinary question-answering requirements.

To sum up, we have carried out relatively comprehensive research on semantic parsing for database question answering, covering three text-to-SQL semantic parsing tasks: the single-round single-table task, the single-round multi-table task, and the multi-round multi-table task. We have built a relatively safe, stable, and well-performing data annotation system and semantic parsing platform. Based on the research in this paper, we provide a relatively complete solution to the database question answering task for real industrial production scenarios. We sincerely hope that the research and the practical tools in this paper will be helpful for future research on database question answering.
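
The curriculum learning idea in contribution (3), dividing training samples by the structure of their SQL, can be illustrated with a minimal sketch. The abstract does not state the actual division criteria or schedule, so everything here is an assumption made for illustration: the keyword list, the scoring function `sql_structure_score`, the cumulative staging in `build_curriculum`, and the `"sql"` field name are all hypothetical and are not the method used with LGESQL in this paper.

```python
import re
from typing import Dict, List

# Hypothetical structural features; the actual criteria are not given in the abstract.
STRUCTURE_PATTERNS = [
    r"\bJOIN\b",
    r"\bGROUP\s+BY\b",
    r"\bORDER\s+BY\b",
    r"\bHAVING\b",
    r"\b(?:UNION|INTERSECT|EXCEPT)\b",
]


def sql_structure_score(sql: str) -> int:
    """Return a rough structural complexity score for one SQL query.

    The score is keyword-based, so keywords inside string literals
    are also counted; this is acceptable for a coarse ordering only.
    """
    text = sql.upper()
    score = sum(len(re.findall(p, text)) for p in STRUCTURE_PATTERNS)
    # Every SELECT beyond the first indicates a nested or compound query.
    score += max(len(re.findall(r"\bSELECT\b", text)) - 1, 0)
    return score


def build_curriculum(
    samples: List[Dict[str, str]], num_stages: int
) -> List[List[Dict[str, str]]]:
    """Sort samples from simple to complex and split them into cumulative stages."""
    ordered = sorted(samples, key=lambda s: sql_structure_score(s["sql"]))
    stages = []
    for stage in range(1, num_stages + 1):
        # Stage k exposes the easiest k/num_stages share of the data.
        cutoff = max(1, round(len(ordered) * stage / num_stages))
        stages.append(ordered[:cutoff])
    return stages
```

Under these assumptions, a trainer would iterate over the returned stages in order, so that early epochs see only structurally simple queries and the final stage covers the full training set.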