Databases are important carriers for storing and transmitting massive amounts of data, but writing their query language (SQL) requires technical skill, and ordinary users cannot quickly construct accurate and effective query statements. A method that rewrites natural language into SQL statements is therefore needed to reduce the complexity of operating databases and retrieving target information. With the wide application of deep learning, database query rewriting models based on neural networks have gradually become a research hotspot. These models learn the mapping between input natural language and output standard query statements from large numbers of data pairs, and thereby achieve query rewriting. However, current methods in this field mainly fall into two types, template slot filling and graph neural network encoding, both of which have limitations. Template-slot-filling methods must build a separate generation network for each SQL sub-module and splice the complete SQL together from the outputs of the sub-networks, which increases network complexity and computational overhead. Graph-neural-network-encoding methods must predefine the structural relationships (edges) between nodes for each database schema in order to express the different intra-schema relationships, which increases preprocessing difficulty and data dependency.

To solve these problems, this paper proposes a Chinese database query rewriting model based on machine translation, which completes the input-output mapping directly in an end-to-end manner; the network is relatively simple and can handle complex scenarios. The main work and contributions of this paper are as follows:

(1) First, a special "translation" model based on a bidirectional LSTM network is constructed to "translate" natural language into SQL statements and to verify the feasibility of machine translation for SQL generation. On this basis, the model structure is then improved with the currently popular and efficient Transformer architecture to further enhance conversion performance.

(2) On top of the Transformer architecture, a Chinese BERT pre-trained model is used as the encoder, which resolves the problem of word-granularity segmentation for Chinese characters. Meanwhile, the database table information is given a dedicated embedding so that the model can extract the associations between the target SQL and the tables, improving the accuracy of the conversion results by 20%. To improve accuracy further, the data are augmented by varying the order in which the table information is stitched together, so that the model produces the same conversion result for different encoded inputs of the same query, improving its robustness.

(3) To make the model's output closer to the way humans write SQL statements, adversarial training is introduced: the above model serves as the generator, and a discriminator is built to score the SQL statements the generator outputs. Because textual data are not differentiable everywhere, Policy Gradient is used to pass a reward from the discriminator back to the generator. This further improves the accuracy of database query rewriting by about 2%.
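The table-information stitching and order-based augmentation described in (2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `[CLS]`/`[SEP]` markers follow the standard BERT input convention, and the function names, example question, and column names are hypothetical.

```python
from itertools import permutations

def build_input(question, columns):
    """Stitch question and table columns into one BERT-style sequence:
    [CLS] question [SEP] col1 [SEP] col2 [SEP] ..."""
    parts = ["[CLS]", question]
    for col in columns:
        parts += ["[SEP]", col]
    return " ".join(parts)

def augment_by_column_order(question, columns, max_variants=6):
    """Augmentation idea: pair the same query with differently ordered
    table information, so the model learns to emit the same SQL for
    every variant of the encoded input."""
    variants = []
    for perm in permutations(columns):
        variants.append(build_input(question, list(perm)))
        if len(variants) >= max_variants:
            break
    return variants

# Hypothetical query over a table with columns (city, population, province).
inputs = augment_by_column_order("哪些城市人口超过百万", ["城市", "人口", "省份"])
```

Each variant encodes the same question with a different column order, so all of them are paired with the same target SQL during training.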
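The policy-gradient reward in (3) can be illustrated with a REINFORCE-style toy. Everything here is a stand-in, not the paper's model: the "policy" is a single position-independent logit per vocabulary token, and the "discriminator" is a hard-coded scorer that rewards sequences starting like real SQL; the point is only the mechanism of turning a non-differentiable score on sampled discrete text into a gradient signal.

```python
import math
import random

random.seed(0)

VOCAB = ["SELECT", "name", "FROM", "users", "WHERE"]

# Toy generator policy: one logit per token (illustrative only).
logits = {tok: 0.0 for tok in VOCAB}

def softmax(ls):
    m = max(ls.values())
    exps = {t: math.exp(v - m) for t, v in ls.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def sample_sequence(length=3):
    probs = softmax(logits)
    seq = random.choices(list(probs), weights=list(probs.values()), k=length)
    return seq, probs

def discriminator(seq):
    # Stand-in discriminator: scores 1 when the sequence starts like SQL.
    return 1.0 if seq[0] == "SELECT" else 0.0

def reinforce_step(lr=0.5):
    seq, probs = sample_sequence()
    reward = discriminator(seq)  # non-differentiable w.r.t. the sample
    # REINFORCE: ascend reward * grad log pi(tok);
    # for a softmax, d log pi(tok)/d logit_t = 1[t == tok] - probs[t].
    for tok in seq:
        for t in VOCAB:
            grad = (1.0 if t == tok else 0.0) - probs[t]
            logits[t] += lr * reward * grad
    return reward

for _ in range(200):
    reinforce_step()

p = softmax(logits)
```

After training, the policy concentrates its probability mass on "SELECT", because only sequences the discriminator rewards contribute gradient. In the paper's setting the same reward path lets the discriminator's score shape the generator even though sampled SQL tokens are discrete.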