
The Strategy Of Text-to-SQL Based On Neural Network

Posted on: 2022-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: H J Wang
GTID: 2518306482989489
Subject: Computer Science and Technology
Abstract/Summary:
Text-to-SQL is the task of translating natural-language utterances into SQL queries. It is a subtask of semantic parsing, which is itself a subfield of natural language processing. This dissertation focuses on the cross-domain, context-dependent text-to-SQL task. In this task the model must generate the SQL query for the current utterance from both that utterance and the interaction history, and the databases in the training set and the validation set are entirely disjoint.

This dissertation first proposes BCSQL, a basic context-dependent sequence-to-sequence model for this task. BCSQL introduces an interaction-level encoder to capture the interaction history and feeds schema information to the decoder to handle the cross-domain setting. BCSQL also applies an attention mechanism during decoding to decide which parts of the utterance and of the database to attend to at each decoding step. Although BCSQL improves on the baseline model, considerable room for improvement remains.

Building on BCSQL, this dissertation proposes Guide SQL, which adds a guide mechanism. During decoding, the guide mechanism uses a pruning algorithm to delete columns and tables that are unrelated to earlier predictions. This effectively avoids table-column mismatches and errors in foreign-key connections between tables. To improve table prediction accuracy, and thereby the effectiveness of the guide mechanism, this dissertation proposes a re-ranking mechanism: it generates an SQL query for each of the five tables with the highest predicted probability and ranks these queries to select the one that best matches the current utterance. Guide SQL also uses type linking based on average word embeddings to enrich the representations of the utterance and the schema and to strengthen the correlation between them. Finally, so that Guide SQL can attend to the SQL query from the previous turn, this dissertation adds previous-SQL-query attention to Guide SQL.

To further improve on Guide SQL, this dissertation proposes PG-GSQL, which replaces the decoder of Guide SQL with a pointer-generator network consisting of a pointer and a generator. The pointer copies tokens from the previous SQL query, and the generator generates new tokens from the vocabulary. Experiments show that the pointer-generator network effectively captures the history of SQL queries and reuses the previous SQL query. PG-GSQL also replaces type linking based on average word embeddings with type linking based on an LSTM, which yields better performance.

On SParC, a challenging cross-domain, context-dependent text-to-SQL benchmark, PG-GSQL achieves 37.4% question matching accuracy and 20.2% interaction matching accuracy on the validation set. With fine-tuned BERT used to augment the word embeddings, PG-GSQL achieves 53.1% question matching accuracy and 34.7% interaction matching accuracy on the validation set, outperforming the current state-of-the-art model by 5.9% in question matching accuracy and 5.2% in interaction matching accuracy.
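The pruning step of the guide mechanism can be pictured as masking the decoder's candidate set. The sketch below is a minimal illustration under assumptions made here, not the algorithm from the thesis: it assumes tables are predicted before columns, represents the schema with a hypothetical `Schema` class, and uses hypothetical helper names (`allowed_columns`, `allowed_tables`, `mask_scores`). Columns outside the already predicted tables are pruned, and a further table is allowed only if it shares a foreign key with a table already chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical schema container (not taken from the thesis)."""
    columns: dict[str, set[str]]  # table name -> column names
    foreign_keys: set[frozenset[str]] = field(default_factory=set)  # unordered table pairs

    def neighbours(self, table: str) -> set[str]:
        """Tables linked to `table` by at least one foreign key."""
        return {t for pair in self.foreign_keys if table in pair for t in pair if t != table}


def allowed_columns(schema: Schema, predicted_tables: set[str]) -> set[str]:
    """Keep only 'table.column' candidates whose table has already been predicted."""
    tables = predicted_tables or set(schema.columns)  # no pruning before any table is chosen
    return {f"{t}.{c}" for t in tables for c in schema.columns[t]}


def allowed_tables(schema: Schema, predicted_tables: set[str]) -> set[str]:
    """An additional table must share a foreign key with a table already chosen."""
    if not predicted_tables:
        return set(schema.columns)
    linked: set[str] = set()
    for t in predicted_tables:
        linked |= schema.neighbours(t)
    return linked - predicted_tables


def mask_scores(scores: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Give pruned candidates a score of -inf so argmax/softmax never selects them."""
    return {k: (v if k in allowed else float("-inf")) for k, v in scores.items()}
```

For example, in a schema where `concert` links `singer` and `stadium`, once `singer` has been predicted only `singer.*` columns survive and `concert` is the only table that can be joined next.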
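The re-ranking mechanism can be sketched as decoding one candidate query per top-ranked table and sorting the candidates. This assumes a hypothetical `decode` callable that, given the table to build the query around, returns an SQL string and a score; the abstract does not say how candidates are scored, so ranking by the decoder's score alone is an assumption, and `rerank` is a hypothetical name. The first element of the returned list is the selected query.

```python
import heapq
from typing import Callable

# Hypothetical decoder interface: given the table to build the query around,
# return a candidate SQL string and its score (e.g. a log-probability).
Decoder = Callable[[str], tuple[str, float]]


def rerank(table_probs: dict[str, float], decode: Decoder, k: int = 5) -> list[tuple[str, float]]:
    """Decode one SQL candidate for each of the top-k tables and sort candidates by score."""
    # Keep the k tables with the highest predicted probability.
    top_tables = heapq.nlargest(k, table_probs, key=table_probs.__getitem__)
    candidates = [decode(table) for table in top_tables]
    # Highest-scoring candidate first.
    return sorted(candidates, key=lambda cand: cand[1], reverse=True)
```

Decoding each candidate with its table fixed means a wrong first-choice table can still be recovered, as long as the correct table is among the top five.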
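A pointer-generator network is commonly formulated (as in See et al., 2017) by mixing the two distributions: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of the attention weights a_i over the positions i where the previous SQL query contains w). The sketch below applies that standard mixture for a single decoding step; the thesis may parameterize it differently, and the function name `pointer_generator_mix` is hypothetical.

```python
from collections import defaultdict


def pointer_generator_mix(
    p_gen: float,
    vocab_dist: dict[str, float],
    copy_attention: list[float],
    prev_sql_tokens: list[str],
) -> dict[str, float]:
    """Blend generating from the vocabulary with copying from the previous SQL query.

    p_gen           -- probability of generating (vs. copying) at this step
    vocab_dist      -- generator's distribution over the output vocabulary
    copy_attention  -- attention weights over the previous query's tokens (sum to 1)
    prev_sql_tokens -- tokens of the previous turn's SQL query
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(copy_attention) != len(prev_sql_tokens):
        raise ValueError("one attention weight per previous-query token is required")
    final = defaultdict(float)
    for token, p in vocab_dist.items():
        final[token] += p_gen * p
    # A token occurring several times in the previous query accumulates all its weights,
    # and tokens missing from the vocabulary can still be produced by copying.
    for token, a in zip(prev_sql_tokens, copy_attention):
        final[token] += (1.0 - p_gen) * a
    return dict(final)
```

When both inputs are probability distributions the result also sums to 1, so it can be fed directly to the usual cross-entropy loss or to greedy or beam decoding.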
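The two SParC metrics reported above differ in granularity. Question matching accuracy is the fraction of individual questions whose predicted SQL matches the gold query. Interaction matching accuracy is the fraction of interactions in which every question is predicted correctly. The sketch below computes both. Its default matcher is a placeholder assumption: the official SParC evaluation compares queries component by component (exact set match), not as strings. The function names are hypothetical.

```python
from typing import Callable


def naive_match(pred: str, gold: str) -> bool:
    """Placeholder matcher: case- and whitespace-insensitive string equality.

    The official SParC evaluation uses component-wise exact set match instead.
    """
    return " ".join(pred.lower().split()) == " ".join(gold.lower().split())


def sparc_accuracies(
    interactions: list[list[tuple[str, str]]],
    match: Callable[[str, str], bool] = naive_match,
) -> tuple[float, float]:
    """Return (question matching accuracy, interaction matching accuracy).

    Each interaction is a list of (predicted_sql, gold_sql) pairs, one per turn.
    """
    if not interactions:
        raise ValueError("no interactions to evaluate")
    n_questions = n_questions_correct = n_interactions_correct = 0
    for turns in interactions:
        results = [match(pred, gold) for pred, gold in turns]
        n_questions += len(results)
        n_questions_correct += sum(results)
        # An interaction counts only if every one of its turns is correct.
        n_interactions_correct += all(results)
    return n_questions_correct / n_questions, n_interactions_correct / len(interactions)
```

A single wrong turn makes its whole interaction count as a failure. This is why interaction matching accuracy is the stricter of the two measures for multi-turn interactions.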
Keywords: text-to-SQL, natural language processing, guide mechanism, re-ranking mechanism, type linking, pointer-generator network