
Semantic Analysis Of Statistical Literature Domain Based On Word2Vec-LSTM Model

Posted on: 2020-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: W F Zhang
GTID: 2428330602966740
Subject: Financial and risk statistics
Abstract/Summary:
Semantic parsing is a technique for transforming natural language into a canonical representation that a computer can execute. As a branch of artificial intelligence, it is an active research topic in natural language processing and a core technology of intelligent systems, underpinning applications such as machine translation, intelligent search, and human-machine dialogue. With the continuing development of science and technology, databases are used ever more widely: not only in enterprise production, but also by many individuals and research institutions. Mastering SQL, however, is a prerequisite for using a database, so for users unfamiliar with SQL the data stored in a database cannot deliver its full value. For that data to serve more users, the barrier between natural language and SQL must be broken down by converting natural language into an executable database query language, and semantic parsing plays a crucial role in achieving this goal. Because the statistical domain is more complex than encyclopedic (open) domains and lacks public datasets, relatively few semantic parsing studies target statistics.

For these reasons, this thesis takes a natural-language interface to databases as its application scenario, improves an existing LSTM-based English semantic parsing model, applies it to semantic parsing in the statistical domain, and translates natural-language questions into database queries. The aim is to let more users query database information without relying on SQL, to improve the sharing and practicality of data, and thereby to broaden the application scenarios and scope of databases. The main work is as follows:

(1) To address the specialized vocabulary of text in professional domains, this thesis introduces the Word2Vec word embedding model into the existing LSTM semantic parsing model. Word2Vec vectorizes the text data, extracting and representing its features, and is used to build word vectors tailored to semantic parsing in the professional domain, reducing the impact of the specialized terminology of statistics.
(2) An NL2SQL dataset dedicated to the statistical domain was constructed to compensate for the lack of public datasets for semantic parsing in statistics. Each record in the dataset consists of two parts: a natural-language question and the corresponding SQL query.
(3) A Word2Vec-LSTM semantic parsing model incorporating word embeddings was implemented and trained on the statistical-domain dataset and word vectors constructed in this thesis. During training, the model's batch size and activation function were tuned, and the model ultimately translates natural language into SQL statements.

The results show that a deep neural semantic parsing model with word embeddings can be applied effectively to data from professional statistics journals. A comparative study found that prediction accuracy was highest with a batch size of 26 and the PReLU activation function, an improvement of 7.3% over the baseline LSTM-based semantic parsing model.
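Item (1) relies on Word2Vec to learn domain-specific word vectors. The abstract does not say which Word2Vec variant was used; assuming the skip-gram variant, the sketch below shows how its training pairs are formed from a tokenized question, pairing each word with every neighbour inside a fixed window. The function name `skipgram_pairs` and the example question are hypothetical, and the full algorithm also randomly shrinks the window and subsamples frequent words, both omitted here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) pairs as used to train skip-gram Word2Vec.

    Each token is paired with every other token at most `window` positions away.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized question from a statistics-domain corpus.
question = "mean citation count of bayesian articles".split()
for pair in skipgram_pairs(question, window=1)[:4]:
    print(pair)
```

In practice a library would generate these pairs and train the vectors, for example gensim's `Word2Vec` class, where `sg=1` selects skip-gram and `window` sets the context size.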
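Item (2) describes each NL2SQL record as a pair of a natural-language question and its SQL query. The sketch below models one such record and runs its SQL against an in-memory SQLite table. The `articles` schema, its rows, the `NL2SQLExample` class, and the `execution_match` helper are all hypothetical; the abstract does not say whether the reported accuracy compares query strings or query results, so the execution check is only one possible way to score a predicted query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class NL2SQLExample:
    """One dataset record: a natural-language question and its target SQL query."""
    question: str
    sql: str


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries return the same rows (order-insensitive)."""
    try:
        predicted = sorted(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    gold = sorted(conn.execute(gold_sql).fetchall())
    return predicted == gold


# Hypothetical toy table of journal articles; not the thesis's actual schema or data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, year INTEGER, citations INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Bayesian model averaging", 2015, 120),
        ("Robust regression methods", 2017, 45),
        ("Sparse covariance estimation", 2017, 80),
    ],
)

example = NL2SQLExample(
    question="How many articles were published in 2017?",
    sql="SELECT COUNT(*) FROM articles WHERE year = 2017",
)
(answer,) = conn.execute(example.sql).fetchone()
print(example.question, "->", answer)

# A differently written but equivalent prediction still matches on execution.
print(execution_match(conn, "SELECT COUNT(title) FROM articles WHERE year = 2017", example.sql))
```

Scoring by execution tolerates equivalent rewrites of the same query, whereas exact string matching would reject them; which criterion suits a given dataset is a design decision.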
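The Word2Vec-LSTM combination in item (3) feeds word vectors into an LSTM. As a minimal sketch, assuming the standard LSTM gate equations and toy 3-dimensional embeddings (the vectors below are made-up placeholders, not trained Word2Vec output), the following pure-Python cell encodes a question into a final hidden state. It covers only the encoder side; the abstract does not describe how the thesis's model generates SQL, and the names `LSTMCell` and `embeddings` are illustrative.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Minimal LSTM cell with the standard input/forget/output/candidate gates."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: i = input, f = forget, o = output, g = candidate.
        self.params: Dict[str, Tuple[Matrix, Matrix, Vector]] = {
            gate: (rand_matrix(hidden_size, input_size),
                   rand_matrix(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in "ifog"
        }

    def _preactivation(self, gate: str, x: Vector, h: Vector) -> Vector:
        # Computes W x + U h + b for the given gate.
        w, u, b = self.params[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(z) for z in self._preactivation("i", x, h)]
        f = [sigmoid(z) for z in self._preactivation("f", x, h)]
        o = [sigmoid(z) for z in self._preactivation("o", x, h)]
        g = [math.tanh(z) for z in self._preactivation("g", x, h)]
        # New cell state keeps part of the old state and adds gated new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def encode(self, xs: List[Vector]) -> Vector:
        """Run the cell over a sequence of word vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            h, c = self.step(x, h, c)
        return h


# Toy 3-dimensional "word vectors" standing in for trained Word2Vec output.
embeddings: Dict[str, Vector] = {
    "mean": [0.2, -0.1, 0.4],
    "citation": [0.5, 0.3, -0.2],
    "count": [0.1, 0.0, 0.3],
}
unknown = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary tokens

tokens = "mean citation count".split()
xs = [embeddings.get(t, unknown) for t in tokens]
cell = LSTMCell(input_size=3, hidden_size=4)
h = cell.encode(xs)
print([round(v, 4) for v in h])
```

Replacing randomly initialized or one-hot inputs with pretrained domain vectors is the motivation stated in item (1); with gensim, such vectors would be looked up as `model.wv[word]` instead of from the toy dictionary.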
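The best configuration reported above uses the PReLU activation. PReLU behaves like ReLU for positive inputs but keeps a learnable slope a for negative inputs, f(x) = max(0, x) + a·min(0, x), so negative pre-activations still pass a scaled signal and gradient. The sketch below compares the two; the slope value 0.25 is an assumption (it is the default initial value of PyTorch's `nn.PReLU`), not a value taken from the thesis.

```python
from typing import List


def relu(x: float) -> float:
    return max(0.0, x)


def prelu(x: float, a: float = 0.25) -> float:
    """Parametric ReLU: identity for x > 0, slope `a` for x <= 0.

    In a trained network `a` is learned; 0.25 is only an illustrative initial value.
    """
    return x if x > 0 else a * x


def prelu_grad_a(x: float) -> float:
    """Gradient of prelu(x, a) with respect to a: x for x <= 0, otherwise 0."""
    return x if x <= 0 else 0.0


inputs: List[float] = [-2.0, -0.5, 0.0, 1.5]
print([relu(x) for x in inputs])   # negative inputs are zeroed
print([prelu(x) for x in inputs])  # negative inputs keep a scaled signal
```

Because the gradient with respect to a is nonzero whenever x < 0, the network can learn how much negative signal to keep, which is the usual argument for PReLU over a fixed ReLU.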
Keywords/Search Tags: Journal of the American Statistical Association, Semantic analysis, LSTM, SQL, Word Embedding