Font Size: a A A

Sentence Classification Based On Distributed Representation

Posted on:2018-06-04Degree:MasterType:Thesis
Country:ChinaCandidate:W WangFull Text:PDF
GTID:2348330518495430Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
In recent years, the deep learning technology in the field of NLP has a huge development. Many important NLP task-related technologies have made great breakthroughs, such as language model, machine translation,QA, Chinese word segmentation and so on. Compared with traditional shallow learning such as Logistic Regression, SVM, etc., the deep learning model has a stronger ability to express. The reason why deep learning technology has a rapid development in the field of NLP is closely related to the maturity of the distributed representation method.Words and sentences are the basic units of natural language. Many important tasks of NLP can actually be decomposed into word-level tasks or sentence-level tasks. Therefore, the distribution of good word-level representation and distribution of sentence-level representation of the simplified model and enhance the effect of the mission can play a key role. The distributed representation of word-level indicates that there is a great deal of excellent work in the last two years. Nowadays, word vector technology performs well in basic, instrumental and migratory aspects. In contrast, sentence-level distributions indicate that the study is also relatively focused on supervised learning, as well as task-specific modeling. Although the model for the specific task design, can achieve good results, but the migration is not perfect. The unsupervised learning of the distribution of the sentence that, and migrate to the various specific tasks to become an important and meaningful research.In this thesis, the author's main research problem is to obtain the distribution of sentences by unsupervised learning method and apply them to specific sentence-level classification tasks such as emotion analysis and relational classification. At the same time, according to the task itself, The distributed representation and supervised approach will be combined to enhance the performance of the task.Based on the above problems, the research work and achievements in this thesis are as follows:1. A supervised convolutional-recurrent neural networks model is proposed. In the existing multi-window convolution network, a bi-directional cyclic network layer is introduced to adaptively extract variable-length patterns. The accuracy rate of 7% higher than the previous best results was obtained on MR, SST-1, SST-2 and other public sentiment analysis data sets.2. Based on the unsupervised self-encoder technology, a self-encoder model based on convolutional neural network,cyclic neural network and convolution-recurrent neural network is designed. Unsupervised modeling from word vector sequence to sentence distribution representation is realized. The obtained sentence distribution representation is applied to the sentence classification task, and the effect obtained in the task 1 is not weaker than the end-to-end supervised model.3. A semi-supervised model of auto-convolutional-recurrent neural networks is proposed by combining the supervised model with the unsupervised model, which effectively prevents the existence of over-supervised model. 1% to 2% improvement in the results described in 1 above.
Keywords/Search Tags:Neural Networks, distributed representation, auto-encoder, sentence classification
PDF Full Text Request
Related items