
Modeling And Learning Of Representations For Natural Language Sentence-level Structures

Posted on: 2017-01-30    Degree: Doctor    Type: Dissertation
Country: China    Candidate: M Yu    Full Text: PDF
GTID: 1108330503469769    Subject: Computer Science and Technology
Abstract/Summary:
Representation learning methods learn low-dimensional, smooth feature representations for natural language processing tasks; as a result, they alleviate the data sparsity problem faced by supervised learning methods on these tasks. Recent research has focused on learning representations for language structures such as phrases and syntactic trees. However, these methods can only handle specific types of structures, and they are usually not expressive enough because they make limited use of linguistic annotations. Based on these observations, this work aims to better compose structure representations by exploiting both linguistic features and word representations. To this end, we propose new models for structure representations as well as a new training strategy for these models, and we study the applications of these structure representations.

The key to our research is to represent the conjunction among the critical pieces of information in a language structure. First, to show the importance of this idea, this work starts with the simplest structure, the n-gram. Noting that an n-gram is the conjunction of its component words, we demonstrate that the conjunction among word representations is critical to n-gram representations. Existing word-embedding-based work represents an n-gram as the concatenation of its word embedding vectors, which, as shown by the analysis in this thesis, cannot reflect the conjunction between the component words. Our solution instead first performs clustering on word embeddings and then represents an n-gram as the conjunction of the resulting discrete clusters. Experiments confirm that this method outperforms the concatenation approach, showing the importance of the conjunction information. We further demonstrate that our n-gram representations also significantly help a noise-removal task.

Second, this thesis proposes a general model for structure representations. For any input structure that can be represented as a graph with words as nodes, the model builds a structure representation. It first decomposes the input structure into sub-structures, each of which contains some word nodes (represented as word embeddings) and some edges carrying structural information (represented as linguistic features). The model composes a sub-structure representation by capturing the conjunction between these two types of information with an outer product, and it sums over all sub-structure representations to obtain the representation of the original input structure. On top of this structure representation, the model uses a parameter tensor to produce the outputs for the target task. The model is named the Feature-rich Compositional Embedding Model (FCM). This thesis mainly focuses on applying FCMs to sentence-level structures and shows that the model achieves state-of-the-art results on several relation extraction tasks.

Third, when the FCM deals with sub-structures containing many word nodes, or when one view of the tensor has high-dimensional inputs, the model has too many parameters and easily over-fits. This thesis therefore approximates the FCM with low-rank tensor approximation methods; the resulting model is named the Low-Rank FCM (LR-FCM). The input on each view is mapped to a low-dimensional vector, which reduces the parameter space.
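As an illustration of the n-gram representation described above, the following Python sketch clusters word embeddings and represents an n-gram as the conjunction of its words' cluster IDs rather than a concatenation of embedding vectors. The toy vocabulary, the random embeddings, and the use of scikit-learn's KMeans are assumptions made only for illustration, not the thesis' actual setup.

    # Sketch: represent an n-gram as the conjunction of its words' embedding clusters.
    # Toy vocabulary, random embeddings and KMeans are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    vocab = ["the", "stock", "market", "fell", "sharply"]
    emb = {w: rng.normal(size=50) for w in vocab}          # pretrained word embeddings (toy)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
    kmeans.fit(np.stack([emb[w] for w in vocab]))
    cluster_of = {w: int(c) for w, c in zip(vocab, kmeans.labels_)}

    def ngram_feature(ngram):
        # The conjunction of the component words' clusters acts as one discrete
        # feature, unlike the concatenation of the embedding vectors themselves.
        return "|".join(str(cluster_of[w]) for w in ngram)

    print(ngram_feature(("stock", "market")))   # e.g. "2|0"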
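The FCM composition can be sketched in the same spirit: each sub-structure pairs a linguistic-feature vector with a word embedding via an outer product, the resulting matrices are summed, and a parameter tensor maps the sum to per-label scores. The dimensions, the binary feature vectors, and the softmax output below are illustrative assumptions rather than the thesis' configuration.

    # Sketch of FCM scoring: sum of outer products, scored by a parameter tensor.
    import numpy as np

    rng = np.random.default_rng(0)
    n_feats, emb_dim, n_labels = 20, 50, 4
    T = rng.normal(scale=0.01, size=(n_labels, n_feats, emb_dim))   # parameter tensor

    def fcm_scores(substructures):
        # substructures: list of (feature_vector, word_embedding) pairs
        S = sum(np.outer(f, e) for f, e in substructures)            # sum of outer products
        return np.einsum('lfe,fe->l', T, S)                          # one score per label

    subs = [(rng.integers(0, 2, n_feats).astype(float), rng.normal(size=emb_dim))
            for _ in range(3)]
    scores = fcm_scores(subs)
    probs = np.exp(scores); probs /= probs.sum()                     # softmax over labels
    print(probs)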
Moreover, with the CP-form tensor approximation, the LR-FCM replaces the time-consuming tensor multiplication by a point-wise multiplication of the low-dimensional projections on each view. These improvements greatly reduce both the parameter complexity and the runtime of the FCM, while achieving better performance on several NLP tasks.

Fourth, this thesis proposes a joint-training method to better train the above models with both labeled and unlabeled data. Compared with the traditional pipeline method, it learns better embeddings for words not covered by the labeled data. To make use of the unlabeled data, we propose a training objective based on the idea of language modeling, which uses the representation of a structure to predict its context words.

Finally, this thesis takes the learning of phrase embeddings as an example to demonstrate the advantage of the FCM (and its low-rank approximation) together with the joint-training method, achieving significant improvements on several phrase similarity tasks.

Experiments show that the proposed methods achieve state-of-the-art performance on several NLP tasks: relation extraction, phrase similarity, sequence labeling and cross-lingual projection. This research also provides new methodologies and perspectives for future representation learning research.
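A minimal sketch of the low-rank, CP-form scoring follows: each view (labels, linguistic features, word embeddings) has its own low-dimensional projection, and the full tensor multiplication is replaced by a point-wise product of the projected vectors. The rank and dimensions are assumptions chosen only to make the example concrete.

    # Sketch of CP-form low-rank FCM scoring: per-view projections and a
    # point-wise product instead of a full tensor multiplication.
    import numpy as np

    rng = np.random.default_rng(0)
    n_feats, emb_dim, n_labels, rank = 20, 50, 4, 8
    A = rng.normal(scale=0.01, size=(n_labels, rank))   # label view
    B = rng.normal(scale=0.01, size=(n_feats, rank))    # feature view
    C = rng.normal(scale=0.01, size=(emb_dim, rank))    # embedding view

    def lrfcm_scores(substructures):
        scores = np.zeros(n_labels)
        for f, e in substructures:
            # point-wise product of the two low-dimensional projections
            scores += A @ ((f @ B) * (e @ C))
        return scores

    subs = [(rng.integers(0, 2, n_feats).astype(float), rng.normal(size=emb_dim))
            for _ in range(3)]
    print(lrfcm_scores(subs))

Compared with a full tensor of n_labels x n_feats x emb_dim parameters, the three factor matrices store only (n_labels + n_feats + emb_dim) x rank parameters, which is what reduces both the parameter complexity and the runtime.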
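A rough sketch of the unlabeled part of the joint-training objective: the representation of a structure (here, simply the sum of its word embeddings) is used to predict its context words in a language-modeling fashion. The negative-sampling loss, the sum-of-embeddings composer, and the toy data below are assumptions; the abstract only specifies that the structure representation predicts context words.

    # Sketch of a language-modeling-style objective on unlabeled data:
    # a structure representation predicts its context words (negative sampling assumed).
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, emb_dim = 1000, 50
    emb = rng.normal(scale=0.1, size=(vocab_size, emb_dim))      # word embeddings (shared)
    out = rng.normal(scale=0.1, size=(vocab_size, emb_dim))      # output (context) vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def unlabeled_loss(structure_word_ids, context_word_ids, n_neg=5):
        # Structure representation: sum of its word embeddings (simplest composer).
        h = emb[structure_word_ids].sum(axis=0)
        loss = 0.0
        for c in context_word_ids:
            negatives = rng.integers(0, vocab_size, n_neg)
            loss -= np.log(sigmoid(out[c] @ h))                  # predict the true context word
            loss -= np.log(sigmoid(-out[negatives] @ h)).sum()   # push down sampled noise words
        return loss

    print(unlabeled_loss([3, 17, 42], [5, 99]))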
Keywords/Search Tags: representation learning, tensor models, semi-supervised learning, natural language processing, deep learning, relation extraction