
Joint Learning Methods For Distributed Representations Of Natural Language

Posted on: 2017-03-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: F Tian    Full Text: PDF
GTID: 1108330485951534    Subject: Computer Science and Technology
Abstract/Summary:
Distributed representations of natural language refer to the technique of using deep neural networks to obtain distributed representation vectors for natural language objects such as words, phrases, sentences, paragraphs, and documents. Generally speaking, distributed representation vectors are low-dimensional, dense real-valued vectors learned from large-scale unsupervised text corpora. These vectors convey semantic information about natural language objects and can therefore serve as effective representations of natural language, which have been applied to many natural language processing tasks with excellent empirical performance.

In this thesis, unlike traditional methods that learn distributed representations of natural language entirely from scratch, we incorporate additional information into the learning process, with the goal of jointly training the distributed representations. Such information can be external to the text corpus, such as the contents of dictionaries and knowledge graphs, or abstract, high-level information beyond simple word and sentence occurrences in the corpus, such as word polysemy and topic information. On the one hand, this joint training methodology improves the quality of the distributed representations with the help of the additional information; on the other hand, the distributed representations of natural language benefit the corresponding tasks, such as topic models, and lead to better performance. More concretely:

1) To break the limitation of traditional word representation models, which take words as the basic embedding units, we propose a joint training method that combines word polysemy information with word representations, so that the multiple meanings of a polysemous word are modelled more accurately (a minimal illustrative sketch of this idea follows the abstract). Besides showing that the method achieves good empirical results, we also introduce a large-scale distributed system based on the proposed method.

2) We leverage joint training of word representations with knowledge graph representations to overcome the limitation that traditional text-driven word representations cannot express complicated knowledge relationships.

3) Building on these two methodologies, we propose an automatic method for solving verbal IQ test questions, whose performance exceeds the average score of human participants on the verbal IQ test.

4) Furthermore, we propose a recurrent neural network based algorithm that jointly models sentence representations and topic models. The topic model produced by this algorithm fully takes the sequential properties of words into account. Compared with traditional topic models, the proposed topic model yields better performance in both quantitative and qualitative tasks.
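Contribution 1) above concerns learning several sense vectors per word jointly with ordinary word embeddings. The snippet below is a minimal sketch of that general idea under stated assumptions, not the thesis's actual method or its distributed training system: each word keeps a few candidate sense vectors, the sense most compatible with the observed context is selected, and only that sense receives a skip-gram negative-sampling update. All names, dimensions, and hyper-parameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
VOCAB, SENSES, DIM, LR = 1000, 3, 50, 0.05   # illustrative sizes, not the thesis settings

sense_vecs = rng.normal(scale=0.1, size=(VOCAB, SENSES, DIM))  # one vector per (word, sense)
ctx_vecs = rng.normal(scale=0.1, size=(VOCAB, DIM))            # output/context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(center, context, negatives):
    """One skip-gram negative-sampling update for a possibly polysemous center word."""
    # Pick the sense of `center` whose vector best matches the current context word.
    scores = sense_vecs[center] @ ctx_vecs[context]   # shape: (SENSES,)
    s = int(np.argmax(scores))
    v = sense_vecs[center, s]                         # view; updated in place below

    # The positive pair gets label 1, sampled negatives get label 0.
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = ctx_vecs[w].copy()                        # copy to avoid aliasing with the update
        g = LR * (label - sigmoid(v @ u))
        ctx_vecs[w] += g * v
        v += g * u                                    # updates sense_vecs[center, s] in place

# Toy usage: center word 7 observed with context word 42 and two sampled negatives.
sgd_step(7, 42, negatives=[13, 99])

Selecting the argmax sense per context is only one possible assignment rule; soft assignment or clustering of context vectors are common alternatives, and the thesis's own joint training procedure is not reproduced here.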
Keywords/Search Tags: Natural Language Processing, Neural Network, Deep Learning, Distributed Representations, Embedding Vectors, Joint Learning, Word Disambiguation, Knowledge Graph, Topic Model, Representation Learning