Research On Distributed Representation Based On Bigram

Posted on: 2018-05-10    Degree: Master    Type: Thesis
Country: China    Candidate: C Y Ma    Full Text: PDF
GTID: 2348330518495431    Subject: Information and Communication Engineering
Abstract/Summary:
In the field of natural language processing, words and sentences are the most basic units of representation. A word is an abstract representation that often carries multiple meanings, and the relations between different words vary; a sentence can be regarded as a word sequence with a specific syntactic structure and a richer connotation. The objective of research on distributed representation is to assign an appropriate vector representation to each word and sentence, in service of downstream tasks such as information retrieval and semantic mining.

The choice of language model is the basis of research on distributed representation. At present, distributed representation methods based on neural networks adopt the n-gram language model. Under a conditional independence assumption on the text, the n-gram model can be simplified to a bigram model, which reduces the parameter space and alleviates the data sparsity problem. This paper proposes improved distributed representation methods based on the bigram language model, which integrate positional information and syntactic dependency information into the distributed representations. In addition, the construction of a Chinese relation extraction dataset is completed. The main research contents and results are as follows:

First, for the distributed representation of words, a method based on positional information is proposed. This paper argues that the weights in existing dynamic window methods are set manually and therefore cannot reflect the relations between words, so two improved dynamic window weighting schemes are proposed. The first is an adaptive weighting factor method, in which different weighting factors are learned for different corpora. The second is a weight vector method based on KL divergence, which computes a dedicated weight vector for each target word. Both schemes yield significant improvements on word similarity and on semantic and syntactic evaluation benchmarks.

Second, a Chinese relation extraction dataset is constructed. This paper proposes a weakly supervised, semi-automatic construction method for a Chinese relation extraction dataset. With the aid of Wikipedia, the SogouCA news corpus, and the Baidu API, weakly supervised sentence extraction is achieved, and semantic annotation is performed with a recurrent neural network. The resulting dataset was selected as the evaluation corpus for the Chinese Opinion Analysis Evaluation (COAE) task, which has played a role in promoting the development of Chinese relation extraction.

Third, for the distributed representation of sentences, this paper proposes an improved relation extraction algorithm based on dependency paths. Dependency parsing is used to change the input structure of the neural network. A series of comparative experiments shows that incorporating traditional natural language processing features into the neural network structure is very effective.
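As a worked illustration of the bigram simplification mentioned above: under the conditional independence assumption, the chain-rule factorization of a sentence probability reduces each history to the single preceding word,

P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}),

so that only pairwise (bigram) statistics need to be estimated, which is what shrinks the parameter space and eases data sparsity.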
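The abstract does not give the exact formulation of the KL-divergence-based weight vectors, so the following Python sketch is only an illustration of the general idea; the function name positional_weights and all implementation details are assumptions, not the thesis's actual method. It scores each window offset by how strongly the word distribution observed at that offset diverges from the overall unigram distribution, so that more informative positions receive larger weights.

import math
from collections import Counter, defaultdict

def positional_weights(corpus, window=5):
    """Illustrative sketch (not the thesis's method): weight each absolute
    window offset d by the KL divergence between the word distribution seen
    at offset d and the corpus-wide unigram distribution."""
    background = Counter()            # overall unigram counts
    per_offset = defaultdict(Counter) # counts of context words at each offset
    for sentence in corpus:
        for i, w in enumerate(sentence):
            background[w] += 1
            for d in range(1, window + 1):
                if i + d < len(sentence):
                    per_offset[d][sentence[i + d]] += 1
                if i - d >= 0:
                    per_offset[d][sentence[i - d]] += 1
    total_bg = sum(background.values())
    weights = {}
    for d, counts in per_offset.items():
        total = sum(counts.values())
        kl = 0.0
        for w, c in counts.items():
            p = c / total                 # P(word | offset d)
            q = background[w] / total_bg  # P(word) overall
            kl += p * math.log(p / q)
        weights[d] = kl
    z = sum(weights.values()) or 1.0      # normalise weights over offsets
    return {d: v / z for d, v in weights.items()}

# usage on a toy corpus
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "lay", "on", "the", "rug"]]
print(positional_weights(corpus, window=2))

In the thesis's per-target-word variant, a separate weight vector would presumably be estimated from each target word's own context distributions; the sketch above only shows a corpus-level version.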
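For the dependency-path-based relation extraction described above, the key preprocessing step is to keep only the tokens on the syntactic path between the two candidate entities and feed that pruned sequence to the network instead of the whole sentence. Below is a minimal, self-contained Python sketch, assuming the sentence has already been parsed into a head array; the function name dependency_path and the input format are assumptions for illustration, not the thesis's exact pipeline.

from collections import deque

def dependency_path(heads, i, j):
    """Shortest path between tokens i and j in a dependency tree, where
    heads[k] is the index of the head of token k (-1 for the root)."""
    # build an undirected adjacency list over the tree
    adj = {k: [] for k in range(len(heads))}
    for k, h in enumerate(heads):
        if h >= 0:
            adj[k].append(h)
            adj[h].append(k)
    # breadth-first search from i towards j
    prev = {i: None}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    # reconstruct the path from j back to i
    path, u = [], j
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

# toy example (not from the thesis): "Ma works at Huawei"
tokens = ["Ma", "works", "at", "Huawei"]
heads = [1, -1, 1, 2]                 # Ma->works, works=root, at->works, Huawei->at
path = dependency_path(heads, 0, 3)
print([tokens[k] for k in path])      # ['Ma', 'works', 'at', 'Huawei']

The pruned token sequence returned this way would then replace the raw sentence as the neural network's input, which is the change of input structure the abstract refers to.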
Keywords/Search Tags: distributed representation, bigram, position weight, dataset, relation extraction