Geographic Entity Relationship Extraction Based On Domain Adaptive Transfer Learning

Posted on: 2020-09-18; Degree: Master; Type: Thesis
Country: China; Candidate: Z D Chen; Full Text: PDF
GTID: 2428330623967015; Subject: Software engineering
Abstract/Summary:
Texts in the geographic domain contain a wealth of unstructured geographic entities and relations, and effective geographic entity relation extraction is critical to the construction of geographic knowledge graphs. Because geographic corpus resources are scarce, methods that depend on large-scale corpora, such as deep learning, are difficult to apply. Domain-adaptive transfer learning maps knowledge from other domains and from the geographic domain into a shared feature space, so the rich corpora of other domains can be used to strengthen learning and alleviate the shortage of geographic corpora. This thesis therefore applies domain-adaptive transfer learning to entity relation extraction from unstructured text in the geographic domain.

Domain adaptation involves aligning both the marginal and the conditional probability distributions. Adaptation is best when the distributions of the two domains are as close to identical as possible. It is therefore necessary to quantify and balance the relative importance of these two distributions during adaptation, to avoid over-adaptation or under-adaptation. To address these problems, this thesis proposes a geographic relation type system and an automatic annotation method for building a geographic corpus, together with a probability distribution adaptive transfer learning method that improves entity relation extraction in the geographic domain. The main research work consists of the following parts:

1) The thesis analyzes and constructs the entity relation types of the geographic domain, proposes an automatic corpus annotation method based on the Trie tree, and builds an entity relation extraction dataset to provide a data foundation. Because the relation type systems of public-domain datasets do not fit the geographic domain, the thesis analyzes the relation types of the public datasets and constructs a system of ten relation types based on the textual features of the geographic domain. The Trie tree is widely used in string matching and word segmentation systems. Based on the Trie tree and the distant supervision hypothesis, the thesis proposes an automatic annotation algorithm for building datasets for the geographic relation extraction task. Compared with plain string matching, it increases annotation speed by a factor of 2.5.

2) Based on a Bi-LSTM model with an attention mechanism, a domain-adaptive transfer method operating on the high-level feature layer is constructed, and the knowledge of the two domains is fused by reducing the Maximum Mean Discrepancy (MMD) between feature layers. The relation extraction task is defined as a supervised relation classification problem. The input features are character vectors and character position features, which avoids the propagation of word segmentation errors and reduces the amount of prior knowledge introduced. The high-level features are obtained with a bi-directional long short-term memory (Bi-LSTM) neural network combined with an attention mechanism, which represents sequence information effectively. To transfer knowledge from the public domain to the geographic domain, the thesis proposes an adaptive transfer learning method built on these high-level features: by minimizing the MMD between the high-level features of the two domains, the shared feature space is adapted to both the public and the geographic domain. Experiments show that this method improves relation extraction in the geographic domain.

3) An estimation method for weighting the probability distributions is proposed. It adaptively adjusts the learning weights of the marginal and conditional probability distributions during adaptation and improves domain adaptability. To bring the marginal and conditional distributions close to identical, the method relies on the A-distance, which effectively measures the difference between probability distributions: adaptation is strengthened when the difference is large and weakened when it is small. This avoids complicated manual tuning. Because the conditional probability distribution is more expensive to compute, the thesis adapts the sufficient statistics of the conditional distribution based on generated pseudo-labels, which reduces the time complexity. In addition, because the time complexity of MMD is high, a linear-time unbiased estimator is used to improve the computational efficiency of domain-adaptive transfer. The final experiments show that the probability distribution adaptive method achieves better results.
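
As an illustration of the linear-time unbiased MMD estimate mentioned in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes a Gaussian (RBF) kernel with a fixed bandwidth and plain Python lists as feature vectors. The function names `gaussian_kernel` and `linear_time_mmd2` and the `bandwidth` parameter are hypothetical and chosen only for this example. The estimator pairs consecutive samples, so it needs a number of kernel evaluations proportional to the sample size rather than to its square.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel between two equal-length feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd2(source, target, bandwidth=1.0):
    """Linear-time unbiased estimate of the squared MMD between two samples.

    Samples are consumed in consecutive pairs (x1, x2) from the source domain
    and (y1, y2) from the target domain. Each pair contributes
        k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1),
    and the estimate is the mean of these contributions.
    """
    n_pairs = min(len(source), len(target)) // 2
    if n_pairs == 0:
        raise ValueError("need at least two samples from each domain")
    total = 0.0
    for i in range(n_pairs):
        x1, x2 = source[2 * i], source[2 * i + 1]
        y1, y2 = target[2 * i], target[2 * i + 1]
        total += (
            gaussian_kernel(x1, x2, bandwidth)
            + gaussian_kernel(y1, y2, bandwidth)
            - gaussian_kernel(x1, y2, bandwidth)
            - gaussian_kernel(x2, y1, bandwidth)
        )
    return total / n_pairs
```

In a training loop, a value of this kind would typically be added to the classification loss as a penalty on the gap between the high-level features of the two domains. Because the estimate is unbiased, it can be slightly negative when the two samples come from nearly identical distributions.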
Keywords/Search Tags: Geographic Domain Entity Relation Extraction, Domain Adaptive, Transfer Learning