
Research On Semantic Relation Acquisition Based On Encyclopedic Corpus

Posted on: 2020-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Zhou
GTID: 2428330590451089
Subject: Software engineering
Abstract/Summary:
Information extraction is an important branch of natural language processing (NLP). It includes tasks such as named entity recognition and semantic relation acquisition, and it underpins much other NLP work. Traditional approaches to information extraction are either rule-based or statistics-based. Rule-based methods require extensive linguistic knowledge and generally do not transfer well to new domains. Statistics-based methods remove the dependence on linguistic knowledge but require a large number of manually engineered features. In recent years, deep learning has been widely applied across NLP. Because it learns sample features on its own, without strong linguistic knowledge or many hand-crafted features, deep learning has surpassed many traditional methods in information extraction.

After an in-depth study of deep-learning-based information extraction, we identify two remaining shortcomings:
(1) Named entity recognition models generally consider only the context of each character or word and pay little attention to syntactic information; few studies have tried to improve named entity recognition by introducing syntactic information.
(2) Research on semantic relation acquisition generally focuses on the sentence as a whole and rarely attends to its local features, which limits the quality of the acquired relations.

To address these two shortcomings, this thesis designs a named entity recognition model based on syntactic analysis and deep learning, and a semantic relation acquisition model based on a multi-level attention mechanism and a bidirectional LSTM network. The main work is as follows:
(1) We crawl Baidu Encyclopedia, Hudong Baike (Interactive Encyclopedia) and other Chinese encyclopedias to build an encyclopedic corpus.
(2) We introduce a linearly encoded constituency parse tree into the named entity recognition model, adding the syntactic analysis to the input layer of a Bi-GRU to improve recognition accuracy. Comparison experiments confirm the effectiveness of the introduced syntactic analysis: the recognition accuracy is 98.39% and the recall is 94.29%.
(3) We build a relation extraction model that combines entity position features with a Bi-LSTM network using a multi-level attention mechanism. Word representations combine position embeddings with word embeddings to strengthen semantic relevance; the LSTM avoids the long-distance dependency problem of traditional deep learning methods; and the multi-level attention mechanism makes full use of both the local and the global features of the sentence. Comparison experiments confirm the effectiveness of the multi-level attention mechanism: the recognition accuracy is 83.90% and the recall is 86.44%.
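The abstract does not say how the constituency tree is linearly encoded before it reaches the Bi-GRU input layer. As a minimal sketch, assuming a bracketed parse and a deliberately simple encoding that is not necessarily the thesis's scheme (each word is labeled with the number of phrases above it and its innermost phrase label), per-word syntactic features could be derived as follows. `linearize`, `syntax_features` and the label vocabulary are hypothetical names.

```python
import re

def linearize(tree):
    """Hypothetical encoding: map each word in a bracketed parse to
    (word, number of open phrases above it, innermost phrase label)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    stack, out, i = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = tokens[i + 1]
            # A preterminal looks like "(TAG word)": emit the word, skip its 4 tokens
            if tokens[i + 2] not in ("(", ")") and tokens[i + 3] == ")":
                out.append((tokens[i + 2], len(stack), stack[-1] if stack else "ROOT"))
                i += 4
                continue
            stack.append(label)  # open a phrase-level constituent
            i += 2
        elif tok == ")":
            stack.pop()  # close the innermost open phrase
            i += 1
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return out

def syntax_features(tree, label_vocab):
    """One-hot vector per word for its "depth_phrase" label; unseen labels map to
    index 0. These rows would be concatenated with word embeddings before the Bi-GRU."""
    rows = []
    for _, depth, phrase in linearize(tree):
        vec = [0] * len(label_vocab)
        vec[label_vocab.get(f"{depth}_{phrase}", 0)] = 1
        rows.append(vec)
    return rows

tree = "(IP (NP (NR Baidu)) (VP (VV acquired) (NP (NN company))))"
print(linearize(tree))
# [('Baidu', 2, 'NP'), ('acquired', 2, 'VP'), ('company', 3, 'NP')]
vocab = {"<unk>": 0, "2_NP": 1, "2_VP": 2, "3_NP": 3}
print(syntax_features(tree, vocab))
# [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Published linearizations of constituency trees (for example, ones based on common ancestors of adjacent words) carry more structure than this; the point here is only that a tree can be flattened into one label per token, which fits a sequence model's input layer.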
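For contribution (3), a common way to combine position embeddings with word embeddings is to give each token its clipped relative distance to each of the two entities, look up a position vector for each distance, and concatenate both with the token's word vector. The sketch below assumes random NumPy lookup tables and a hypothetical clipping distance `max_distance`; the thesis's dimensions and clipping are not given in the abstract.

```python
import numpy as np

def relative_positions(n_tokens, entity_index, max_distance):
    """Signed distance of each token to the entity, clipped to
    [-max_distance, max_distance] and shifted to a valid table index."""
    pos = np.arange(n_tokens) - entity_index
    pos = np.clip(pos, -max_distance, max_distance)
    return pos + max_distance  # indices in 0 .. 2 * max_distance

def build_inputs(token_ids, e1, e2, word_emb, pos_emb1, pos_emb2, max_distance):
    """Concatenate each token's word embedding with its two position embeddings."""
    token_ids = np.asarray(token_ids)
    n = len(token_ids)
    w = word_emb[token_ids]                                 # (n, d_w)
    p1 = pos_emb1[relative_positions(n, e1, max_distance)]  # (n, d_p), distance to entity 1
    p2 = pos_emb2[relative_positions(n, e2, max_distance)]  # (n, d_p), distance to entity 2
    return np.concatenate([w, p1, p2], axis=1)              # (n, d_w + 2 * d_p)

# Hypothetical sizes; a trained model would learn these tables
rng = np.random.default_rng(0)
vocab_size, d_w, d_p, max_distance = 100, 8, 4, 10
word_emb = rng.normal(size=(vocab_size, d_w))
pos_emb1 = rng.normal(size=(2 * max_distance + 1, d_p))
pos_emb2 = rng.normal(size=(2 * max_distance + 1, d_p))

# A 5-token sentence whose entities sit at positions 1 and 3
x = build_inputs([5, 17, 42, 3, 9], e1=1, e2=3, word_emb=word_emb,
                 pos_emb1=pos_emb1, pos_emb2=pos_emb2, max_distance=max_distance)
print(x.shape)  # (5, 16)
```

The resulting matrix is what a Bi-LSTM would consume one row per time step; because the same word gets different position vectors depending on where the entities are, the network can tell which tokens lie near or between them.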
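The abstract also leaves the multi-level attention mechanism undefined. The sketch below assumes additive attention pooling over Bi-LSTM outputs (score each position with tanh(H)·w, normalize with softmax, take the weighted sum) and a hypothetical two-level arrangement: attention within fixed windows of tokens and then across the window summaries yields a local-feature vector, while a separate attention over all tokens yields a global-feature vector. The window size and the weight vectors `w_local`, `w_window` and `w_global` are illustrative assumptions, not the thesis's design.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, w):
    """Additive attention pooling: H is (n, d) hidden states, w is a (d,) query vector."""
    scores = np.tanh(H) @ w  # (n,) one score per position
    alpha = softmax(scores)  # weights that sum to 1
    return alpha @ H, alpha  # weighted sum (d,), weights (n,)

def two_level_attention(H, w_local, w_window, w_global, window=3):
    """Hypothetical local + global combination (not necessarily the thesis's design)."""
    # Level 1: attention inside each window of consecutive positions
    summaries = np.stack([attention_pool(H[i:i + window], w_local)[0]
                          for i in range(0, len(H), window)])
    # Level 2: attention across the window summaries gives a local-feature vector
    local_vec, _ = attention_pool(summaries, w_window)
    # Separate attention over all positions gives a global-feature vector
    global_vec, _ = attention_pool(H, w_global)
    return np.concatenate([local_vec, global_vec])

rng = np.random.default_rng(1)
n, d = 7, 6  # 7 tokens, 6-dim Bi-LSTM output (forward and backward states concatenated)
H = rng.normal(size=(n, d))
w_local, w_window, w_global = (rng.normal(size=d) for _ in range(3))
r = two_level_attention(H, w_local, w_window, w_global)
print(r.shape)  # (12,)
```

Concatenating the two vectors is one simple way to let a downstream relation classifier see both local and global evidence; in a trained model the three weight vectors would be learned parameters rather than random samples.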
Keywords/Search Tags:Recurrent Neural Networks, Fusion Feature, Multi-level Attention, Syntactic Analysis, Conditional Random Fields