With the rapid development of artificial intelligence, cognitive intelligence has become increasingly important. Knowledge graphs, a key technology for cognitive intelligence, have shown a growing ability to break through bottlenecks in search engines and intelligent applications. However, problems such as low knowledge coverage make current knowledge graphs difficult to apply widely across domains. The Internet holds a wealth of unstructured text, which can serve as an important source of knowledge for increasing knowledge graph coverage. Extracting knowledge from unstructured text has long been a difficult and actively studied problem in natural language processing. Existing extraction methods may suffer from error propagation or rely on manual feature selection; these limitations create considerable extra work when the knowledge graph is later expanded.

Against this background, this paper focuses on the need to expand knowledge graphs and studies knowledge extraction from unstructured text, that is, extracting knowledge in the structured form {head entity, relationship, tail entity}, called a triple. Traditional knowledge extraction methods work step by step and face three problems: overly complicated hand-designed features, error propagation, and information redundancy. The joint extraction methods that have recently become popular perform poorly on overlapping triples. To address these problems, the work in this paper consists of two parts.

First, this paper proposes a knowledge extraction method based on a multi-layer semantic structure and a labeling strategy. The labeling strategy transforms the knowledge extraction task into a sequence labeling task. Specifically, we first adopt a special triple labeling strategy in which each word belonging to an entity in the text can carry multiple tags, each composed of three parts: the position of the entity in the triple, the relationship type of the triple, and the position of the word within the entity. The proposed Multi GRU model then learns the mapping between the text sequence and the multi-layer label sequence. The Multi GRU model consists mainly of a bidirectional gated recurrent unit network (Bi GRU) and a multi-layer parallel GRU network. Finally, three principles suited to the features of language (continuity, consistency, and nearest match) are proposed for converting label sequences into knowledge triples. Comparative experiments on two public datasets, NYT and KBP, show that this method is superior to most previous models.

Second, an adaptive label sequence extraction model based on a pre-trained model is proposed for knowledge extraction. The model combines a BERT encoder with a bidirectional GRU decoder. It automatically determines the number of semantic layers of a sentence from the sentence's semantic features, so the number of decoded label sequences can be updated dynamically when a sentence contains overlapping knowledge triples. On the NYT dataset, generated by distant supervision, and the WebNLG dataset, which contains a large number of overlapping triples, the model reaches F1 scores of 75.8% and 80.1%, respectively. The results show that the proposed method outperforms current knowledge extraction models.
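
To make the multi-layer labeling idea concrete, the following is a minimal sketch of how several label layers over one sentence could be decoded into triples. It is an illustration under stated assumptions, not the paper's implementation: the tag encoding `(role, relation, position)` with roles `H`/`T` and positions `B`/`I`/`E`/`S`, the function names `spans_in_layer` and `decode`, and the example sentence are all hypothetical, and the pairing rule is only a simple reading of the continuity, consistency, and nearest-match principles.

```python
from typing import List, Optional, Tuple

# Hypothetical tag encoding (an assumption, not the paper's exact scheme):
# (role, relation, position), or None for a word outside any entity.
#   role:     "H" = head entity, "T" = tail entity
#   position: "B" begin, "I" inside, "E" end, "S" single-word entity
Tag = Optional[Tuple[str, str, str]]
Span = Tuple[int, int, str, str]  # (start, end_exclusive, role, relation)


def spans_in_layer(tags: List[Tag]) -> List[Span]:
    """Group consecutive words with the same role and relation into spans."""
    spans: List[Span] = []
    start, key = None, None
    for i, tag in enumerate(list(tags) + [None]):  # None sentinel closes the last span
        tag_key = tag[:2] if tag else None
        starts_new = tag is not None and tag[2] in ("B", "S")
        # Close the open span when role/relation changes or a new entity begins.
        if start is not None and (tag_key != key or starts_new):
            spans.append((start, i, key[0], key[1]))
            start, key = None, None
        if tag is not None and start is None:
            start, key = i, tag_key
    return spans


def decode(tokens: List[str], layers: List[List[Tag]]) -> List[Tuple[str, str, str]]:
    """Pair each head span with the nearest tail span of the same relation."""
    triples = []
    for tags in layers:
        spans = spans_in_layer(tags)
        heads = [s for s in spans if s[2] == "H"]
        tails = [s for s in spans if s[2] == "T"]
        for h in heads:
            candidates = [t for t in tails if t[3] == h[3]]
            if not candidates:
                continue  # no tail with a matching relation in this layer
            t = min(candidates, key=lambda s: abs(s[0] - h[0]))
            triples.append(
                (" ".join(tokens[h[0]:h[1]]), h[3], " ".join(tokens[t[0]:t[1]]))
            )
    return triples


# Made-up example: "Apple" takes part in two triples, so it is tagged in two layers.
tokens = ["Steve", "Jobs", "founded", "Apple", "in", "California"]
layers = [
    [("H", "founded", "B"), ("H", "founded", "E"), None,
     ("T", "founded", "S"), None, None],
    [None, None, None,
     ("H", "located_in", "S"), None, ("T", "located_in", "S")],
]
triples = decode(tokens, layers)
print(triples)
```

Keeping one label sequence per layer is what lets a single word such as "Apple" participate in more than one triple, which a single-layer tagging scheme cannot express.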