With the explosive growth of data in the current era, we can obtain relevant information through a wide variety of applications. However, most real-world information exists as unstructured text, from which it is difficult to extract valuable information efficiently. Entity relation extraction has therefore emerged to meet this need: it automatically recognizes the relationship between entities in a sentence and constructs a structured knowledge base, providing support for applications such as question answering systems, information retrieval, and knowledge graphs.

At present, research on Tibetan entity relation extraction is still at an exploratory stage and faces many difficulties and challenges. First, Tibetan relation extraction corpora are scarce and difficult to annotate. Second, Tibetan text also suffers from semantic ambiguity, and traditional word vector representations cannot distinguish the meanings of a word in different contexts. Finally, deep-learning-based entity relation extraction for Tibetan requires a large-scale training corpus, and its accuracy still needs to be improved. This paper therefore studies Tibetan entity relation extraction based on distant supervision and attention mechanisms, constructing a Tibetan knowledge base and corpus of a certain scale. The main work is as follows:

1. Tibetan entity relation extraction based on distant supervision. Traditional supervised methods require large-scale manually labeled datasets, while training corpora are insufficient for Tibetan. To address this problem, we construct a Tibetan relation extraction dataset using the distant supervision method, which aligns entities in text with a knowledge base. A piecewise convolutional neural network (PCNN) is then used to automatically learn entity-related features, and the multi-instance learning (MIL) method is introduced to improve accuracy. Experimental results show that the model outperforms an RNN model by about 7.2% in F1 score.

2. Tibetan entity
relation extraction integrating dynamic semantic information. Word vector representation is the foundation of natural language processing, and its quality directly affects the performance of the entire system. Moreover, Tibetan also suffers from polysemy. To learn rich semantic knowledge, this paper generates deep contextual representations of words based on a bi-directional long short-term memory (BiLSTM) network. Combined with word vectors, position vectors, and part-of-speech vectors, the model can learn the complex features of words and how they vary across different language environments. Experiments show that this method effectively assigns different vector representations to a word according to its context. The F1 score reaches 35.2%, which is 6.3% higher than the word2vec model.

3. Multi-level integrated attention mechanism model. The distant supervision method suffers from the wrong-label problem, which degrades the performance of relation extraction. To solve this problem, this paper proposes an improved relation extraction model that integrates attention mechanisms. Self-attention is added at the word level to extract the internal features of words. A selective attention mechanism assigns a weight to each instance, so as to make full use of informative sentences while reducing the weight of noisy instances. A joint score function is then introduced to correct wrong labels, combined with an SVM to extract relations. Experimental results show that the F1 score reaches 62.6%, which is 29.9% higher than the baseline model.
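To make the PCNN step in work 1 concrete: the defining operation is piecewise max pooling, where the convolution output is split into three segments around the two entity positions and each segment is pooled separately. The following is a minimal plain-Python sketch of that pooling step only; the function name, input layout (a list of per-position filter activations), and the assumption that all three segments are non-empty are illustrative, not the thesis's actual implementation.

```python
def piecewise_max_pool(conv_features, e1_pos, e2_pos):
    """Piecewise max pooling as used in PCNN.

    conv_features: list of per-position activation vectors
                   (seq_len rows, num_filters columns).
    e1_pos, e2_pos: token positions of the two entities.
    Returns a vector of length 3 * num_filters.
    Assumes each of the three segments is non-empty.
    """
    left, mid = sorted((e1_pos, e2_pos))
    segments = [
        conv_features[:left + 1],        # up to and including entity 1
        conv_features[left + 1:mid + 1], # between the two entities
        conv_features[mid + 1:],         # after entity 2
    ]
    pooled = []
    for seg in segments:
        # max over positions, separately for each filter
        pooled.extend(max(vals) for vals in zip(*seg))
    return pooled

# Toy example: 5 positions, 2 filters, entities at positions 1 and 3.
feats = [[1.0, 0.2],
         [0.5, 0.9],
         [0.3, 0.1],
         [2.0, 0.4],
         [0.7, 0.8]]
vec = piecewise_max_pool(feats, e1_pos=1, e2_pos=3)
```

Pooling the three segments separately (rather than one max over the whole sentence) is what lets the model keep coarse positional structure relative to the entity pair.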
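The selective attention mechanism in work 3 can likewise be sketched: each sentence (instance) in a bag is scored against a relation query vector, the scores are normalized with a softmax, and the bag representation is the weighted sum of instance representations, so noisy instances receive low weight. This is a minimal plain-Python illustration under assumed names and dot-product scoring; the thesis's actual scoring function and combination with the joint score function and SVM are not reproduced here.

```python
import math

def selective_attention(instance_reprs, relation_query):
    """Selective attention over the instances in a bag.

    instance_reprs: list of instance representation vectors.
    relation_query: relation query vector of the same dimension.
    Returns (bag representation, per-instance attention weights).
    """
    # Score each instance by dot product with the relation query.
    scores = [sum(x * q for x, q in zip(inst, relation_query))
              for inst in instance_reprs]
    # Softmax over scores (shifted by the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag representation: attention-weighted sum of instance vectors.
    dim = len(relation_query)
    bag_repr = [sum(w * inst[d] for w, inst in zip(weights, instance_reprs))
                for d in range(dim)]
    return bag_repr, weights

# Toy bag: one instance aligned with the query (informative), one not (noisy).
bag = [[1.0, 0.0],
       [0.0, 1.0]]
query = [1.0, 0.0]
bag_repr, weights = selective_attention(bag, query)
```

The informative instance receives the larger weight, so it dominates the bag representation, which is exactly how selective attention "reduces the weight of noisy instances" in the text above.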