Relation Trigger Words Extraction And Optimization Based On Syntactic Dependency And Word Activation Force

Posted on: 2021-04-15 | Degree: Master | Type: Thesis
Country: China | Candidate: L Wang | Full Text: PDF
GTID: 2428330626958907 | Subject: Computer technology
Abstract/Summary:
With the continuous development of the Internet and the arrival of the big data era, a large amount of data is generated every day, and people need to extract the valuable parts of it. Relation extraction, a sub-task of information extraction, extracts entity pairs and the relationships between them from data. It is widely used in search engines and other fields and is one of the important technologies of the big data era. Relation trigger words are the words or phrases that express an entity relationship in text, and they play an important role in relation extraction: making full use of their relation features can help improve the accuracy of relation extraction.

Traditional methods of extracting relation trigger words can only extract the trigger words associated with a certain relation type, and the trigger words obtained may not express the relationship of a given entity pair. Some methods can extract relation trigger words for a given entity pair, but only single-word triggers; for phrase-type triggers, the trigger words they return are incomplete. This paper uses syntactic dependency analysis and the word activation force model to divide complete relation trigger word extraction into two stages: core trigger word extraction and non-core trigger word extraction. The core trigger word and the non-core trigger words obtained from the two stages are combined as the final relation trigger words.

In the core trigger word extraction stage, syntactic dependency analysis is used to compute a dependency distance, which measures how far a word is from the entity pair in the dependency graph, and a sequence distance is computed to capture the position of the word relative to the entity pair in the original word sequence. An evaluation score is calculated by combining the dependency distance and the sequence distance, and the core trigger word is selected based on this score and the word's part of speech. In the non-core trigger word extraction stage, the word activation force model is modified to improve its accuracy and interpretability, and non-core trigger words are extracted based on each word's part of speech and the word activation force between that word and the core trigger word.

The method is optimized in two ways: a Stanford CoreNLP cluster is set up to improve text processing efficiency, and a word activation force matrix is built for the dataset to reduce the time complexity of non-core trigger word extraction. MPI-related technologies are used to run the method on a cluster so that it adapts to the big data environment and extracts relation trigger words more efficiently at scale. Compared with traditional trigger word extraction methods, the proposed method takes fuller account of the relationships between trigger words and entity pairs and of the activation relationships between words, and the resulting relation trigger words are more accurate and complete.

Experiments were performed on the labeled SemEval 2010 Task 8 dataset and the NYT dataset. Measured at the word level, the method obtains an F1 value of 0.87; measured at the relation-instance level, the accuracy is 72%. The experiments also show that the optimization improves the performance of the algorithm and its adaptability to the big data environment. Adding relation trigger word features to a relation extraction model verified the guiding effect of relation trigger words on relation extraction. The relation trigger words obtained by the algorithm are combined with the given entity pairs to form triplets, which are stored in a Neo4j database and visualized with ECharts. The visualized results give an intuitive view of the accuracy of the relation trigger words and of their application in open relation extraction. In summary, the method proposed in this paper has some value for study and reference.
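The core trigger word stage described above can be illustrated with a small Python sketch. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration: the function names, the weight `alpha`, the allowed part-of-speech tags, the rule that a word lying between the two entities has sequence distance 0, and the score `1 / (1 + cost)`. The dependency edges are written by hand; in the thesis they would come from a parser such as Stanford CoreNLP.

```python
from collections import deque


def dependency_distances(edges, start, n_tokens):
    """Hop counts from `start` to every reachable token, treating the
    dependency graph as undirected (breadth-first search)."""
    adj = {i: set() for i in range(n_tokens)}
    for head, dep in edges:
        adj[head].add(dep)
        adj[dep].add(head)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def sequence_distance(i, e1, e2):
    """Assumed definition: 0 if token i lies between the two entities,
    otherwise the gap to the nearer entity in the word sequence."""
    lo, hi = min(e1, e2), max(e1, e2)
    if lo < i < hi:
        return 0
    return min(abs(i - e1), abs(i - e2))


def core_trigger(tokens, pos_tags, edges, e1, e2,
                 allowed_pos=("VERB", "NOUN"), alpha=0.5):
    """Pick the candidate with the highest (hypothetical) evaluation score.
    Lower combined distance gives a higher score."""
    d1 = dependency_distances(edges, e1, len(tokens))
    d2 = dependency_distances(edges, e2, len(tokens))
    best, best_score = None, float("-inf")
    for i, tag in enumerate(pos_tags):
        # Skip the entities themselves and words with a disallowed POS tag.
        if i in (e1, e2) or tag not in allowed_pos:
            continue
        # Skip words not connected to both entities in the graph.
        if i not in d1 or i not in d2:
            continue
        cost = d1[i] + d2[i] + alpha * sequence_distance(i, e1, e2)
        score = 1.0 / (1.0 + cost)
        if score > best_score:
            best, best_score = i, score
    if best is None:
        return None, 0.0
    return tokens[best], best_score


# Hand-built example: "Jobs founded Apple in 1976", entity pair (Jobs, Apple).
tokens = ["Jobs", "founded", "Apple", "in", "1976"]
pos_tags = ["PROPN", "VERB", "PROPN", "ADP", "NUM"]
edges = [(1, 0), (1, 2), (1, 4), (4, 3)]  # (head, dependent) index pairs
word, score = core_trigger(tokens, pos_tags, edges, e1=0, e2=2)
print(word, score)
```

In this example the only candidate with an allowed tag is "founded", which is one hop from each entity and sits between them, so it is selected. A real implementation would also need the second stage, where non-core trigger words are attached to the core word by word activation force.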
Keywords/Search Tags:Relation trigger words, Relation extraction, Syntactic dependency, Word activation force, MPI