Design Of Relation Extraction System Across Short Sentences In Mathematical Natural Language Understanding

Posted on: 2022-11-23; Degree: Master; Type: Thesis
Country: China; Candidate: J C Tang
GTID: 2518306764980259; Subject: Journalism and Media
Abstract/Summary:
Relation extraction is one of the main research directions in natural language understanding. Its main task is to extract structured triples from unstructured text; these triples represent the semantics of the text in a visual, modeled form. Current research on relation extraction focuses mainly on plain text, and the usual goal is to extract a single triple from a single sentence. Thanks to the rapid development of deep learning, research in this area has become increasingly mature. Some scholars have therefore turned their attention to the understanding of mathematical language, studying how to extract key information from unstructured mathematical text and use it for logical reasoning. Building on previous work, this thesis studies how to perform relation extraction on elementary mathematics texts. The main research contents are:

1. Based on the characteristics of relation triples in mathematical texts, an algorithm that can extract multiple triples from one short sentence is designed, and a mathematical text relation extraction system is developed on top of it. The core idea of the algorithm is to use BERT to convert mathematical text into sentence vectors, then use text matching to find the most similar short sentence in the known data, and take the relation of that short sentence as the relation of the current short sentence.

2. To collect the dataset, this thesis builds a relation labeling platform. Relations that are not extracted during the relation extraction stage can be labeled through the platform. Once labeling is completed, the short sentence and its corresponding relation triple become a record in the dataset. As labeling proceeds, the dataset grows and the completeness of relation extraction increases.

3. The mathematical texts used in this thesis are high school mathematics problems. Their descriptions are usually very long, which gives rise to the long-distance dependency problem in natural language processing. To avoid this problem, the system divides each complete problem into several short sentences. This introduces a new problem: some relation triples depend on preceding short sentences, so relations that span short sentences cannot be extracted correctly. To solve this, the thesis adds dedicated optimization for cross-short-sentence extraction on top of single-short-sentence relation extraction, which further improves the completeness and correctness of relation extraction.

Based on the above design, this thesis builds a complete elementary mathematics relation extraction system and tests its usability and reliability to verify the feasibility of the proposed general text relation extraction and cross-sentence relation extraction algorithms. In system testing, the accuracy of relation extraction stays at about 90%, and the completeness of the relation triples extracted from mathematics problems is about 80%. Relation triples that have been labeled across short sentences can be extracted correctly and completely.
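
The matching step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: `embed` is a character-bigram stand-in for a BERT sentence vector so that the example runs without a model, and the function names, the sample sentences, the relation names, and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def split_short_sentences(text):
    """Split a problem statement into short sentences on common punctuation."""
    parts = re.split(r"[,.;,。;]", text)
    return [p.strip() for p in parts if p.strip()]


def embed(sentence):
    """Stand-in for a BERT sentence vector (assumption): character-bigram counts."""
    s = sentence.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical labeled data: short sentence -> relation names it expresses.
LABELED = [
    ("point A lies on line l", ["lies_on"]),
    ("triangle ABC is a right triangle", ["is_a"]),
]


def match_relation(short_sentence, labeled=LABELED, threshold=0.5):
    """Return the relations of the most similar labeled short sentence.

    Returns (None, best_score) when nothing is similar enough; in that case
    the sentence would be a candidate for manual labeling (assumed behavior).
    """
    query = embed(short_sentence)
    best_score, best_relations = 0.0, None
    for known, relations in labeled:
        score = cosine(query, embed(known))
        if score > best_score:
            best_score, best_relations = score, relations
    if best_score < threshold:
        return None, best_score
    return best_relations, best_score


if __name__ == "__main__":
    problem = "Point B lies on line m, and triangle DEF is a right triangle."
    for short in split_short_sentences(problem):
        print(short, "->", match_relation(short))
```

In this sketch only the relation names are reused from the matched sentence; filling in the entities of the current short sentence, and handling triples that span several short sentences, are left out because the abstract does not describe how they are done.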
Keywords/Search Tags: relation extraction, elementary mathematics, relationship annotation, relation extraction across short sentences