Research On Text Understanding Technology For Text

Posted on: 2022-03-19  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Shi  Full Text: PDF
GTID: 2518306524484154  Subject: Computer Science and Technology
Abstract/Summary:
The core of text understanding lies in detecting and extracting textual knowledge elements and the relations among them. Defining knowledge elements and relations that are easy to understand and express is the basis for improving text understanding. This thesis proposes a presentation frame for knowledge understanding. Based on this frame, it studies techniques for extracting knowledge elements and the relations between them, and implements a platform for displaying multi-granularity textual knowledge. The main research contents are as follows:

(1) A text presentation frame for knowledge understanding is designed, which provides a standard for the subsequent extraction of knowledge elements and their relations. The frame classifies and defines textual knowledge elements, standardizes their attributes and related constraints, and defines the relations among knowledge elements at the same level.

(2) Key techniques for extracting grammatical elements are studied. BERT word vectors are added to a traditional model, which addresses word polysemy and improves the accuracy of named entity recognition; the effectiveness is verified by experiments. In addition, a method for calculating synonymous-sentence frequency based on SimHash and Sentence2Vector is proposed, which guides the subsequent extraction of sentence-level semantic knowledge elements.

(3) Key techniques for extracting semantic elements are studied. Working at different granularities of knowledge, this thesis proposes a keyword extraction algorithm named TFIDF-APD, a key phrase extraction method that combines keyword information with topic information, and a key sentence extraction method based on multi-feature calculation. TFIDF-APD extends the traditional TF-IDF algorithm with two additional factors, word distribution and word position, to improve the accuracy of keyword extraction. Key phrase extraction combines keyword information with text topic information, which improves its accuracy in experiments. Key sentence extraction combines word-level and sentence-level constraint features and assigns the feature weights flexibly, so that key sentences can be extracted dynamically for tasks with different perspectives. The effectiveness of the proposed methods is verified through experiments.

(4) Key techniques for extracting relations between knowledge elements are studied. A location feature is introduced to handle the referential relationship between word-level knowledge elements and the pronouns that refer to them. In addition, the BERT model is applied to the extraction of relations between sentences by recasting that task as a classification task. This realizes the extraction of five types of relations, including causal, transition, temporal, and inclusion relations. The effectiveness is verified through experiments.

(5) A text understanding platform for single texts is designed and implemented, in which the above key algorithms are integrated and verified.
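The abstract names SimHash as one ingredient of the synonymous-sentence frequency method in (2) but gives no details. The following is only a minimal sketch of generic SimHash fingerprinting, not the thesis's method: whitespace tokenization, MD5 as the token hash, the 64-bit fingerprint width, the threshold of 3 bits, and the helper name `near_duplicate_counts` are all assumptions made for illustration, and the Sentence2Vector half of the method is not shown.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Generic SimHash fingerprint of a token list (bits must be <= 64 here)."""
    totals = [0] * bits
    for token, weight in Counter(tokens).items():
        # Assumption: use the first 8 bytes of MD5 as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each token votes +weight or -weight on every bit position.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def near_duplicate_counts(sentences, max_distance=3):
    """Hypothetical helper: for each sentence, count the sentences
    (itself included) whose fingerprint is within max_distance bits."""
    prints = [simhash(s.lower().split()) for s in sentences]
    return [
        sum(1 for q in prints if hamming_distance(p, q) <= max_distance)
        for p in prints
    ]
```

Sentences with nearly the same wording get fingerprints that differ in few bits, so a count like this can serve as a rough frequency signal. Paraphrases that share few words would need a semantic representation instead, which is presumably why the abstract pairs SimHash with sentence vectors.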
Keywords/Search Tags:text knowledge graph, presentation frame, grammatical elements, semantic elements, element relationship