Research On The Identification Method Of Predicate Core Word Based On Boundary Regression

Posted on: 2024-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Shang
GTID: 2568307130473954
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of text data has exploded. Quickly extracting the useful information in a text by identifying the key information of each sentence has become a pressing problem. The predicate head is the core of a sentence and is unique within it. Predicate heads play a key role in expressing actions, states, or processes, so identifying them helps in understanding the structure and meaning of sentences. However, most traditional identification methods use a sequence labeling algorithm that outputs the labeling path with the highest probability. That path may contain multiple predicate heads, which makes it difficult to locate the single predicate head directly. To solve this problem, this paper proposes a sentence-oriented predicate head boundary regression method and a span-oriented predicate head boundary regression method, introduced as follows.

(1) A sentence-oriented predicate head boundary regression method is proposed. Unlike traditional classification methods, which predict a category label for each candidate predicate head, this method uses a regression network to directly output the start and end positions of the predicate head in the sentence, which effectively solves the uniqueness problem in predicate head recognition. The model feeds the text into BERT to obtain a semantic representation, uses a Bi-LSTM layer to capture sentence-level information, and then extracts further features through a convolutional layer to obtain an abstract representation of the entire sentence. Finally, two regression layers output the start and end boundaries of the predicate head, respectively. The model achieved an F1 value of 82.99% on the experimental dataset.

(2) A span-oriented predicate head boundary regression method is proposed. This method takes spans as input, sorts and filters them, classifies each span, and predicts its offsets relative to the true predicate head. First, the text passes through the encoding layer to obtain a vector representation of the sentence. Based on this representation, candidate spans are obtained by enumerating the possible start and end positions of the predicate head. Then, a classifier detects and filters out the negative samples among the candidate spans, reducing the imbalance between positive and negative samples. Next, an offset prediction module predicts the left and right boundary offsets of each candidate span relative to the predicate head, and the candidate spans are adjusted according to these offsets. Finally, redundant candidate spans are removed in the span regression layer with the non-maximum suppression algorithm to obtain the predicted predicate head. This method complements the sentence-oriented method: it can alleviate the performance loss caused by semantic compression when generating abstract representations of long sentences. The model associates the start and end of the predicate head through the span and achieved an F1 value of 83.18% on the experimental dataset, effectively predicting the predicate heads in the dataset.
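
The post-processing steps of the span-oriented method (enumerating candidate spans, adjusting them by predicted offsets, and removing redundant spans with non-maximum suppression) can be illustrated with a minimal sketch. This is not the thesis implementation. The function names, the inclusive token-index span convention, the `max_len` cap, the overlap measure (1-D intersection over union), and the threshold value are all assumptions made for illustration, and the classifier scores and offsets are taken as given inputs rather than produced by a trained model.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # assumed convention: inclusive token indices (start, end)


def enumerate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """Enumerate every (start, end) pair covering at most max_len tokens."""
    return [(s, e)
            for s in range(n_tokens)
            for e in range(s, min(s + max_len, n_tokens))]


def apply_offsets(span: Span, left: float, right: float, n_tokens: int) -> Span:
    """Shift both boundaries by the predicted offsets, round, and clip to the sentence."""
    start = min(max(round(span[0] + left), 0), n_tokens - 1)
    end = min(max(round(span[1] + right), start), n_tokens - 1)
    return (start, end)


def iou(a: Span, b: Span) -> float:
    """1-D intersection over union of two inclusive token spans."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def nms(spans: Sequence[Span], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[Span]:
    """Greedy non-maximum suppression: visit spans in descending score order and
    keep a span only if its overlap with every kept span is within the threshold."""
    order = sorted(range(len(spans)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(spans[i], spans[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [spans[i] for i in kept]
```

Under these assumptions, the spans that survive the classifier would each be passed through `apply_offsets`, and the first element returned by `nms` (the highest-scoring surviving span) would be taken as the single predicted predicate head, which matches the uniqueness constraint described above.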
Keywords/Search Tags:Predicate head, Uniqueness, Boundary regression, Span regression