Chinese Attribute Extraction Based On Siamese Neural Network

Posted on: 2022-12-10  Degree: Master  Type: Thesis
Country: China  Candidate: K Jiang  Full Text: PDF
GTID: 2518306743951239  Subject: Master of Engineering
Abstract/Summary:
In the Internet era, a large amount of text data is generated every day. It is already difficult for individuals to find useful information in such a volume of data, so they must rely on pre-processing by machines. However, unstructured natural-language text that is easy for humans to understand is not easy for machines to compute over and process. Information extraction studies how to convert unstructured text into structured data that is convenient for machine computation and processing. This thesis focuses on extracting the attributes of a given entity from a given text. The main work of this thesis is as follows:

1. This thesis proposes an attribute value extraction model based on a question-answering formulation, which poses a question about the entity and the attribute relationship corresponding to the text. A question-answering system extracts the answer to that question from the given text, and the extracted answer is the attribute value of the entity. Extracting attribute values in question-answering mode pushes the model toward understanding the text, and the features extracted from the text together with the entity and attribute relationship information generalize better. Unlike traditional sequence labeling models, this thesis uses a pointer network to decode the attribute value from the extracted features. The pre-trained model BERT is used to encode the text. BERT is a language model trained on a large amount of unlabeled data; it can effectively extract lexical information from the text and provides good text features for decoding attribute values.

2. This thesis proposes an attribute relationship discrimination model based on Siamese (twin) BERT. BERT can effectively extract lexical information when computing text features, but it is still insufficient at extracting syntactic information. To address this problem, this thesis applies the Siamese network structure to BERT and proposes the Siamese BERT model to improve the extraction of syntactic information. Training a Siamese network requires comparing the similarity between samples. This thesis uses the similarity of the samples' attribute relationships to represent the similarity of the samples.

3. This thesis divides the attribute extraction task into two subtasks: attribute relationship discrimination and attribute value extraction. First, the Siamese BERT model is used to determine which attribute relationships are present in the text, and the question corresponding to each attribute relationship is obtained. Then the attribute value extraction model based on question-answering mode is used to extract the attribute value from the text.

Experiments on the Baidu Encyclopedia dataset and the DuIE 2.0 dataset verify the effectiveness of the attribute value extraction model based on question-answering mode and of the attribute relationship discrimination model based on Siamese BERT, as well as the feasibility of dividing the attribute extraction task into the two subtasks of attribute relationship discrimination and attribute value extraction.
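
The idea of supervising a Siamese network with the similarity of the samples' attribute relationships can be sketched as follows. This is only one possible reading: the Jaccard overlap used as the target similarity, the cosine similarity between embeddings, the squared-error loss, and the names `relation_similarity` and `pair_loss` are assumptions, and the embeddings are plain lists standing in for the outputs of a shared encoder.

```python
import math
from typing import List, Set


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relation_similarity(rels_a: Set[str], rels_b: Set[str]) -> float:
    """Assumed target: Jaccard overlap of the two samples' relation sets."""
    union = rels_a | rels_b
    if not union:
        return 0.0
    return len(rels_a & rels_b) / len(union)


def pair_loss(emb_a: List[float], emb_b: List[float],
              rels_a: Set[str], rels_b: Set[str]) -> float:
    """Squared error between embedding similarity and relation similarity."""
    target = relation_similarity(rels_a, rels_b)
    predicted = cosine_similarity(emb_a, emb_b)
    return (predicted - target) ** 2


# Toy embeddings for two texts, as if produced by the same shared encoder.
emb_a, emb_b = [1.0, 0.0], [0.0, 1.0]
# Orthogonal embeddings but identical relations: the loss is maximal here.
print(pair_loss(emb_a, emb_b, {"birthplace"}, {"birthplace"}))
```

Minimizing such a loss would pull together the representations of texts that express the same attribute relationships and push apart those that do not.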
Keywords/Search Tags: Chinese Attribute Extraction, Siamese Neural Network, BERT, Question Answering