
An NLP-based Novel Character Attribute Extraction System

Posted on: 2021-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: L P Guo
Full Text: PDF
GTID: 2428330632462617
Subject: Computer Science and Technology
Abstract/Summary:
The results of character attribute extraction from novels can be used to build guided dialogue systems and knowledge graphs. Attribute extraction currently relies heavily on rule-based methods, whereas the similar task of relation extraction has two kinds of solutions: the pipeline method and the joint-model method based on neural networks. The pipeline method generally consists of two models, one for named entity recognition and one for relation extraction. This thesis implements a character attribute extraction system for novels, in which character attributes include personality, age, appearance, and style. The system's main modules are a named entity recognition module, an attribute extraction module, and a visualization module. The main research contents of this thesis are as follows:

(1) This thesis builds named entity recognition and attribute extraction datasets for novels to support the corresponding experiments. For named entity recognition, it combines the mainstream BiLSTM-CRF model with the transfer learning model BERT to build a BB-CRF model. For the novel scenario, rules for entity backtracking and for long entities covering short entities further improve named entity recognition. Experiments on the novel entity recognition dataset show that BB-CRF works best, with an F1 score of up to 82.54% on attribute entities. For the choice of pre-trained BERT, this thesis compares three models: the Chinese BERT-base published by Google; the whole-word-masking Chinese BERT-base jointly released by Harbin Institute of Technology and iFLYTEK; and a Chinese BERT-base that is initialized from the Harbin Institute of Technology model and further pre-trained on more than 4,500 novels of multiple genres.

(2) For attribute extraction, this thesis draws on the ideas of the ESIM model and treats the task as a binary classification problem: deciding whether a name entity and an attribute entity correspond to each other. On this basis it builds a BERT-based LI-Transformer model. Experimental results show that the LI-Transformer model performs best, reaching 90.35%.

(3) Finally, the character attribute extraction results are visualized through the Neo4j database.
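
The pair-classification framing described in (2) can be illustrated with a minimal sketch. It only shows how candidate (name, attribute) pairs might be enumerated from the entities found in one sentence and given binary labels for training; it is not the thesis's LI-Transformer model. The tag names (`NAME`, `PERSONALITY`, and so on), the `Entity` class, the helper functions, and the example sentence are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical tag names; the abstract does not give the actual tag set.
NAME_TYPE = "NAME"
ATTRIBUTE_TYPES = {"PERSONALITY", "AGE", "APPEARANCE", "STYLE"}


@dataclass(frozen=True)
class Entity:
    text: str
    type: str
    start: int  # character offset of the entity in the sentence


def candidate_pairs(entities):
    """Pair every name entity with every attribute entity in one sentence."""
    names = [e for e in entities if e.type == NAME_TYPE]
    attrs = [e for e in entities if e.type in ATTRIBUTE_TYPES]
    return list(product(names, attrs))


def label_pairs(pairs, gold):
    """Label a pair 1 if the attribute belongs to the name, otherwise 0."""
    return [(n, a, int((n.text, a.text) in gold)) for n, a in pairs]


# Made-up example sentence with entities as an NER step might return them.
sentence = "Old Zhang was stubborn, while young Li was gentle."
entities = [
    Entity("Zhang", NAME_TYPE, 4),
    Entity("stubborn", "PERSONALITY", 14),
    Entity("Li", NAME_TYPE, 36),
    Entity("gentle", "PERSONALITY", 43),
]
gold = {("Zhang", "stubborn"), ("Li", "gentle")}

pairs = candidate_pairs(entities)
labeled = label_pairs(pairs, gold)
for name, attr, label in labeled:
    print(f"{name.text} / {attr.text} -> {label}")
```

In a setup like this, each labeled pair would be passed together with its sentence to a binary classifier, and pairs predicted as 1 would become character-attribute edges in the graph.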
Keywords/Search Tags:Knowledge Graph, Attribute Extraction, Named Entity Recognition, Recurrent Neural Network, BERT, Transformer