Coreference Resolution Model Incorporating Chinese Word Segmentation Information

Posted on:2022-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:Q W Xiao

GTID:2518306773993289

Subject:FINANCE

Abstract/Summary:

In the era of big data, every industry generates massive amounts of fragmented data each day. People can find the information they want on the Internet, but the human brain has a limited capacity to store and organize information. Knowledge graph technology can turn large amounts of complicated information into structured information, which simplifies human information retrieval. During knowledge graph construction, coreference resolution can be used to replace relatively vague phrases in the text, which helps the computer understand the text better and improves the efficiency of information extraction. The goal of coreference resolution is to find phrases in a text that refer to the same real-world object. With the rapid development of deep learning, deep-learning-based coreference resolution models have become the mainstream research direction. However, deep neural networks often require long training times and substantial computing resources, and there is still room for optimization. We therefore explore and optimize a deep-learning-based coreference resolution model in light of the characteristics of knowledge graphs.

We first conduct experiments on the Chinese dataset of OntoNotes Release 5.0 using the end-to-end coreference resolution model framework, and then, based on the framework and the experimental results, propose optimization directions aimed at improving model performance and model accuracy. Combining knowledge of natural language processing with the principles of the algorithm, we find that adding Chinese-specific word segmentation information during data processing deepens the original model's understanding of Chinese semantics, so that phrases that do not conform to grammatical semantics can be prescreened and computing resources saved. To choose a word segmentation model, we compare five word segmentation schemes experimentally and select BERT+Softmax, judged on the metrics of the word segmentation algorithm and on the tolerance of the coreference resolution model to different word segmentation errors, and then apply it to the coreference resolution task. After word segmentation information is added, the coreference resolution model not only saves nearly 1/4 of the training time but also improves average F1 by 1-2%.

We then explore how well different pre-trained language models suit the coreference resolution task; adding the RoBERTa pre-trained model improves the model's results by a further 1.5%. Finally, we apply the best model obtained in the experiments to financial news text collected by a web crawler, and count and analyze how well it resolves the references of interest in the financial domain. The experimental results show that our model has some coreference resolution ability on crawled text and is strong at identifying coreference relationships between noun phrases.
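The prescreening idea described in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not specify this, that the segmenter emits one BMES tag per character (B = begin, M = middle, E = end, S = single-character word) and that a candidate span is kept only if it starts at a word start and ends at a word end. The function names, the example sentence, and the `max_width` value are hypothetical, not taken from the thesis.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) character indices


def bmes_to_boundaries(tags: List[str]) -> Tuple[Set[int], Set[int]]:
    """Collect character indices where a word starts and where a word ends."""
    starts, ends = set(), set()
    for i, tag in enumerate(tags):
        if tag in ("B", "S"):
            starts.add(i)
        if tag in ("E", "S"):
            ends.add(i)
    return starts, ends


def candidate_spans(n: int, max_width: int) -> List[Span]:
    """Enumerate every span of at most max_width characters."""
    return [(i, j) for i in range(n) for j in range(i, min(i + max_width, n))]


def prescreen(spans: List[Span], starts: Set[int], ends: Set[int]) -> List[Span]:
    """Drop spans that cut through the middle of a segmented word."""
    return [(i, j) for i, j in spans if i in starts and j in ends]


# Hypothetical example: "小明喜欢他的狗" segmented as 小明 / 喜欢 / 他 / 的 / 狗
chars = list("小明喜欢他的狗")
tags = ["B", "E", "B", "E", "S", "S", "S"]

starts, ends = bmes_to_boundaries(tags)
all_spans = candidate_spans(len(chars), max_width=4)
kept = prescreen(all_spans, starts, ends)

print(len(all_spans), "candidate spans before prescreening")
print(len(kept), "candidate spans after prescreening")
```

In this toy sentence the filter removes spans such as (1, 2) ("明喜"), which straddles two words. Fewer spans then reach the mention-scoring stage, which is one plausible way the reported saving in training time could arise.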

Keywords/Search Tags:

Coreference resolution, End-to-end, Deep learning, Chinese word segmentation, Pre-trained models

Related items

1	Research On Key Technologies Of Chinese Coreference Resolution Based On Deep Learning
2	Research On Chinese Coreference Resolution Based On Pre-trained Language Model
3	Research Of Key Issues In Event Coreference Resolution
4	Research On Related Technology Of End-to-end Neural Coreference Resolution
5	Research On Coreference Resolution Oriented To Knowledge Graph
6	Emotional Tendency Analysis Of Uyghur Text Based On Deep Learning
7	Research On Event Coreference Resolution Based On Deep Learning
8	Research On The Key Issues Of Event Coreference Resolution
9	Improvement And Compression Of Pre-Trained Language Models For User-Generated Texts
10	Coreference Resolution Research In Uyghur Pronouns Based On Deep Learning