Coreference Resolution Model Incorporating Chinese Word Segmentation Information

Posted on: 2022-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Xiao
GTID: 2518306773993289
Subject: FINANCE
Abstract/Summary:
In the era of big data, every industry generates massive amounts of fragmented data every day. People can obtain the information they want through the Internet, but the human brain has a limited ability to store and organize information. Knowledge graph technology can turn large amounts of complicated information into structured information, thus simplifying human information retrieval. During knowledge graph construction, coreference resolution can replace relatively vague phrases in the text with the entities they refer to, which helps the computer understand the text better and improves the efficiency of information extraction. The goal of coreference resolution is to find phrases in a text that refer to the same real-world object. With the rapid development of deep learning, deep-learning-based coreference resolution models have become the mainstream research direction. However, deep neural networks often require long training times and substantial computing resources, and there is still room for optimization. We therefore explore and optimize a deep-learning-based coreference resolution model with the characteristics of knowledge graphs in mind.

We first conduct experiments on the Chinese portion of OntoNotes Release 5.0 using the End-to-end coreference resolution model framework, and then, based on the model framework and the experimental results, propose directions for optimization from the perspectives of computational performance and accuracy. Combining knowledge of natural language processing with the principles of the algorithm, we find that the original model can deepen its understanding of Chinese semantics by incorporating Chinese-specific word segmentation information during data processing. This allows it to prescreen phrases that do not conform to grammatical semantics and thereby save computing resources. To choose a word segmentation model, we compare five segmentation schemes experimentally and select BERT+Softmax, based on the segmentation metrics and on the coreference resolution model's tolerance for different types of segmentation errors; we then apply it to the coreference resolution task. With word segmentation information added, the coreference resolution model not only saves nearly 1/4 of the training time but also improves average F1 by 1-2%. We then explore the applicability of different pre-trained language models to the coreference resolution task; adding the RoBERTa pre-trained model improves results by a further 1.5%.

Finally, we apply the best model obtained in the experiments to financial news text collected by a web crawler, and count and analyze how well it resolves the references of interest in the financial domain. The results show that our model has a certain coreference resolution ability on crawled text and is particularly strong at identifying coreference relationships between noun phrases.
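The abstract does not spell out exactly how segmentation information is used to prescreen candidate phrases. The following is a minimal sketch under two assumptions that are not taken from the thesis: the segmenter (e.g. a BERT+Softmax character tagger) emits one B/M/E/S tag per character, and prescreening is a hard rule that keeps only candidate spans whose start and end both fall on word boundaries. The function names `bmes_to_words`, `enumerate_spans`, `filter_by_segmentation` and the `max_width` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive character indices (start, end)


def bmes_to_words(tags: List[str]) -> List[Span]:
    """Convert per-character B/M/E/S tags into word spans.

    Assumes one tag per character (hypothetical tagger output);
    malformed sequences (e.g. M without a preceding B) are handled leniently.
    """
    words: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "S":  # single-character word
            if start is not None:  # close any unterminated word first
                words.append((start, i - 1))
                start = None
            words.append((i, i))
        elif tag == "B":  # beginning of a multi-character word
            if start is not None:
                words.append((start, i - 1))
            start = i
        elif tag == "M":  # middle of a word
            if start is None:
                start = i
        elif tag == "E":  # end of a word
            if start is None:
                start = i
            words.append((start, i))
            start = None
        else:
            raise ValueError(f"unknown tag {tag!r}")
    if start is not None:  # sentence ended inside a word
        words.append((start, len(tags) - 1))
    return words


def enumerate_spans(n: int, max_width: int) -> List[Span]:
    """All character spans up to max_width, as a span-based model would consider."""
    return [(s, e) for s in range(n) for e in range(s, min(n, s + max_width))]


def filter_by_segmentation(spans: List[Span], words: List[Span]) -> List[Span]:
    """Hypothetical hard prescreening: keep spans aligned to word boundaries."""
    starts = {s for s, _ in words}
    ends = {e for _, e in words}
    return [(s, e) for s, e in spans if s in starts and e in ends]


if __name__ == "__main__":
    sentence = "我爱北京天安门"
    tags = ["S", "S", "B", "E", "B", "M", "E"]  # hypothetical tagger output
    words = bmes_to_words(tags)
    spans = enumerate_spans(len(sentence), max_width=4)
    kept = filter_by_segmentation(spans, words)
    print([sentence[s:e + 1] for s, e in words])  # ['我', '爱', '北京', '天安门']
    print(len(spans), "->", len(kept))  # 22 -> 7
```

In this toy sentence the rule cuts the candidate set from 22 spans to 7, which illustrates where the savings in computation could come from. A hard filter like this is also sensitive to segmentation errors, since a mis-split word removes every candidate span that crosses it; this is one reason the choice of segmenter and the coreference model's tolerance for different error types matter.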
Keywords/Search Tags: Coreference resolution, End-to-end, Deep learning, Chinese word segmentation, Pre-trained models