Research And Implementation Of Named Entity Recognition Based On Ancient Literature

Posted on: 2019-04-09   Degree: Master   Type: Thesis
Country: China   Candidate: T Xie   Full Text: PDF
GTID: 2348330542998759   Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, named entity recognition (NER) is attracting more and more attention. NER has become an important component of Natural Language Processing applications such as public opinion analysis, information retrieval, automatic question answering, and machine translation. Identifying named entities automatically, accurately, and quickly in massive volumes of Internet text has gradually become a hot topic in both academia and industry. In recent years, research on NER has grown steadily: new algorithms keep appearing, and the application scenarios of named entities continue to expand into all aspects of daily life. However, existing algorithms mainly target modern Chinese corpora, and ancient Chinese corpora remain little studied. As large-scale ancient Chinese corpora are digitized, extracting valuable entity information from them will be of great significance to Natural Language Processing and computational sociology. The main research contents of this paper on named entity recognition in ancient literature are as follows:

1. Acquisition and preprocessing of the ancient literature corpus. No ancient Chinese corpus with word segmentation and named entity annotation is available to researchers, so the corpus has to be processed and annotated manually. The first step is therefore to obtain an experimental corpus and preprocess it. This paper mainly studies two corpora: Song Poetry and the History of the Song Dynasty.

2. New word detection. The accuracy of Chinese word segmentation strongly affects named entity recognition, and because of the particular nature of ancient Chinese, many of its words are absent from modern Chinese dictionaries. To improve the accuracy of word segmentation for ancient Chinese, this paper proposes a new word detection algorithm based on the Apriori algorithm and an LSTM neural network.

3. Named entity recognition in ancient Chinese. Chinese NER research has traditionally concentrated on modern Chinese corpora, and NER for ancient corpora is still at an early stage. The mainstream approaches to NER in modern text are machine learning methods such as the maximum entropy model, the conditional random field (CRF) model, and neural network models. This paper uses an LSTM neural network model and a CRF model to explore NER performance on ancient text.

4. Named entity linking. Entity disambiguation aims to further improve the quality of the recognized named entities by linking each ambiguous entity mention to the correct entity in a local knowledge base. This paper proposes a novel graph-based Chinese collective entity linking algorithm with weighted word2vec. The model improves on the traditional PageRank algorithm and combines weighted word2vec textual similarity with incremental evidence mining. Finally, the algorithm is evaluated extensively on several open-domain corpora, and the experimental results show that the method can effectively improve the precision and recall of entity linking.
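The abstract names the Apriori algorithm as one basis of the new word detection method but gives no details. As an illustration only, the sketch below applies Apriori's level-wise search and downward-closure pruning to contiguous character n-grams: an n-gram is counted only if its (n-1)-character prefix and suffix are already frequent, because every occurrence of the n-gram is also an occurrence of both. The function name `frequent_ngrams`, the occurrence-count support measure, the `min_support`/`max_len` thresholds, and the toy corpus are assumptions, not the thesis's actual design.

```python
from collections import Counter
from typing import Dict, List


def frequent_ngrams(sentences: List[str], min_support: int = 2,
                    max_len: int = 4) -> Dict[str, int]:
    """Apriori-style, level-wise mining of frequent contiguous character n-grams.

    Support is the total number of occurrences across all sentences (an
    assumption; counting sentences instead would also be monotone).
    """
    # Level 1: every single character is a candidate.
    counts = Counter(ch for s in sentences for ch in s)
    frequent = {g: c for g, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent and k <= max_len:
        # Downward closure: each occurrence of a k-gram is also an occurrence of
        # its (k-1)-prefix and (k-1)-suffix, so a k-gram can only be frequent if
        # both are frequent. Only such candidates are counted.
        counts = Counter()
        for s in sentences:
            for i in range(len(s) - k + 1):
                gram = s[i:i + k]
                if gram[:-1] in frequent and gram[1:] in frequent:
                    counts[gram] += 1
        frequent = {g: c for g, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy corpus; 东坡居士 ("Layman Dongpo") is an art name of Su Shi.
    corpus = ["东坡居士", "东坡居士作词", "苏轼号东坡居士"]
    found = frequent_ngrams(corpus, min_support=2, max_len=4)
    # Multi-character frequent n-grams become new-word candidates.
    print(sorted(g for g in found if len(g) >= 2))
```

Because 作 and 词 each occur only once, no longer n-gram containing them is ever counted; that pruning is what keeps the candidate set small on a large corpus. How the abstract's LSTM component filters or scores these candidates is not described there and is not sketched here.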
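The abstract states that an LSTM model and a CRF model are used for NER on ancient text, without giving the architecture. A common way to combine the two is to let the LSTM produce per-token tag scores (emissions) and let a linear-chain CRF add tag-to-tag transition scores, with prediction done by Viterbi decoding. The minimal sketch below shows only that decoding step, assuming a BIO tag set and hand-written scores; whether the thesis uses this exact BiLSTM-CRF arrangement is not stated in the abstract, and `viterbi`, the tag names, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            tags: Sequence[str]) -> Tuple[List[str], float]:
    """Highest-scoring tag path under a linear-chain CRF.

    emissions[i][t]: score of tag t for token i (e.g. from an LSTM layer).
    transitions[a][b]: score for tag a being followed by tag b.
    """
    if not emissions:
        return [], 0.0
    num_tags = len(tags)
    # score[t] = best total score of any path over tokens 0..i ending in tag t.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for row in emissions[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for b in range(num_tags):
            best_a = max(range(num_tags), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + row[b])
            pointers.append(best_a)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(range(num_tags), key=lambda t: score[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[t] for t in path], score[last]


if __name__ == "__main__":
    # Hypothetical scores for the three characters 苏 轼 曰 ("Su Shi said").
    tags = ["O", "B-PER", "I-PER"]
    transitions = [[0.0, 0.0, -100.0],   # O -> I-PER is effectively forbidden
                   [0.0, 0.0, 1.0],      # B-PER -> I-PER is rewarded
                   [0.0, 0.0, 0.0]]
    emissions = [[0.1, 2.0, 1.5],
                 [0.5, 1.2, 1.0],
                 [2.0, 0.1, 0.3]]
    print(viterbi(emissions, transitions, tags))
```

A per-token argmax would tag 轼 as B-PER (1.2 > 1.0), but the B-PER to I-PER transition score makes B-PER, I-PER, O the best path overall. This kind of label consistency is what a CRF layer adds on top of independent per-token scores.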
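The linking model is said to use "weighted word2vec textual similarity", but the abstract does not give the weighting scheme. One plausible reading, sketched below purely under that assumption, is a weighted average of word vectors for the mention's context and for each candidate entity's description, compared by cosine similarity. The vectors are a tiny hand-made stand-in for trained word2vec embeddings, the IDF-like weights are hypothetical, words are assumed to be already segmented, and all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def weighted_mean_vector(words: Sequence[str], vectors: Dict[str, Vector],
                         weights: Dict[str, float], dim: int) -> Vector:
    """Weighted average of the vectors of in-vocabulary words.

    Words missing from `weights` get weight 1.0; words missing from `vectors`
    are skipped. Returns a zero vector if no word is in the vocabulary.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word in words:
        if word not in vectors:
            continue
        w = weights.get(word, 1.0)
        for i, x in enumerate(vectors[word]):
            total[i] += w * x
        weight_sum += w
    return [x / weight_sum for x in total] if weight_sum > 0 else total


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def context_similarity(context: Sequence[str], description: Sequence[str],
                       vectors: Dict[str, Vector], weights: Dict[str, float],
                       dim: int) -> float:
    """Similarity between a mention's context and a candidate entity's description."""
    return cosine(weighted_mean_vector(context, vectors, weights, dim),
                  weighted_mean_vector(description, vectors, weights, dim))


if __name__ == "__main__":
    # Hand-made 3-d stand-ins for trained word2vec vectors (hypothetical).
    vectors = {"词": [1.0, 0.0, 0.0], "宋": [0.8, 0.2, 0.0],
               "将军": [0.0, 1.0, 0.0], "战": [0.0, 0.9, 0.1]}
    weights = {"词": 2.0, "宋": 1.0}   # hypothetical IDF-like weights
    context = ["宋", "词"]              # words around the mention
    poet = context_similarity(context, ["词"], vectors, weights, dim=3)
    general = context_similarity(context, ["将军", "战"], vectors, weights, dim=3)
    print(poet > general)
```

Averaging with weights lets informative words (here 词, "ci poetry") dominate the context vector, so a candidate described in poetic terms scores higher than one described in military terms.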
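The abstract says the linking model improves on traditional PageRank but does not say how. For reference, the sketch below is plain weighted PageRank by power iteration over a graph of candidate entities, followed by choosing, for each mention, the candidate with the highest score. It is not the thesis's improved variant, and incremental evidence mining is omitted. The entity identifiers, edge weights (which in practice might come from a relatedness or similarity measure), the damping factor 0.85, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_undirected(pairs: List[Tuple[str, str, float]]) -> Graph:
    """Store each weighted undirected edge in both directions."""
    graph: Graph = {}
    for a, b, w in pairs:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def weighted_pagerank(graph: Graph, damping: float = 0.85,
                      max_iter: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank; a node passes rank in proportion to edge weight."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {v: sum(graph.get(v, {}).values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if out_weight[u] == 0:
                continue
            for v, w in targets.items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def link(candidates: Dict[str, List[str]], rank: Dict[str, float]) -> Dict[str, str]:
    """Pick, for each mention, the candidate entity with the highest PageRank score."""
    return {m: max(c, key=lambda e: rank.get(e, 0.0)) for m, c in candidates.items()}


if __name__ == "__main__":
    # Hypothetical toy knowledge base: candidate entities for three mentions.
    candidates = {"东坡": ["Su_Shi", "Dongpo_(place)"], "子由": ["Su_Zhe"], "眉山": ["Meishan"]}
    graph = build_undirected([
        ("Su_Shi", "Su_Zhe", 0.9), ("Su_Shi", "Meishan", 0.8),
        ("Dongpo_(place)", "Su_Zhe", 0.1), ("Dongpo_(place)", "Meishan", 0.2),
    ])
    print(link(candidates, weighted_pagerank(graph)))
```

The "collective" aspect is that the mention 东坡 is resolved jointly with its neighbours: Su_Shi is strongly related to the candidates for 子由 (Su Zhe, Su Shi's brother) and 眉山 (their hometown), so it collects more rank than the competing place-name candidate.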
Keywords/Search Tags: Ancient Chinese literature, New word detection, Named entity recognition, Named entity disambiguation, Deep learning