
Research On Chinese Named Entity Recognition For Legal Documents

Posted on: 2019-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: L M Wang
Full Text: PDF
GTID: 2428330545451224
Subject: Software engineering
Abstract/Summary:
In recent years, with the explosive growth of textual data and the establishment and popularization of large-scale knowledge bases, named entity recognition (NER) has become a fundamental research problem in natural language processing. NER aims to identify specific entities in text. At present, Chinese NER relies mainly on statistical machine learning methods. However, traditional supervised NER methods require large-scale annotated corpora, and in the legal domain, where such corpora are scarce, they do not achieve ideal results. This thesis therefore manually annotates a set of contract and marriage legal documents according to the annotation guidelines of OntoNotes Release 5.0. On this basis, we carry out the following work.

First, for legal documents, this thesis proposes a person name recognition method based on natural-annotated data. The method uses heuristic rules such as regular expressions to extract clauses containing person names as naturally annotated samples, thereby expanding the corpus. We then propose a joint learning approach, Aux-LSTM, which uses large-scale natural-annotated data to support the human-annotated data in person name recognition. Specifically, the approach first learns an auxiliary Long Short-Term Memory (LSTM) representation by training on the natural-annotated data, and then uses this auxiliary representation to boost the performance of the classifier trained on the human-annotated data. Empirical studies show that the method clearly outperforms simply merging the two kinds of labeled samples for entity recognition.

Second, for legal documents, this thesis proposes a named entity recognition method based on integer linear programming, which identifies entities at the document level. Specifically, we first apply a state-of-the-art approach, long short-term memory (LSTM), to perform word classification. We then define a global objective function over the word classification results and achieve global optimization via Integer Linear Programming (ILP). In the ILP-based approach, we propose four kinds of constraints, namely label transition, entity length, label consistency, and domain-specific regulation constraints, to incorporate various kinds of entity recognition knowledge at the document level. Empirical studies demonstrate the effectiveness of the proposed approach for domain-specific document-level NER.

Finally, for legal documents, this thesis proposes a named entity recognition method based on multi-task representation learning. The method uses textual information to divide the NER task into multiple classification subtasks. Specifically, we treat NER as the combination of a main task and several auxiliary tasks: the main task predicts the true label of the current word, while the auxiliary tasks predict the true label of the previous word, the true label of the next word, and the section (chapter) label of the text in which the current word appears. The auxiliary representations learned by sharing across these tasks are then each added to the main task, thereby improving the main task's entity recognition performance. Experimental results show that the multi-task representation learning method obtains better entity recognition results at the document level.
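The natural-annotation step described in the abstract (harvesting clauses that contain person names via heuristic rules such as regular expressions) might look roughly like the sketch below. It is only an illustration under stated assumptions: the role keywords, the 2 to 3 character name length, the punctuation lookahead, and the names `ROLE_PATTERN`, `natural_annotate`, and `harvest` are all hypothetical and are not taken from the thesis.

```python
import re

# ASSUMPTION: these role keywords and the 2-3 character name length are
# illustrative heuristics, not the rules used in the thesis.
ROLE_PATTERN = re.compile(
    r"(被上诉人|上诉人|原告|被告)"      # role keyword that precedes a name
    r"([\u4e00-\u9fff]{2,3})"           # candidate person name (2-3 CJK characters)
    r"(?=[，。、,；;：:\s]|$)"           # name must be followed by punctuation or end
)


def natural_annotate(clause):
    """Return (character, tag) pairs with BIO tags for matched person names."""
    tags = ["O"] * len(clause)
    for match in ROLE_PATTERN.finditer(clause):
        start, end = match.span(2)      # span of the name only, not the keyword
        tags[start] = "B-PER"
        for i in range(start + 1, end):
            tags[i] = "I-PER"
    return list(zip(clause, tags))


def harvest(clauses):
    """Keep only clauses in which the heuristic finds at least one name."""
    return [natural_annotate(c) for c in clauses if ROLE_PATTERN.search(c)]


if __name__ == "__main__":
    for char, tag in natural_annotate("原告张三，被告李四光。"):
        print(char, tag)
```

A pattern this strict favours precision over recall: a name followed directly by another word rather than punctuation is skipped, which fits the goal of collecting cheap but reasonably clean auxiliary samples.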
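The ILP-based decoding described in the abstract can be illustrated with a generic formulation. The abstract does not give the thesis's actual objective or constraints, so what follows is an assumed sketch: $x_{i,l}$ is a binary variable that selects label $l$ for word $i$, $p_{i,l}$ is the LSTM classifier's probability for that label, $\mathcal{L}$ is a BIO label set, and $\mathcal{T}$ is the set of entity types. Only a label transition constraint is shown; the entity length, label consistency, and domain-specific constraints are omitted because the abstract does not specify their form.

```latex
\[
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n}\sum_{l\in\mathcal{L}} x_{i,l}\,\log p_{i,l} \\
\text{s.t.}\quad
  & \sum_{l\in\mathcal{L}} x_{i,l} = 1, && i = 1,\dots,n \\
  & x_{1,\text{I-}t} = 0, && t \in \mathcal{T} \\
  & x_{i,\text{I-}t} \le x_{i-1,\text{B-}t} + x_{i-1,\text{I-}t}, && i = 2,\dots,n,\; t \in \mathcal{T} \\
  & x_{i,l} \in \{0,1\}, && i = 1,\dots,n,\; l \in \mathcal{L}
\end{aligned}
\]
```

The first constraint assigns exactly one label to each word. The second and third form the transition constraint: an inside tag of type $t$ is allowed only directly after a begin or inside tag of the same type.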
Keywords/Search Tags: Legal Documents, Named Entity Recognition, Natural-annotating, Integer Linear Programming, Multi-task Representation