Chinese document-level financial event extraction is an important research area within event extraction. It aims to extract the financial event information expressed in a document-level financial text. The central research problem is how to extract events whose information is spread across sentences within a document. Existing work mainly studies how to build end-to-end models for event extraction, so most research is at the model level. However, these works ignore external knowledge such as lexical knowledge, domain knowledge, and label-hierarchy knowledge, which is of great help in improving the performance of Chinese document-level financial event extraction. This thesis proposes the following two works to address these problems.

First, a framework for document-level financial event extraction with lexical knowledge is proposed to address shortcomings of the entity extraction submodule in existing document-level financial event extraction. Entity recognition plays an essential role in event extraction, and its results have a strong influence on the final event extraction results. Most existing event extraction methods treat the entity recognition subtask as a character-wise sequence tagging task that uses only character-level features. One drawback of purely character-wise entity recognition is that explicit word and word-sequence information is not fully exploited, although such information has been shown to be useful. Based on this observation, this work introduces a simple yet effective model that incorporates lexical knowledge, including word boundaries and semantic information, into the vector representations of characters. This method significantly improves the performance of the entity recognition subtask, which in turn improves event extraction performance. The proposed method does not increase the model's complexity and therefore does not reduce execution efficiency. In addition, to further improve the feature representation, this work introduces external domain knowledge into the text feature vectors. Experiments are conducted on a dataset composed of many Chinese financial announcements, and the results show that the proposed method is effective. Further experiments designed to verify the method's execution efficiency demonstrate that it is efficient.

Second, a framework for document-level financial event extraction based on the correlation of event arguments is proposed to address shortcomings of the event argument classification module in existing document-level financial event extraction. Event argument classification is the final step of event extraction. Its primary function is to associate the argument entities extracted by the entity extraction subtask with the related event, and it plays an essential role in the event extraction task. The existing event argument classification module classifies each argument individually and ignores the hidden correlation between different arguments. Based on this observation, this work introduces modules that mine the label hierarchy, which contains rich and useful external information, for document-level financial event extraction. The work mainly exploits two kinds of information: the hierarchical structure of the labels and the relation-level information among arguments. It uses the labels' hierarchical information by constructing a hierarchical training module and a hierarchical decoding module. In addition, drawing on ideas from multi-task learning and building on the original event extraction framework, event extraction is decomposed into entity recognition, relation extraction, and event extraction. By sharing the feature extraction module and jointly training these subtasks, the relation-level correlation between event arguments is introduced into the Chinese document-level financial event extraction task. Experiments are conducted on the dataset composed of many Chinese financial announcements, and the results show that the proposed method is effective.
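
To make the lexical-knowledge idea of the first work more concrete, the following is a minimal sketch of one common way to attach word-boundary information to characters: each character is tagged with the lexicon words in which it occurs as the beginning (B), middle (M), end (E), or as a single-character word (S). This is an illustration under stated assumptions, not the model proposed in the thesis; the function name `match_lexicon`, the `max_word_len` parameter, and the example lexicon are hypothetical.

```python
from typing import Dict, List, Set


def match_lexicon(
    text: str, lexicon: Set[str], max_word_len: int = 4
) -> List[Dict[str, List[str]]]:
    """For each character, collect the lexicon words in which it is B/M/E/S.

    Hypothetical helper for illustration only: the thesis does not specify
    this exact matching scheme.
    """
    feats: List[Dict[str, List[str]]] = [
        {"B": [], "M": [], "E": [], "S": []} for _ in text
    ]
    for start in range(len(text)):
        # Enumerate candidate words text[start:end] up to max_word_len chars.
        last_end = min(len(text), start + max_word_len)
        for end in range(start + 1, last_end + 1):
            word = text[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                feats[start]["S"].append(word)
                continue
            feats[start]["B"].append(word)      # first character of the word
            feats[end - 1]["E"].append(word)    # last character of the word
            for mid in range(start + 1, end - 1):
                feats[mid]["M"].append(word)    # interior characters
    return feats


if __name__ == "__main__":
    # Assumed toy lexicon of financial terms.
    toy_lexicon = {"股权", "质押", "股权质押", "公告"}
    for char, feat in zip("股权质押公告", match_lexicon("股权质押公告", toy_lexicon)):
        print(char, feat)
```

In a full model, the matched word sets would typically be turned into embeddings and concatenated with each character's vector before sequence tagging. Because the matching is a preprocessing step, it leaves the tagging network itself unchanged, which is in line with the abstract's claim that lexical knowledge can be added without increasing model complexity.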