
Research On Event Extraction Method In Chinese Domain Based On Dependency Parsing And Deep Learning

Posted on: 2021-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Qian
GTID: 2518306302476154
Subject: Financial Information Engineering
Abstract/Summary:
Chinese domain event extraction is a challenging task. It aims to extract the event information that users are interested in from large amounts of unstructured text and present it in a structured form that users can analyze and use in later steps. It provides basic technical support for applications such as knowledge graph construction and automatic summarization. In the era of big data, computer technology and the financial field are becoming increasingly intertwined, and natural language processing has many uses in finance. Banking, a pillar of the financial industry, faces various risks, and operational risk is one that cannot be neglected. The final revision of the Basel III Accord optimizes the method for measuring operational risk capital, which places higher requirements on the quality of operational risk data. This thesis studies event extraction in the Chinese domain and extracts operational risk events from bank news text covering the last 10 years, providing technical support for building an external database of bank operational risk events. In addition, for the extracted events, the relationships between three dimensions (operational risk event type, type of banking institution, and time of occurrence) and the loss amount and number of events are summarized and analyzed.

In the data acquisition and preprocessing stage, bank news text is crawled from two news websites, a filtering algorithm is designed to select news about bank operational risk, and external tools are used for preprocessing such as word segmentation and part-of-speech tagging.

In the event extraction stage, an event extraction method based on pattern matching is combined with dependency syntactic analysis and deep learning to improve extraction performance. First, a seed thesaurus of trigger words is constructed with a semi-automatic approach that combines manual work and programs, and it is then expanded using the synonym forest (Tongyici Cilin). A domain thesaurus is built from the news text and online information, and a template library suited to extracting different event types is constructed by combining the ideas of inheritance and induction with the characteristics of each event type, in preparation for the subsequent extraction of event elements.

Trigger word extraction and event type recognition together form a core sub-task of event extraction and are the focus of this research. Two methods are used, both with good results. The first is based on dependency syntactic analysis: the Stanford CoreNLP tool is used to obtain the dependency relations between the components of each sentence, from which trigger word/entity description pairs are formed. Multi-dimensional feature vectors are then built from the word itself, its part of speech, and the semantic information obtained from dependency parsing. SVM, Random Forest, and AdaBoost classifiers are used for prediction; SVM performs best, with F1 values above 86% for both trigger word extraction and event type recognition. The second method is based on deep learning: word vectors trained with the Skip-gram model, together with position features, serve as the vector representation of event sentences, and CNN and BiLSTM networks are built to extract word-level and sentence-level features, respectively. Its F1 values on both sub-tasks are above 81%.

Event element extraction is another core sub-task of event extraction. This research applies the idea of frame-based topic event extraction to extracting event elements such as the time of occurrence and the loss amount. Extraction of the time of occurrence uses coreference resolution to improve accuracy. Because Chinese expressions are flexible and diverse, pattern-matching-based extraction uses soft matching in addition to traditional exact matching: text whose pattern is highly similar to a pattern in the template library can also be extracted. This effectively improves recall. In the final results, the accuracy of extracting bank institution names is 88%, and the F1 values for the time of occurrence and the loss amount are both above 84%.

In the applied research phase, the banking operational risk events extracted for the past ten years are analyzed quantitatively along three dimensions: operational risk event type, type of banking institution, and time of occurrence. The causes of the observed patterns are then analyzed qualitatively, and several recommendations are given.

In summary, this thesis designs a Chinese domain event extraction method that performs well, verifies it on bank news text data, and uses advanced natural language processing technology to support operational risk management in commercial banks.
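The trigger-thesaurus expansion step can be illustrated with a small sketch. It assumes the line format of the extended Tongyici Cilin ("<code> <word> <word> ...", where a code ending in "=" marks a group of true synonyms). The codes and word groups below are invented for illustration and are not taken from the real resource or from the thesis, and the names `parse_synonym_groups` and `expand_triggers` are hypothetical.

```python
# Hypothetical excerpt in the style of the extended Tongyici Cilin.
# The codes and groupings are invented for illustration only.
CILIN_SAMPLE = [
    "Hj01A01= 挪用 侵吞 盗用",
    "Hj01A02# 贪污 受贿",
    "Hk02B01= 罚款 罚金",
]


def parse_synonym_groups(lines):
    """Return {code: [words]} for synonym groups only (codes ending in '=')."""
    groups = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        code, words = parts[0], parts[1:]
        if code.endswith("="):  # '#' (related) and '@' (standalone) entries are ignored
            groups[code] = words
    return groups


def expand_triggers(seeds, groups):
    """Add every word that shares a synonym group with at least one seed trigger."""
    expanded = set(seeds)
    for words in groups.values():
        if any(word in seeds for word in words):
            expanded.update(words)
    return expanded


if __name__ == "__main__":
    groups = parse_synonym_groups(CILIN_SAMPLE)
    print(sorted(expand_triggers({"挪用", "罚款"}, groups)))
```

Restricting expansion to "=" groups is a conservative choice. Words in "#" groups are only related, so adding them would likely introduce noisy triggers that still need manual review.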
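One way to form trigger word/entity description pairs from a dependency parse is sketched below. It works on already-parsed tokens rather than calling Stanford CoreNLP. It assumes Universal Dependencies style relation labels (`nsubj`, `obj`) and a hypothetical trigger lexicon, and the example sentence was parsed by hand for illustration. The thesis's actual feature set (including its semantic features) is not specified in the abstract.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    word: str
    pos: str     # part-of-speech tag
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # dependency relation to the head


# Hypothetical trigger lexicon and argument relations (UD-style labels assumed).
TRIGGER_LEXICON = {"挪用", "罚款"}
ARGUMENT_RELATIONS = {"nsubj", "nsubj:pass", "obj"}


def trigger_entity_features(tokens: List[Token]) -> List[dict]:
    """Build one feature dict per (candidate trigger, dependent entity) pair."""
    by_index = {tok.index: tok for tok in tokens}
    features = []
    for tok in tokens:
        head = by_index.get(tok.head)  # None for the root token (head == 0)
        if head is None or head.word not in TRIGGER_LEXICON:
            continue
        if tok.deprel not in ARGUMENT_RELATIONS:
            continue
        features.append({
            "trigger": head.word,
            "trigger_pos": head.pos,
            "relation": tok.deprel,
            "entity": tok.word,
            "entity_pos": tok.pos,
        })
    return features


# "员工 挪用 客户 资金" (an employee embezzled customer funds), parsed by hand.
SENTENCE = [
    Token(1, "员工", "NN", 2, "nsubj"),
    Token(2, "挪用", "VV", 0, "root"),
    Token(3, "客户", "NN", 4, "nmod"),
    Token(4, "资金", "NN", 2, "obj"),
]

if __name__ == "__main__":
    for row in trigger_entity_features(SENTENCE):
        print(row)
```

Feature dicts of this shape could then be one-hot encoded (for example with scikit-learn's `DictVectorizer`) and passed to a classifier such as an SVM. That pipeline is an assumption, not a description of the thesis's code.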
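The position features used alongside Skip-gram word vectors are commonly encoded as each token's relative distance to the candidate trigger. The sketch below assumes that encoding, with distances clipped to a hypothetical `max_distance` and shifted so they can index a position-embedding table. The abstract does not state the exact scheme the thesis uses.

```python
def relative_positions(num_tokens, candidate_index, max_distance=10):
    """Relative distance of every token to the candidate trigger, clipped to
    [-max_distance, max_distance] and shifted to be non-negative so it can index
    a position-embedding table with 2 * max_distance + 1 rows."""
    return [
        max(-max_distance, min(max_distance, i - candidate_index)) + max_distance
        for i in range(num_tokens)
    ]


if __name__ == "__main__":
    # Five-token sentence, candidate trigger at index 2, distances clipped to +/-1.
    print(relative_positions(5, 2, max_distance=1))
```

In a CNN or BiLSTM model, each index would look up a learned position vector that is concatenated with the word vector of the same token.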
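Soft matching, where text whose pattern is highly similar to a template is also accepted, can be sketched as follows. The abstract does not name the similarity measure or the threshold, so this example assumes the token-level ratio from Python's standard `difflib.SequenceMatcher` and a hypothetical threshold of 0.8. The templates are invented examples, not entries from the thesis's template library.

```python
from difflib import SequenceMatcher

# Hypothetical templates: slot labels (ORG, MONEY) mixed with literal words.
TEMPLATES = [
    ["ORG", "因", "违规放贷", "被", "罚款", "MONEY"],  # ORG fined MONEY for illegal lending
    ["ORG", "员工", "挪用", "MONEY"],                  # ORG employee embezzled MONEY
]


def soft_match(candidate, templates, threshold=0.8):
    """Return (template_index, similarity) for the most similar template if its
    similarity reaches the threshold, else None. 1.0 means an exact match."""
    best_index, best_score = None, 0.0
    for i, template in enumerate(templates):
        # ratio() = 2 * matched_tokens / (len(candidate) + len(template))
        score = SequenceMatcher(None, candidate, template).ratio()
        if score > best_score:
            best_index, best_score = i, score
    if best_index is not None and best_score >= threshold:
        return best_index, best_score
    return None


if __name__ == "__main__":
    # One token differs from template 0, so exact matching would miss it.
    print(soft_match(["ORG", "因", "违规发放贷款", "被", "罚款", "MONEY"], TEMPLATES))
```

Exact matching is the special case where the similarity equals 1.0. Lowering the threshold trades precision for recall, which matches the recall gain described for soft matching.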
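The F1 values reported for trigger word extraction and event type recognition can be illustrated with a small scoring sketch. It assumes micro-averaged precision, recall, and F1 over exact tuple matches. The annotation tuples and event type names are hypothetical, and the abstract does not describe the thesis's exact evaluation protocol.

```python
def precision_recall_f1(predicted, gold):
    """Micro precision, recall and F1 over sets of hashable items."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (sentence_id, trigger_word, event_type).
GOLD = {(0, "挪用", "internal_fraud"), (1, "罚款", "penalty"), (2, "盗刷", "external_fraud")}
PRED = {(0, "挪用", "internal_fraud"), (1, "罚款", "penalty"), (2, "盗刷", "internal_fraud")}

# Trigger extraction: only the (sentence, word) pair has to match.
ident = precision_recall_f1({(s, w) for s, w, _ in PRED}, {(s, w) for s, w, _ in GOLD})
# Event type recognition: the predicted event type must match as well.
typed = precision_recall_f1(PRED, GOLD)

if __name__ == "__main__":
    print("trigger extraction P/R/F1:", ident)
    print("event type recognition P/R/F1:", typed)
```

Here all three triggers are found, but one is assigned the wrong type. Trigger extraction therefore scores 1.0, while event type recognition drops to about 0.67 on all three measures.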
Keywords/Search Tags: Chinese Event Extraction, Dependency Syntactic Analysis, Deep Learning, Pattern Matching