Design And Implementation Of The Core Information Extraction System Of Semi-structured Financial Contract

Posted on: 2021-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: S X Liu
GTID: 2518306557990699
Subject: Software engineering
Abstract/Summary:
With the rapid development of financial informatization, it has become increasingly important to screen valuable information out of financial texts quickly and accurately and to store it in a structured form that matches enterprise needs. Semi-structured financial contracts are a common kind of financial data: they follow a specific text structure, but they are long and contain redundant information. This thesis analyzes the structure and data characteristics of such contracts and adopts a two-stage approach that first classifies the contract text and then extracts the core information. First, the core information extraction algorithm is designed. (1) The data is preprocessed and a custom dictionary is built for the financial domain. Word vectors and a keyword dictionary suited to the problems and data patterns of the financial field are trained and optimized, so that financial data can be filtered and constrained more effectively. (2) A TextCNN model classifies the contract text ahead of extraction, using different convolution kernels to pick out key information. Compared with traditional text classification methods, TextCNN captures local correlations better and achieves better classification results. (3) Based on the characteristics of financial contracts, such as their text structure, the correlation between text contents, and the strong regularity of the data, simple information is extracted with a rule-based method, and complex item information is extracted by combining the rules with an HMM model. Compared with traditional information extraction methods, this scheme takes fuller account of context and text structure during extraction and thereby improves extraction accuracy. Second, based on this algorithm, a core information extraction system for financial loan contracts is implemented in Java with the SSM framework. The system has a simple interface: users can extract and manage the core information of semi-structured financial contracts and export the data in the required format. Finally, the functions and performance of the system, including text classification and information extraction, are tested. The precision and recall statistics indicate that the design goal is achieved.
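
The extraction stage described in the abstract (rules for simple fields, an HMM for complex items) can be illustrated with a minimal Java sketch. The abstract does not publish the actual rules, tag set, or model parameters, so everything concrete here is an assumption made for illustration: the class name `ContractExtractionSketch`, the field names `contractNo` and `loanAmount`, the English field labels in the patterns, and the two-state HMM decoded with the standard Viterbi algorithm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: field names, patterns, and HMM layout are assumptions,
// not the rules or model used in the thesis.
final class ContractExtractionSketch {

    // Hypothetical rules for "simple" fields that appear in a fixed surface form.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("contractNo", Pattern.compile("Contract No\\.?\\s*:\\s*([A-Z0-9-]+)"));
        RULES.put("loanAmount", Pattern.compile("Loan Amount\\s*:\\s*([0-9][0-9,]*(?:\\.[0-9]+)?)"));
    }

    // Rule-based step: apply each pattern and keep the first match per field.
    static Map<String, String> extractSimpleFields(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            if (m.find()) {
                fields.put(rule.getKey(), m.group(1));
            }
        }
        return fields;
    }

    // HMM step: Viterbi decoding in log space.
    // start[s], trans[prev][s], and emit[s][o] are probabilities;
    // obs holds the observation index of each token.
    static int[] viterbi(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-probability ending in state s at step t
        int[][] back = new int[n][k];        // predecessor state on that best path
        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double candidate = score[t - 1][p] + Math.log(trans[p][s]);
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = p;
                    }
                }
                score[t][s] = bestScore + Math.log(emit[s][obs[t]]);
                back[t][s] = best;
            }
        }
        // Pick the best final state, then follow the back-pointers.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

In a pipeline of this shape, `extractSimpleFields` would handle fields with a stable label and value format, while `viterbi` would tag each token of a free-form clause (for example, state 1 for "part of the target item" and state 0 for "other") so that a contiguous run of tagged tokens can be read off as the extracted item. Real transition and emission probabilities would have to be estimated from labeled contracts.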
Keywords/Search Tags:Semi-structured text, TextCNN, Information extraction, HMM, Text classification