Research On Financial Knowledge Graph System Based On Deep Learning

Posted on: 2020-12-19   Degree: Master   Type: Thesis
Country: China   Candidate: Y Song   Full Text: PDF
GTID: 2428330623463783   Subject: Software engineering
Abstract/Summary:
A financial knowledge graph is a domain knowledge graph that stores and displays financial entities and their relationships as a graph. The key to constructing one is extracting entities and relationships from textual data such as announcements and research reports. A supervised relation classification model can extract semantic features effectively, but it requires a large amount of training data. Crowdsourcing and distant supervision are common annotation methods, but distant supervision produces noisy labels. How to improve distant supervision so that it produces less noise is the first problem addressed in this thesis.

In addition, financial corpora such as announcements and research reports consist mostly of long sentences. Processing them raises two problems: long-distance dependencies, and loss of semantic information when a long sentence is encoded. Choosing an algorithm for the relation classification model that handles both is the second problem addressed here.

To address these problems, we propose and implement three things: an improved distant supervision method based on context similarity, called CSD; a relation classification model based on a bidirectional LSTM and an attention mechanism, called DRCM; and a financial knowledge graph prototype system built on CSD and DRCM, called FKGS. FKGS includes modules for corpus annotation, relation classification, and entity-relationship storage. Experiments show that the system is feasible and effective.

The main points and innovations of this thesis are as follows:

1) CSD and the corpus annotation method. Because distant supervision produces noisy labels, we propose and implement CSD, an improved distant supervision method based on context similarity. First, an initial tagged corpus is obtained by combining distant supervision with multi-instance learning. Then the initial tagged corpus is denoised by comparing context similarity. Finally, the high-confidence instances are selected and labeled. Experiments show that a classifier trained on the CSD-labeled corpus is 6% more accurate than one trained on a corpus labeled by general distant supervision.

2) DRCM and relation classification. Long-distance dependencies and loss of encoded information in long sentences are common problems in relation classification. DRCM encodes sentences with a bidirectional LSTM to handle long-distance dependencies, and uses an attention mechanism both to reduce the influence of noisy data and to counter the loss of encoded information. Experiments show that on the SKE dataset the F1 score of DRCM is 1.7% higher than that of models using a plain LSTM and 3% higher than that of the same model without the attention mechanism.

3) Prototype implementation of FKGS based on CSD and DRCM. Building on 1) and 2), we design and implement the FKGS prototype system. FKGS includes modules for named entity recognition based on AipNlp, corpus annotation based on CSD, relation classification based on DRCM, and entity-relationship storage based on Neo4j. To date, 59,738 entities and 71,056 relationships have been extracted from nearly 200 GB of announcements and research reports from the past three years.
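The abstract does not spell out how CSD compares contexts, so the following is only a minimal sketch of the general idea of denoising distantly supervised labels by context similarity, not the thesis's actual method. It assumes bag-of-words context vectors, cosine similarity against the centroid of all candidates that share a relation label, and a fixed threshold; the names `denoise`, `candidates`, and `threshold`, and the toy sentences, are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def denoise(candidates, threshold):
    """Keep candidates whose context resembles the rest of their relation bag.

    candidates: list of (relation, context_tokens) pairs produced by
    distant supervision. Assumption: the centroid of a bag is simply the
    summed token counts of every candidate carrying that relation label.
    """
    centroids = {}
    for relation, tokens in candidates:
        centroids.setdefault(relation, Counter()).update(tokens)

    kept = []
    for relation, tokens in candidates:
        score = cosine(Counter(tokens), centroids[relation])
        if score >= threshold:
            kept.append((relation, tokens, score))
    return kept


# Toy data (hypothetical): the third sentence mentions the entity pair
# but its context does not express the "acquires" relation.
candidates = [
    ("acquires", ["announced", "acquisition", "of", "shares"]),
    ("acquires", ["completed", "acquisition", "of", "equity"]),
    ("acquires", ["met", "at", "conference", "yesterday"]),
]
kept = denoise(candidates, threshold=0.6)
for relation, tokens, score in kept:
    print(relation, tokens, round(score, 2))
```

In this toy run the two acquisition sentences score 0.75 against the bag centroid and are kept, while the conference sentence scores 0.5 and is dropped. A real pipeline would more plausibly use learned embeddings than raw token counts, and the threshold would need tuning on held-out data.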
Keywords/Search Tags:knowledge graph, distant supervision, LSTM, relation classification