
Classify Financial Documents Via Graph Representation Learning Based On Momentum Contrast

Posted on: 2022-11-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X N Luo | Full Text: PDF
GTID: 2518306752454434 | Subject: Master of Engineering
Abstract/Summary:
Text classification is of great significance in financial-industry research, serving directions such as risk control, stock forecasting, evidence disclosure, and process regulation. Mature general-purpose text classification methods consider only semantic information and perform poorly on the fine-grained classification systems of the financial field. To address this, this paper constructs a financial text–named entity graph structure based on the implicit relationships between texts and named entities, and uses a graph convolutional network to fuse semantic information with domain knowledge. The graph representation learning methods currently used in industry with outstanding results are usually supervised and rely on manually labeled data sets; exploration of unsupervised methods is rare, and the performance of traditional unsupervised graph representation learning models is hard to guarantee. In recent years, contrastive learning has broken through the performance bottleneck of traditional unsupervised graph representation learning, but existing methods ignore the contribution of the number of negative samples, and on large graphs they compute the global representation from randomly sampled nodes, which damages the mutual information between local and global representations; as a result, existing unsupervised graph representation learning methods have not yet met the requirements of industrial application. Therefore, this paper proposes a self-supervised graph representation learning model based on momentum contrast (G-MoCo):

· This paper transforms the text classification problem into a graph representation learning task, combining domain information to represent text. To address the poor fine-grained classification performance of mature text classification methods, this paper constructs a text–named entity graph network to
transform the problem into learning graph node representations, so that semantic information and domain knowledge are integrated into the text node representations simultaneously; the text representation is then fed into a downstream classifier, improving text classification performance in the financial field by improving the text representation.

· This paper adopts a subgraph sampling method that accounts for the mutual influence of nodes. To address the damaged mutual information between local and global representations in existing contrastive learning methods, this paper samples subgraphs from the full graph, encodes them with a graph convolutional network, uses the nodes of a subgraph as local representations and all nodes of the subgraph to compute the global representation, and trains the encoder by contrasting local and global representations. During subgraph sampling, nodes with higher mutual influence are made more likely to be sampled into the same subgraph. On the one hand, learning representations on subgraphs reduces memory requirements; on the other hand, local and global representations computed on the same subgraph are more closely related and do not lose mutual information. This method raised the Micro-F1 score by 6.9%.

· This paper proposes a graph representation learning model based on a dynamic dictionary and momentum contrast. To address the neglect of the number of negative samples in existing contrastive learning methods, this paper uses a dynamic dictionary to increase the number of negative samples and designs separate encoders for positive and negative samples; only the positive-sample encoder is updated by back-propagation, while the negative-sample encoder is updated in a momentum fashion, which reduces computational complexity and maintains the consistency of the negative-sample representations in the dynamic dictionary. This method raised the
Micro-F1 score by 7.4%.

· This paper proposes normalization terms to remove the bias introduced by subgraph sampling. Because nodes are not sampled with identical probability, frequently sampled nodes receive more attention during learning. This paper therefore further optimizes the model by adding specialized normalization terms to the aggregation function and the loss function, respectively. This method raised the Micro-F1 score by 9.6%.
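The momentum-contrast mechanism with a dynamic dictionary described above can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the linear "encoders", dimensions, and hyperparameter values are assumptions made for the example. Only the momentum update θ_k ← m·θ_k + (1 − m)·θ_q and the InfoNCE-style loss over a queue of negative keys follow the general momentum-contrast recipe the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM_IN, DIM_OUT = 8, 4        # illustrative feature/embedding sizes
QUEUE_SIZE = 16               # size of the dynamic dictionary of negatives
MOMENTUM, TAU = 0.999, 0.07   # typical momentum and temperature values

# Query (positive-sample) encoder: updated by back-propagation in the thesis;
# here just a weight matrix standing in for a GCN encoder.
W_q = rng.standard_normal((DIM_IN, DIM_OUT))
# Key (negative-sample) encoder: updated only via the momentum rule, no gradients.
W_k = rng.standard_normal((DIM_IN, DIM_OUT))

# Dynamic dictionary: a FIFO queue of previously encoded keys used as negatives.
queue = rng.standard_normal((QUEUE_SIZE, DIM_OUT))

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def momentum_update(W_q, W_k, m=MOMENTUM):
    """theta_k <- m * theta_k + (1 - m) * theta_q."""
    return m * W_k + (1.0 - m) * W_q

def info_nce(q, k_pos, queue, tau=TAU):
    """InfoNCE loss: one positive key vs. all negatives in the dictionary."""
    q, k_pos, queue = l2norm(q), l2norm(k_pos), l2norm(queue)
    l_pos = q @ k_pos                      # similarity with the positive key
    l_neg = queue @ q                      # similarities with queued negatives
    logits = np.concatenate(([l_pos], l_neg)) / tau
    logits -= logits.max()                 # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

# One illustrative training step on a single node feature vector.
x = rng.standard_normal(DIM_IN)
q = x @ W_q                                # query from one view of the node
k = x @ W_k                                # key from another view, momentum encoder
loss = info_nce(q, k, queue)

dist_before = np.linalg.norm(W_k - W_q)
W_k = momentum_update(W_q, W_k)            # drag key encoder slowly toward query encoder
dist_after = np.linalg.norm(W_k - W_q)

# Enqueue the newest key, dequeue the oldest: the dictionary stays large and
# its entries stay consistent because the key encoder drifts only slowly.
queue = np.vstack([queue[1:], l2norm(k)])
```

The queue is what lets the number of negatives greatly exceed the batch size, and the slow momentum update is what keeps those queued keys mutually consistent even though they were produced by older encoder states.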
Keywords/Search Tags:text classification, graph representation learning, contrastive learning, self-supervised learning