
Research On Terminology Recognition In Financial Domain

Posted on: 2018-10-30, Degree: Master, Type: Thesis
Country: China, Candidate: C Liang, Full Text: PDF
GTID: 2348330536460949, Subject: Computer application technology
Abstract/Summary:
Terminology recognition is one of the basic tasks in information processing. Identifying terms quickly is of high practical value for text mining, information extraction, public opinion analysis and other tasks in the financial domain. So far, terminology recognition has mainly relied on machine learning methods. Feature selection in existing machine learning models is overly complex and depends on manual effort, and the post-processing rules are tied to specific data and do not generalize.

This paper proposes a new algorithm for identifying financial terms. First, two kinds of machine learning models are used. The first is a traditional shallow machine learning model, the CRF model, which uses only simple features. The second is a representative neural network model, the LSTM model, which avoids the vanishing gradient problem that RNNs face when learning long-distance dependencies. We also try a typical variant of the LSTM model, the GRU model, and divide its memory cell into two parts, a left and a right memory unit. Under the same conditions, the F score of the improved GRU model reaches 88.13%, which is 0.68% higher than that of the basic GRU model.

Second, this paper uses a term credibility model based on information entropy to optimize the results. The entropy is computed from each token's marginal probability, and candidate terms of a particular error type are extracted so that they can be targeted. Words are then converted to word vectors carrying rich semantic information, and the candidate terms are filtered by mutual information and similarity to obtain high-quality financial terms. The proposed method can serve as a general-purpose optimization method that effectively improves the recall rate and the integrity of term structure. Experiments on a financial-domain corpus show that the precision, recall and F score of the optimized CRF results are 95.30%, 91.58% and 93.40%; the F score of the optimized neural network results improves by about 1.3%~1.5%.

In summary, this paper uses the CRF model and the neural network model to identify financial terms. Both methods are supervised. The results of the neural network model are slightly lower than those of the CRF model because the corpus is limited. However, the neural network approach shows great potential in domain adaptability and recognition performance without any manual intervention. Finally, a term credibility model based on information entropy is proposed as a stable and general optimization method.
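The entropy step of the credibility model can be illustrated with a minimal sketch. The abstract only states that entropy is derived from each token's marginal probability; everything else here is an assumption made for illustration: the B/I/O label set, the function names `token_entropy` and `low_confidence_tokens`, the threshold of 0.5, and the toy probabilities. It is not the thesis's implementation.

```python
import math

def token_entropy(marginals):
    """Shannon entropy (in nats) of one token's marginal label distribution."""
    return -sum(p * math.log(p) for p in marginals.values() if p > 0.0)

def low_confidence_tokens(tokens, marginals_per_token, threshold=0.5):
    """Return (token, entropy) pairs whose entropy exceeds the threshold.

    The threshold is a hypothetical value, not one taken from the thesis.
    """
    flagged = []
    for token, marginals in zip(tokens, marginals_per_token):
        h = token_entropy(marginals)
        if h > threshold:
            flagged.append((token, h))
    return flagged

# Toy marginals over assumed B/I/O labels; the numbers are invented.
tokens = ["net", "asset", "value", "rose"]
marginals = [
    {"B": 0.97, "I": 0.02, "O": 0.01},
    {"B": 0.05, "I": 0.50, "O": 0.45},
    {"B": 0.02, "I": 0.90, "O": 0.08},
    {"B": 0.01, "I": 0.01, "O": 0.98},
]
flagged = low_confidence_tokens(tokens, marginals)
for token, h in flagged:
    print(f"{token}: entropy={h:.3f}")
```

A token whose probability mass is spread across several labels gets a high entropy, which marks it as a place where the tagger is uncertain and where a candidate term might be recovered or repaired. How such candidates are then filtered by mutual information and word-vector similarity is described only at a high level in the abstract, so it is not sketched here.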
Keywords/Search Tags:CRF, Marginal Probability, Information Entropy, Neural Network