
Optimization Of Text Classification Algorithms In The Financial Field

Posted on: 2020-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: X W Wang
GTID: 2428330590950629
Subject: Software engineering

Abstract/Summary:
As the financial industry develops, demand for finance-related information keeps growing, and so does the volume of information texts in the financial field. Such texts often help in analyzing the movements of related stocks and company share prices. However, this growing body of text is noisy: it is mixed with a large number of non-financial texts, such as advertisements, advertorials, and purely technical articles. It is therefore important to determine how relevant a text is to the financial field.

The baseline text classification method is limited by the size of its training corpus. It models text at the word level and ignores semantic information, so both its accuracy and its recall are relatively low. This thesis proposes improvements to the baseline method. First, rules based on keywords and patterns are used to recall texts and generate a training corpus. Second, a method based on active learning and clustering is used to label texts and further expand the training corpus. Third, the texts are cleaned along two dimensions, text content and media account, to select a high-quality training corpus. Finally, word vector features that carry semantic information are added to the text classification features, different text classification models are compared experimentally, and the model's predicted probability is tuned experimentally, so that the relevance of a text to the financial field can be determined more accurately. In addition, to recall more relevant financial texts, the improved version applies a rule strategy based on financial keyword recognition before the text classification model is applied.

The experimental results show that expanding the training corpus, retaining only high-quality training data, adding word vectors with semantic information to the classification features, and applying the rule method based on financial keyword recognition can greatly improve the recall and accuracy of text classification. After this financial-domain filtering, the information texts most relevant to the financial field are retained more accurately, which greatly reduces the cost of manual filtering and greatly improves the user's reading experience.
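
The two-stage decision described in the abstract (a keyword rule applied before the classification model, followed by a cutoff on the model's predicted probability) might be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword list, the `min_hits` and `threshold` values, and the `predict_proba` callable are hypothetical placeholders, not the rules, parameters, or model used in the thesis.

```python
from typing import Callable, Iterable

# Hypothetical keyword list; the thesis does not publish its actual rules.
FINANCE_KEYWORDS = ("stock", "dividend", "bond", "ipo", "earnings")


def keyword_hits(text: str, keywords: Iterable[str] = FINANCE_KEYWORDS) -> int:
    """Count how many distinct keywords occur in the text (case-insensitive).

    Plain substring matching is used for brevity, so a keyword may also match
    inside a longer word; a real rule set would need proper tokenization.
    """
    lowered = text.lower()
    return sum(1 for kw in keywords if kw in lowered)


def is_financial(
    text: str,
    predict_proba: Callable[[str], float],
    min_hits: int = 2,       # assumed rule cutoff, not from the thesis
    threshold: float = 0.5,  # assumed probability cutoff, not from the thesis
) -> bool:
    """Return True if the text is judged relevant to the financial field."""
    # Stage 1: rule strategy. Enough keyword matches recall the text directly.
    if keyword_hits(text) >= min_hits:
        return True
    # Stage 2: model strategy. Fall back to the classifier's probability.
    return predict_proba(text) >= threshold
```

Here `predict_proba` stands in for any trained classifier that returns the probability that a text is financial. Raising `threshold` trades recall for accuracy, which is one way to read the abstract's remark about adjusting the model prediction probability.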
Keywords/Search Tags: Financial field, Recall text, Text classification, Semantic information