Semantic Matching Between Internet Media Documents And Stocks Based On Extreme Multi-Label Classification

Posted on: 2023-05-16

Degree: Master

Type: Thesis

Country: China

Candidate: L R Li

Full Text: PDF

GTID: 2558307070453934

Subject: Library and Information Science

Abstract/Summary:

Effectively extracting information from the vast volume of media content on the Internet is of great significance to financial market investors and financial researchers. This study focuses on a foundational task: semantic matching between Internet media documents and stocks. A systematic review of related work shows that dedicated research in this area is neither deep nor complete, and that existing approaches extract semantic information only shallowly. Starting from existing theories and technologies in information science, this study therefore explores a method for deep, semantic-level matching between Internet media documents and stocks, with the aim of providing a solid foundation for financial market investors and researchers in the field.

This study frames the semantic matching between Internet media documents and stocks as an extreme multi-label classification problem. Using Transformer pre-trained language models, a cutting-edge achievement in NLP, as the main technical means, the experiments follow the three-stage "Indexing-Matching-Ranking" X-Transformer model. In the "Indexing" stage, stocks are divided into 10 asset categories using the descriptive information of individual stock assets, yielding a mapping between stocks and asset categories. In the "Matching" stage, individual stock asset co-occurrence information is first introduced, on top of literal matching results, as a novel external rule for data annotation, which greatly reduces the time and labor cost of annotation; then, based on the mapping obtained in the "Indexing" stage, a Transformer pre-trained language model is used to train a multi-label classification model from media information to asset categories. In the "Ranking" stage, based on the asset categories matched to the media information, a Liblinear classifier is used to find the best-matching stock assets.

The experimental results are evaluated by accuracy, recall, and F1, and the model performs well. Combining the literal and semantic matching results further improves the evaluation metrics. Two additional experiments are carried out: a comparison with multi-label classification using a Transformer model directly, and a semantic-level verification that measures the correlation between the stock correlation matrix derived from semantic matching and the stock correlation matrix derived from the mutual information of log returns. The results strongly support the effectiveness of the X-Transformer model in semantic information extraction and extreme multi-label classification tasks. This research provides an effective supplement and reference method for research on matching Internet media documents with stocks, and an important basis for the decision-making of financial investors.
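
The control flow of the three-stage "Indexing-Matching-Ranking" approach described above can be illustrated with a minimal Python sketch. This is not the thesis implementation: the tickers, asset categories, and keyword sets are invented toy data, and simple keyword overlap stands in for both the Transformer pre-trained language model (Matching) and the Liblinear classifier (Ranking). Only the structure is meant to carry over: map stocks to categories, match a document to categories, then rank only the stocks inside the matched categories.

```python
from collections import Counter

# Indexing (hypothetical toy data): each stock is assigned to one asset category.
STOCK_TO_CATEGORY = {
    "BANK_A": "finance",
    "BANK_B": "finance",
    "CHIP_A": "semiconductors",
    "CHIP_B": "semiconductors",
}

# Assumption: keyword sets replace the learned models used in the thesis.
CATEGORY_KEYWORDS = {
    "finance": {"loan", "bank", "interest", "deposit"},
    "semiconductors": {"chip", "wafer", "foundry", "lithography"},
}
STOCK_KEYWORDS = {
    "BANK_A": {"retail", "deposit", "mortgage"},
    "BANK_B": {"corporate", "loan", "bond"},
    "CHIP_A": {"foundry", "wafer"},
    "CHIP_B": {"memory", "chip", "dram"},
}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def match_categories(tokens, threshold=1):
    """Matching stand-in: keep categories whose keyword hit count reaches the threshold."""
    counts = Counter(tokens)
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    return {c: s for c, s in scores.items() if s >= threshold}


def rank_stocks(tokens, categories, top_k=2):
    """Ranking stand-in: score only stocks that belong to the matched categories."""
    counts = Counter(tokens)
    candidates = [s for s, c in STOCK_TO_CATEGORY.items() if c in categories]
    scored = [(s, sum(counts[w] for w in STOCK_KEYWORDS[s])) for s in candidates]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def match_document(text, top_k=2):
    """Run Matching, then Ranking, on a single document."""
    tokens = tokenize(text)
    categories = match_categories(tokens)
    return rank_stocks(tokens, categories, top_k)


if __name__ == "__main__":
    print(match_document("The foundry raised wafer prices as chip demand grew"))
```

Restricting the ranking step to stocks in the matched categories is what keeps the label space manageable: the document is never scored against every stock, only against the candidates that survive the category-level match.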

Keywords/Search Tags:

financial media information, stock assets, extreme multi-label classification, Transformer pre-trained language model

Related items

1	Research On Multi-label Text Classification Based On BERT
2	Research On Extreme Multi-label Text Classification Based On Label Knowledge
3	Research On Extreme Multi-label Classification Based On Parallel Label Trees
4	Research On The Improvement Of Multi-label Text Classification Algorithm For Offensive Language In Social Media
5	Research On Sentiment Classification Of Weibo Based On Pre-trained Language Model
6	Research On Knowledge-Enhanced Pre-trained Model Based On Graph Transformer
7	The Research And System Design Of Multi-Classification Of Text Sentiment Based On Pre-Trained Models
8	Research On Multi-label Image Classification Based On Multi-scale Feature Enhancement
9	Research On Label Coding Algorithms For Multi-label Classification
10	Multi-label Prediction Model Based On Ontology Database And Data Mining In Bio-medicine