Credit Rating Model Incorporating Textual Data For Listed Companies

Posted on: 2023-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z P Zhang
GTID: 1529307031477974
Subject: Management Science and Engineering
Abstract/Summary:
Listed companies play an important role in China’s economic development and social prosperity. As of June 2021, there were more than 4,000 companies listed on the Shanghai and Shenzhen stock exchanges, with total operating income exceeding RMB 50 trillion and total stock market value approaching RMB 80 trillion, making it the second largest stock market in the world. A reasonable credit rating system urgently needs to be established. In August 2021, the People’s Bank of China, the National Development and Reform Commission, and three other ministries and commissions issued a document that clearly stated that credit rating agencies should "form a rating system with reasonable differentiation". In 2020, China’s social financing scale increased by as much as 34.79 trillion yuan, of which RMB loans increased by 20.03 trillion yuan. If losses on these loans were reduced by 1 percentage point, more than 200 billion yuan of credit losses (1% of 20.03 trillion yuan) could be recovered.

When rating the credit of listed companies, this study integrates the textual information in their annual reports. It constructs textual features from the annual report text, including vocabulary features such as "innovation" and "bad debt" and intonation features such as "tone". It uses these together with non-textual features to forecast default, and then assigns a credit rating to each listed company. Text-based features, a frontier and hot spot of current academic research, reflect the diversity of data sources in the big data environment.

Research on the credit rating of listed companies with text data involves at least the following three scientific issues. The first is determining the optimal combination of features. A default prediction model cannot be separated from the selection of features: if the features differ, the default prediction results will differ or even be entirely contrary. As text features have become a frontier and hot spot of research at home and abroad in recent years, how to select among all features, both textual and non-textual, is an urgent scientific problem in feature selection. The second is the assignment of textual features. Text is typical unstructured data, and only by assigning values to textual features can it be transformed into structured data for modeling. Different ways of assigning values to text features lead to different abilities to discriminate default, and the performance of the default prediction model differs accordingly. The third is the division of credit grades. The credit scores of listed companies are continuous decimals that are hard for people to interpret, so they need to be divided into intervals (i.e., credit grades). Because there are infinitely many rational numbers between any two points, adjusting any cutoff point changes the number of enterprises in the adjacent grades. Finding the optimal cutoff point for each credit grade is therefore the key scientific problem of credit grade division.

Focusing on these three scientific issues, the main innovations and features of this study are as follows:

(1) In determining the optimal combination of features, a 0-1 programming model is established whose decision variables indicate whether each feature is selected. The model takes the maximum prediction accuracy, measured by G-mean, as the objective function and the non-repetition of information among features as the constraint, and it derives the optimal combination of features so as to ensure the default identification capability of the combination as a whole. This changes the practice in most of the literature of predicting default from non-textual features only, without text features, and the situation in which many studies select features one by one and so cannot guarantee the overall default identification ability of the feature combination.

(2) In the assignment of textual features, the standard is that the greater the difference in the frequency of a text-based feature between defaulted and non-defaulted enterprises, the greater the value assigned to that feature, so that the assigned values distinguish the two types of enterprises to the maximum extent. The underlying idea is that the greater the difference in how often the same word occurs in the documents of defaulted and non-defaulted enterprises, the greater the role of that word in judging default. This changes the existing research practice of assigning values using only the frequency of a word across all documents, without considering the word's ability to discriminate default.

(3) In credit grade division, a linear programming model is established to obtain the optimal credit grades, based on the constraint that the proportion of defaulted enterprises in each credit grade is larger than that in the grade above it, so that defaulted enterprises are concentrated in the lower credit grades. The model takes the difference in the number of defaulted enterprises between adjacent credit grades as the objective function, with "the higher the credit grade, the smaller the proportion of defaulted enterprises in the grade" as constraint 1 and "the higher the credit grade, the lower the loss rate" as constraint 2, and derives the optimal cutoff point of each credit grade. This changes the situation in which existing research pays no attention to the distribution of defaulted enterprises, which leads to the unreasonable phenomenon that defaulted enterprises may appear in higher credit grades.

The main conclusions and findings of this study are as follows:

(1) The effect of textual features on default prediction cannot be underestimated. Eleven textual features, such as "getting out of trouble", "innovation" and "Tone of text", have a significant impact on the default forecast of listed companies. The weight of textual features is as high as 20.85%, exceeding the impact of macro and non-financial features on default.

(2) Corporate financial features are the most important elements in predicting the default of listed companies. The 24 financial features, such as "asset-liability ratio", "current ratio" and "ROE", account for 54.66% of the impact on the default forecast, more than half of the total.

(3) Internal non-financial features play an important role in the default prediction of listed companies. Internal non-financial features such as "number of performance forecast", "equity concentration index" and "type of audit opinion" account for 12.43% of the impact on default prediction, second only to corporate financial features among non-textual features.
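
The 0-1 programming idea in innovation (1) can be illustrated with a minimal sketch. It is not the dissertation's model: the abstract gives neither the classifier nor the exact form of the non-repetition constraint, so the sketch assumes a placeholder threshold classifier (`toy_predict`), assumes that "non-repetition of information" means a cap on pairwise Pearson correlation (`max_abs_corr`), and simply enumerates every 0-1 selection vector, which is only practical for a handful of features. All function and parameter names are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of recall on defaulters (1) and on non-defaulters (0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return sqrt((tp / pos) * (tn / neg))


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def toy_predict(rows, selected):
    """Placeholder classifier (an assumption): predict default when the mean
    of the selected features exceeds 0.5."""
    return [1 if sum(r[j] for j in selected) / len(selected) > 0.5 else 0
            for r in rows]


def select_features(rows, y, max_abs_corr=0.8):
    """Enumerate all 0-1 selection vectors and keep the one with the highest
    G-mean among those whose selected features are not pairwise redundant."""
    n_feat = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_feat)]
    best_mask, best_score = None, -1.0
    for mask in product((0, 1), repeat=n_feat):
        selected = [j for j, x in enumerate(mask) if x == 1]
        if not selected:
            continue
        # Assumed redundancy constraint: no selected pair is too correlated.
        if any(abs(pearson(cols[a], cols[b])) > max_abs_corr
               for a, b in combinations(selected, 2)):
            continue
        score = g_mean(y, toy_predict(rows, selected))
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score
```

Evaluating the whole selection vector at once, rather than scoring features one by one, is what lets the objective reflect the default identification ability of the combination. A real study would replace the enumeration with an integer programming solver or a heuristic search.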
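
The assignment rule in innovation (2) can be sketched as follows. The abstract does not give the formula, so this sketch assumes the simplest reading: a word's weight is the absolute gap between its relative frequency in the documents of defaulted enterprises and in those of non-defaulted enterprises. The names `relative_frequency` and `word_weights` and the token-list input format are hypothetical.

```python
def relative_frequency(docs, word):
    """Share of all tokens in `docs` (a list of token lists) equal to `word`."""
    total = sum(len(tokens) for tokens in docs)
    if total == 0:
        return 0.0
    hits = sum(tokens.count(word) for tokens in docs)
    return hits / total


def word_weights(default_docs, normal_docs, vocabulary):
    """Weight each word by how differently it is used in the two classes.

    Assumed formula: |freq in defaulted docs - freq in non-defaulted docs|.
    A word used equally often by both classes gets a weight near zero,
    however frequent it is in the pooled corpus.
    """
    return {
        word: abs(relative_frequency(default_docs, word)
                  - relative_frequency(normal_docs, word))
        for word in vocabulary
    }


# Toy corpus, invented purely for illustration.
default_docs = [["bad", "debt", "loss"], ["bad", "debt", "innovation"]]
normal_docs = [["innovation", "growth", "innovation"],
               ["growth", "debt", "innovation"]]
weights = word_weights(default_docs, normal_docs,
                       {"bad", "debt", "innovation"})
```

In the toy corpus, "debt" appears in both classes and so receives a smaller weight than "bad", which appears only in the documents of defaulted enterprises. An assignment based only on frequency across all documents would not make that distinction.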
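
Constraint 1 of the grade division in innovation (3) can be checked with a short sketch. It does not solve the linear program and does not cover constraint 2 (loss rates); it only computes the default proportion per grade for a given set of cutoff points and tests whether that proportion falls as the grade improves. The function names and the convention that higher scores mean better credit are assumptions.

```python
from bisect import bisect_right


def grade_default_rates(scores, defaults, cutoffs):
    """Default proportion per grade.

    `cutoffs` must be sorted ascending. Grade 0 is the lowest score band and
    the last grade is the highest. Empty grades yield None.
    """
    n_grades = len(cutoffs) + 1
    totals = [0] * n_grades
    bad = [0] * n_grades
    for score, defaulted in zip(scores, defaults):
        grade = bisect_right(cutoffs, score)  # index of the band holding score
        totals[grade] += 1
        bad[grade] += defaulted
    return [b / t if t else None for b, t in zip(bad, totals)]


def satisfies_constraint_1(rates):
    """True if no grade is empty and the default proportion strictly
    decreases from each grade to the next better one."""
    if any(r is None for r in rates):
        return False
    return all(lower > higher for lower, higher in zip(rates, rates[1:]))
```

Because moving any cutoff shifts enterprises between adjacent grades, a check of this kind has to be re-evaluated for every candidate set of cutoffs; an optimization model searches over those candidates subject to the constraint.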
Keywords/Search Tags: Credit Rating, Optimal Features Combination, Textual Feature, Default Prediction, Credit Grade Division, Big Data