Some Empirical Research On Statistical Machine Learning

Posted on: 2020-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zheng
Full Text: PDF
GTID: 2428330572497009
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Statistical machine learning is a new interdisciplinary subject. It is a branch of natural science that studies how to capture the intrinsic characteristics of things from historical data and express them as models or algorithms for data analysis tasks such as classification, prediction, and regression fitting. Statistical machine learning has a wide range of applications. This thesis focuses on the application of classification models in finance and text recognition, and on improving the variable selection (dimensionality reduction) method for the classical classifier known as the support vector machine. The details are as follows.

(1) A logistic regression model is used to build a binary classification model on customer trading indexes, so that securities companies can identify customers at high risk of churn in advance. As China's economy moves forward and economic globalization deepens, customer churn requires more attention from securities companies than competition for customers. In this part, k-means clustering is applied to indicators reflecting customer transaction status to obtain each customer's churn status. Six stepwise regression methods are then used to select variables, and a logistic customer churn warning model is established. The generalization ability of the model is then tested and analyzed in light of the business characteristics of securities companies. The results show that indexes of customer transaction activity are the key to customer churn warning in securities companies, which provides effective methods and feasible suggestions for targeted customer retention.

(2) A multi-classification model is an extension of the binary classification model, so it can be used in a wider range of areas. In this part, a multi-classification support vector machine (SVM) is built to identify the 26 English letters regardless of handedness, font, and other printing styles. As one field of image recognition, handwritten character recognition is widely used in areas such as mobile intelligence, criminal investigation, medicine, and archaeology. The SVM model for English character recognition is built on statistical machine learning theory and applied to a classical handwritten character data set from the field of statistical machine learning. The empirical results show that the recognition accuracy for "variant" English letters is very high and very stable, with no "overfitting" phenomenon.

(3) The performance of the classical multi-classification model can be further improved. This part uses a variable selection method called Elastic Net (EN) to optimize the support vector machine, and models the same handwritten English character data set as in (2), so as to give the algorithm better properties. The multi-classification SVM is built on the indexes retained after Elastic Net variable selection. To compare dimension reduction capacity, ridge-SVM, lasso-SVM, and PCA-SVM are also modeled. To evaluate model performance objectively and comprehensively, multiple evaluation indexes are introduced, such as classification accuracy, training time, and number of indexes, and numerous other image recognition classifiers are included for comparison, including neural network, decision tree, random forest, logistic regression, discriminant analysis, and k-means clustering. In other words, the models used in this experimental part cover classical statistical models and statistical machine learning models, both supervised and unsupervised. The results show that the Elastic Net Support Vector Machine (EN-SVM) is suitable for English character recognition: it achieves variable selection (dimensionality reduction) and inherits the properties of Elastic Net for handling the large sample size, high dimension, and sparse modeling of text data, at the expense of somewhat lower classification accuracy and longer training time.
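
The variable-selection stage of the EN-SVM idea in part (3) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual implementation, so everything below is an assumption made for illustration: the function names (`soft_threshold`, `elastic_net`, `selected_features`), the parameters `alpha` and `l1_ratio`, the squared-error form of the objective, and the coordinate descent solver. The sketch covers only the selection step; the retained columns would then be passed to an SVM classifier, which is not shown.

```python
# Illustrative sketch only: elastic net variable selection by coordinate
# descent, in pure Python. Names and parameters are hypothetical and are
# not taken from the thesis.
# Assumed objective:
#   (1/(2n)) * ||y - X b||^2
#   + alpha * l1_ratio * ||b||_1
#   + (alpha * (1 - l1_ratio) / 2) * ||b||_2^2


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Return elastic net coefficients for rows X (list of lists) and targets y."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # residual y - X beta, with beta = 0 at start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            # The L1 part thresholds the coefficient, the L2 part rescales it.
            new = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selected_features(beta, tol=1e-8):
    """Indices of variables whose coefficient was not shrunk to zero."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

Under these assumptions, setting `l1_ratio=1.0` gives a lasso-type penalty and `l1_ratio=0.0` a ridge-type penalty, which corresponds to the lasso-SVM and ridge-SVM comparisons mentioned above. A ridge-type penalty alone does not set coefficients exactly to zero, so it shrinks variables rather than removing them.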
Keywords/Search Tags:Customer churn warning analysis, Logistic regression, Handwritten English character recognition, Support Vector Machine, Statistical machine learning, Variable selection, Elastic Net Support Vector Machine(EN-SVM)