
Research On Support Vector Machine Classification Algorithm For Multi-class Texts

Posted on: 2020-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: X W Song
Full Text: PDF
GTID: 2428330590450992
Subject: Software engineering
Abstract/Summary:
In this era of digital information, quickly and accurately extracting useful knowledge from large volumes of text has become an urgent problem, and multi-class text classification is a key technology for solving it. Traditional manual classification is no longer suited to current text classification tasks, mainly because its accuracy is not high, it takes a lot of time, and it requires staff with the relevant expertise. A more efficient approach is needed to overcome these limitations, which is why automatic text classification technology came into being. Its basic principle is to mine and analyze text whose categories are known, and then to predict the categories of unknown text with a machine learning method.

Text classification based on machine learning is an important part of data mining. However, traditional statistical learning methods rest on theoretical conditions that are difficult to satisfy in practice, so they often fail to achieve good results in real applications. In the 1990s, Vapnik et al. proposed a new machine learning method based on statistical learning theory, the Support Vector Machine (SVM). The support vector machine is based on the principle of structural risk minimization and performs well on problems with small samples, nonlinearity, and high-dimensional vector spaces. Because it does not rely on probability measures or the law of large numbers, and because the result is determined only by the support vectors, it eliminates redundancy more effectively and is more robust. Its main features are as follows: the algorithm seeks the global optimal solution, which avoids the problem of local minima; traditional machine learning methods minimize empirical risk, whereas the support vector machine minimizes structural risk, so it generalizes better; and nonlinear problems are solved by mapping the data into a high-dimensional space through nonlinear functions, which avoids the "curse of dimensionality". Because of these advantages, the support vector machine has been widely used in handwritten digit recognition, face recognition, text classification, large-scale biological information processing, and other areas, and it has been a research hotspot in machine learning in recent years.

The support vector machine was originally designed for binary (two-class) classification, so problems remain when it is applied to multi-class classification. For this reason, researchers have extended the traditional support vector machine, building on its basic theory, so that it can better solve multi-class problems. There are two main ideas. The first solves for all classes at once in a single optimization problem. The second constructs multiple binary classifiers and combines them in a way suited to the characteristics of the classification problem at hand. Commonly used multi-class support vector machine algorithms are "one-versus-one", "one-versus-rest", "error-correcting output codes", "directed acyclic graph", and "binary tree". According to existing research, the second idea has better overall performance and handles multi-class classification problems better. Among these algorithms, the binary tree SVM has the best comprehensive performance, but it still suffers from problems such as "error accumulation" and dependence on the structure of the binary tree.

To address these shortcomings, and building on an in-depth study of SVM multi-class algorithms, this paper proposes a binary tree construction algorithm based on a class separation metric and applies it to the multi-class problem of evaluating the comprehensive quality of college students. The experimental results demonstrate the effectiveness and practicability of the algorithm. The main research work of this paper is as follows:
(1) An in-depth study of text mining theory, including the basic principles, tasks, and key techniques of text mining, which lays the foundation for the subsequent research.
(2) Further analysis of support vector machine theory, focusing on several common SVM multi-class algorithms and analyzing their shortcomings.
(3) To address the shortcomings of the binary tree multi-class algorithm, a binary tree construction algorithm based on a class separation metric is proposed. The algorithm can greatly reduce error accumulation. In addition, the generated tree is generally skewed overall while being complete or nearly complete locally, which improves classification efficiency. Experiments show that the improved algorithm classifies better.
(4) To verify the practicability of the improved algorithm, this paper uses it to design a comprehensive quality evaluation model for college students, treating the evaluation as a multi-class problem. The application results demonstrate the effectiveness and practicability of the improved algorithm.
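The idea of building a binary tree from a class separation metric can be illustrated with a small sketch. The abstract does not define the metric or the splitting rule, so everything below is an assumption made for illustration: separation is taken as the distance between class centroids divided by the sum of the class spreads, each node is split by seeding two groups with the most separable pair of classes, and the names `separation` and `build_tree` are hypothetical. No SVM is trained here; each internal node only marks where a binary classifier would be placed.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def spread(points, center):
    """Mean Euclidean distance of the points from their centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)


def separation(a, b, stats):
    """Assumed separation metric: centroid distance over summed spreads."""
    (center_a, spread_a), (center_b, spread_b) = stats[a], stats[b]
    return math.dist(center_a, center_b) / (spread_a + spread_b + 1e-12)


def build_tree(labels, stats):
    """Recursively split the classes into two groups.

    The two most separable classes seed the left and right groups, and
    every other class joins the seed it is least separable from, so the
    easiest decision is made nearest the root.
    """
    if len(labels) == 1:
        return labels[0]  # leaf: a single class
    seed_l, seed_r = max(
        combinations(labels, 2),
        key=lambda pair: separation(pair[0], pair[1], stats),
    )
    left, right = [seed_l], [seed_r]
    for label in labels:
        if label in (seed_l, seed_r):
            continue
        if separation(label, seed_l, stats) <= separation(label, seed_r, stats):
            left.append(label)
        else:
            right.append(label)
    return (build_tree(left, stats), build_tree(right, stats))


def count_classifiers(tree):
    """Each internal node would hold one binary classifier."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_classifiers(tree[0]) + count_classifiers(tree[1])


# Toy data (made up for this sketch): four classes in two dimensions.
samples = {
    "A": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "B": [(3, 0), (3, 1), (4, 0), (4, 1)],
    "C": [(20, 20), (20, 21), (21, 20), (21, 21)],
    "D": [(24, 20), (24, 21), (25, 20), (25, 21)],
}
stats = {}
for name, points in samples.items():
    center = centroid(points)
    stats[name] = (center, spread(points, center))

tree = build_tree(sorted(samples), stats)
n_classifiers = count_classifiers(tree)
print(tree, n_classifiers)
```

On this toy data the root separates {A, B} from {C, D}, the two groups that are farthest apart, so a mistake at the root is unlikely and less error propagates to the lower nodes. A tree over k classes needs k - 1 binary classifiers, compared with k(k - 1)/2 for the one-versus-one scheme.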
Keywords/Search Tags:multi-class text classification, support vector machine, binary tree, class separation metrics, evaluation of college students' comprehensive quality