Research And Application Of Text Processing Techniques Based On Machine Learning

Posted on: 2016-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Xue
Full Text: PDF
GTID: 2298330467992620
Subject: Military communications science
Abstract/Summary:
With the rapid growth of information, classifying large collections of natural-language text into the proper categories according to their semantics has become a key issue in organizing text. The support vector machine (SVM) is a machine learning technique proposed by Vapnik, a relatively new tool built on optimization methods. It combines several techniques, including the maximum-margin hyperplane, Mercer kernels, convex quadratic programming, sparse solutions, and slack variables. Because of its globally optimal solution, simple structure, and strong generalization ability, it has been widely studied in recent years and applied to text classification, pattern recognition, and other fields. Against this background, the main work of this thesis is as follows. First, the thesis introduces the key techniques in each stage of Chinese text classification preprocessing and describes the main machine learning algorithms for text classification, focusing on the techniques related to support vector machines. After analyzing and summarizing the mainstream improved SVM algorithms, it proposes an improved algorithm based on ν-SVM to overcome the bias that arises when the sizes of the positive and negative training sets differ greatly. A regulatory factor is introduced so that, when negative samples outnumber positive samples, the classifier's predictive ability on the positive and negative classes remains comparable, which weakens the bias caused by unequal class sizes. Simulation results show that, compared with the original algorithm, the improved algorithm achieves higher accuracy on the positive class. Second, the thesis studies multi-class support vector machine classification algorithms in depth.
We focus on the construction strategy for binary-tree multi-class support vector machines. Taking into account the distance between class centers and the degree of dispersion of each class, we propose an improved algorithm based on a binary decision tree to solve the multi-class classification problem. Experiments are conducted on news texts, and the results are compared with traditional methods such as one-against-one and one-against-all. The results show that, in general, the proposed method outperforms the traditional methods and substantially improves training and testing time.
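
The abstract does not give the exact rule used to build the binary tree, so the following is only a minimal sketch of one plausible construction, under stated assumptions. It assumes that a class's dispersion is the mean distance of its samples to the class center, and that the separability of two classes is their center distance divided by the sum of their dispersions. The class whose least separable neighbour is still the most separable is split off first. The function names and the toy data are hypothetical, and no SVM is trained here; the sketch only decides the order in which the binary classifiers would be arranged. A tree over k classes needs k - 1 binary classifiers, compared with k(k - 1)/2 for one-against-one and k for one-against-all.

```python
import math


def center(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dispersion(vectors, c):
    """Assumed dispersion measure: mean distance of samples to the center c."""
    return sum(distance(v, c) for v in vectors) / len(vectors)


def separability(stats, i, j):
    """Assumed measure: center distance scaled by the two class dispersions."""
    ci, si = stats[i]
    cj, sj = stats[j]
    return distance(ci, cj) / (si + sj + 1e-12)  # epsilon avoids division by zero


def build_order(samples):
    """Return class labels in the order they are split off, easiest first."""
    stats = {}
    for label, vecs in samples.items():
        c = center(vecs)
        stats[label] = (c, dispersion(vecs, c))
    remaining = sorted(stats)
    if not remaining:
        return []
    order = []
    while len(remaining) > 1:
        # Pick the class whose closest competitor is still the farthest away.
        best = max(
            remaining,
            key=lambda i: min(
                separability(stats, i, j) for j in remaining if j != i
            ),
        )
        order.append(best)
        remaining.remove(best)
    order.append(remaining[0])
    return order


def build_tree(order):
    """Skewed binary tree as nested tuples: (class split off, rest of tree)."""
    if len(order) == 1:
        return order[0]
    return (order[0], build_tree(order[1:]))


# Toy two-dimensional "document vectors" for three hypothetical news classes.
samples = {
    "sports": [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]],
    "finance": [[1.0, 1.0], [1.2, 1.0], [1.0, 1.2]],
    "tech": [[10.0, 10.0], [10.2, 10.0], [10.0, 10.2]],
}
order = build_order(samples)
tree = build_tree(order)
print(order)
print(tree)
```

Each internal node of the resulting tree would hold one binary SVM that separates the class split off at that node from all classes below it, so a test sample passes through at most k - 1 classifiers.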
Keywords/Search Tags: Support Vector Machines, Text Classification, SVM based on decision tree