
Research on Web Text Classification Based on Support Vector Machines

Posted on: 2013-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
GTID: 2248330371988910
Subject: Circuits and Systems
Abstract/Summary:
With the rapid spread of the Internet, information overload has become increasingly severe. Organizing and managing this vast amount of information so that users can find what they need quickly and accurately is a major challenge for today's information science and technology. Web text classification, which assigns each text in a Web document collection to a category, is an important tool for organizing and managing information. Among the many classification algorithms, the support vector machine (SVM) has become an active research topic. This thesis studies Web text classification based on SVMs; its main contributions are as follows. (1) It introduces the significance and current state of research in the field, in particular Chinese text classification, feature selection, the vector space model (VSM), common text classification algorithms, and the standard criteria for evaluating text classification. (2) Feature selection is an essential step in text categorization and directly affects classification precision. Building on an analysis of existing feature selection methods, and to overcome the shortcomings of the traditional χ² (chi-square) method, the thesis proposes a new feature selection method that takes into account three criteria: a frequency factor α_w, the term's ability to distinguish between categories β_w, and its distribution within a class θ_w.
Experimental data show that the improved χ² method is effective and feasible. (3) The VSM is widely accepted in Web text classification, but its drawback is that feature selection and feature representation (weighting) are handled relatively independently. The importance of each feature, as computed by the feature selection function, should also be reflected in its weight, so the thesis combines term frequency (TF), inverse document frequency (IDF), and the feature selection score into a new term weighting method. Together, the improved feature selection method and the improved term weighting method are intended to raise text classification performance. (4) Building on the basic theory of SVMs and a review of their significance and of research on them in China and abroad, the thesis establishes an SVM-based text classification model. SVMs show many attractive properties and strong performance in small-sample, nonlinear, and high-dimensional pattern recognition. In comparisons with traditional classification methods, experimental results show that the SVM outperforms k-nearest neighbor (KNN), naive Bayes, and other classifiers.
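Item (2) builds on the traditional χ² statistic. The abstract does not give the improved formula that combines α_w, β_w, and θ_w, so the following is only a minimal sketch of the standard χ² baseline, assuming the usual contingency-table definition: for term t and class c, A counts documents in c that contain t, B documents outside c that contain t, C documents in c without t, and D documents outside c without t, with N = A + B + C + D and χ²(t, c) = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)). Taking the maximum over classes as a term's global score is one common choice and is an assumption here, as are the function names.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for one term/class pair.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class that lack the term
    d: docs outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: some row or column total is zero
    return n * (a * d - b * c) ** 2 / denom


def chi_square_scores(docs, labels):
    """Hypothetical helper: {term: max over classes of chi2(term, class)}.

    docs: list of token lists; labels: parallel list of class labels.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()           # documents containing each term
    df_in_class = Counter()  # documents of each class containing each term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[(term, label)] += 1
    scores = {}
    for term, term_df in df.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = df_in_class[(term, label)]
            b = term_df - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return scores
```

Because the statistic uses only document counts, how often a term occurs inside a document and how evenly it is spread within a class never enter the score; that appears to be the kind of information the frequency (α_w) and within-class distribution (θ_w) criteria in item (2) address.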
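Item (3) folds TF, IDF, and the feature selection score into one term weight, but the abstract does not state the exact formula. Purely as a hypothetical illustration, the sketch below multiplies the three factors, w(t, d) = tf(t, d) · log(N / df(t)) · s(t), where s(t) is any feature selection score (a χ² value, for example), and then L2-normalizes each document vector. The product form, the IDF variant, and the name `weighted_vectors` are assumptions, not the thesis's method.

```python
import math
from collections import Counter


def weighted_vectors(docs, fs_scores):
    """Hypothetical tf * idf * feature-score weighting, L2-normalised.

    docs: list of token lists.
    fs_scores: {term: feature-selection score}; terms missing from it are
    dropped, so the dict also serves as the selected vocabulary.
    Returns one sparse {term: weight} dict per document.
    """
    n_docs = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {}
        for term, count in tf.items():
            if term not in fs_scores:
                continue  # term was not selected
            idf = math.log(n_docs / df[term])
            vec[term] = count * idf * fs_scores[term]
        # cosine-style normalisation so document length does not dominate
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

Under this IDF variant a term that occurs in every document gets weight zero however high its selection score is, which is one reason practical systems often use a smoothed IDF instead.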
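Item (4) builds an SVM-based text classifier, but the abstract does not say which kernel or training algorithm is used. The sketch below swaps in one well-known option: a linear SVM trained in the primal with the Pegasos stochastic sub-gradient method, operating on sparse feature dictionaries such as weighted term vectors. The function names, the `__bias__` feature, and the default `lam` and `epochs` values are illustrative assumptions; for simplicity the bias is regularized together with the other weights.

```python
import random


def dot(w, x):
    """Sparse dot product of weight dict w and feature dict x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Pegasos-style training of a linear SVM (hypothetical helper).

    xs: list of sparse feature dicts; ys: labels in {+1, -1}.
    Minimises lam/2 * ||w||^2 + mean hinge loss.
    A constant '__bias__' feature lets the model learn an offset.
    """
    rng = random.Random(seed)
    data = [(dict(x, __bias__=1.0), y) for x, y in zip(xs, ys)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)       # Pegasos step size
            margin = y * dot(w, x)      # evaluated before the update
            shrink = 1.0 - eta * lam    # gradient step of the L2 term
            w = {f: v * shrink for f, v in w.items()}
            if margin < 1.0:            # hinge loss active: move towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    """Return +1 or -1 for a sparse feature dict x."""
    return 1 if dot(w, dict(x, __bias__=1.0)) >= 0 else -1
```

For more than two categories, a common approach is one-vs-rest: train one such model per category (that category as +1, all others as −1) and assign the category whose model gives the largest score.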
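Item (1) mentions the common evaluation criteria for text classification. The sketch below computes the standard ones: per-class precision, recall, and F1, plus the macro average (mean of per-class F1) and the micro average (F1 over counts pooled across classes). Which averages the thesis actually reports is not stated in the abstract; the function names are hypothetical.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(y_true, y_pred):
    """Return per-class (P, R, F1), macro-F1 and micro-F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    macro_f1 = sum(f for _, _, f in per_class.values()) / len(labels)
    _, _, micro_f1 = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro_f1, micro_f1
```

For single-label classification the micro-averaged F1 equals accuracy, because every misclassified document adds exactly one false positive and one false negative; macro-F1 is the more informative figure when some categories are small.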
Keywords: Web text classification, Support Vector Machine, feature selection