
Application For Web Text Categorization Based On Support Vector Machine

Posted on: 2011-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Duan
Full Text: PDF
GTID: 2178360305482900
Subject: Computer application technology
Abstract/Summary:
The Web is one of the most important applications on the Internet. It makes publishing documents and accessing information convenient, and the information resources gathered on it have become indispensable in daily life. According to official data, the Internet holds more than 100 million Web documents. Faced with this volume of information, Web users often find it difficult to pick out what they need, and they urgently want a way to locate useful information quickly. Web data mining emerged in response to this growing demand. At this stage, Web data mining is built mainly on information retrieval, data mining, and knowledge management: by analyzing large numbers of Web documents to uncover hidden knowledge and patterns, it enables better information retrieval.

As Web data mining has developed, text categorization has become able to bring order to unorganized text, shorten query time, improve search quality, and give fast, efficient access to text information. Automatic text categorization techniques are therefore attracting more and more attention. Text categorization based on machine learning has achieved very satisfying results, and many classification algorithms have been proposed, such as the k-nearest neighbor (KNN) algorithm, the Naive Bayes algorithm, the decision tree algorithm, and support vector machines.

This paper describes Chinese text categorization techniques for Web data mining and presents the Web text categorization process in detail: text preprocessing, feature dimension reduction, and the representation of text features. It examines the application of the support vector machine (SVM) classification algorithm to text categorization, and focuses on combining SVM with minimizing the Bayes error rate to construct a multi-category Web text categorization model, including the concrete construction process.

Experiments show that, when a subset of the training samples is selected for training while the classifier's performance is preserved, the method achieves better precision than the traditional SVM algorithm and runs more efficiently.
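
The pipeline named in the abstract (preprocessing, feature representation, SVM classification) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the thesis's method: the abstract does not name a weighting scheme, so TF-IDF is used; the tokenizer is a whitespace split (Chinese text would need a word segmenter); the SVM is a plain linear one trained by subgradient descent on the hinge loss; and the documents, labels, and function names are hypothetical. Feature dimension reduction, sample selection, and the Bayes-error-rate-based multi-category construction are not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer for illustration; Chinese text would need a word segmenter.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for each term seen in training.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def score(w, b, x):
    # Linear decision function w.x + b over a sparse vector.
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    # Full-batch subgradient descent on the L2-regularised hinge loss.
    # Labels must be +1 or -1. Hyperparameters are illustrative, not tuned.
    w, b, n = {}, 0.0, len(xs)
    for _ in range(epochs):
        # Samples whose margin is below 1 contribute to the hinge-loss subgradient.
        violators = [(x, y) for x, y in zip(xs, ys) if y * score(w, b, x) < 1.0]
        w = {t: v * (1.0 - lr * lam) for t, v in w.items()}  # regulariser shrinkage
        for x, y in violators:
            for t, v in x.items():
                w[t] = w.get(t, 0.0) + lr * y * v / n
            b += lr * y / n
    return w, b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0.0 else -1


# Hypothetical toy corpus: +1 = finance, -1 = sports.
train_docs = [
    "stock market shares fall",
    "investors buy market shares",
    "team football match wins",
    "players lose football match",
]
train_labels = [1, 1, -1, -1]

idf = fit_idf(train_docs)
w, b = train_linear_svm([tfidf(d, idf) for d in train_docs], train_labels)
```

A new document is classified with `predict(w, b, tfidf(text, idf))`. A multi-category classifier is commonly assembled from several such binary classifiers (for example one per category against the rest); how the thesis combines them using the Bayes error rate is not specified in the abstract.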
Keywords/Search Tags: Web Data Mining, Text Categorization Technology, Minimizing Bayes Error Rate, Support Vector Machine (SVM)