
Web-Based Text Mining: Research on SVM Web Page Text Classification

Posted on: 2007-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2208360215482004
Subject: Business management
Abstract/Summary:
With the development of the Internet, and especially the worldwide spread of the World Wide Web, online information resources now cover many aspects of society. Information overload has become increasingly prominent, which has driven the rapid development of Web mining and Web information retrieval.

The search engine is the most widely used tool for handling Web information, and keyword-based search engines are currently the most common. In practice, however, they fall short of users' expectations: they return too many documents, the results are only weakly relevant to the topic, and precision and recall are poor. Web data mining has been proposed to address these problems. Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information; Web data mining combines classical data mining with the Web and has become an important research field. It can extract the knowledge hidden in data automatically and intelligently, making up for the deficiencies of classical search engines.

Classification is an important method for processing large amounts of data, and automatic Web classification is an important research area. Automatic classification can not only build databases organized by category, which improves precision and recall, but also provide users with an information directory. Text classification assigns natural-language texts to one or more predefined categories according to their content, and is an important means of organizing and managing information.

Support Vector Machine (SVM) is a promising new machine learning algorithm proposed by Vapnik and his group at AT&T. It can be used for pattern recognition, regression estimation, and probability density function estimation. In pattern recognition, SVM has outperformed classical learning algorithms in accuracy on handwritten digit recognition, speech recognition, face recognition, and text classification. SVM has many advantages that make it well suited to processing Web text, and researchers value it as a good method for automatically classifying Web information. It focuses on the principles of machine learning from small samples and performs well in that setting. SVM has become a new research hotspot after neural networks and is expected to advance the development of machine learning.

This thesis presents the theory of data mining, gives a general processing workflow for Web text mining, and designs a Web-based data mining system consisting of document collection, feature extraction, and mining. It then introduces Statistical Learning Theory and discusses in depth the SVM, which is built on that theory. Finally, it applies SVM to Web text mining to classify Web text, focusing on a form of active learning that effectively improves efficiency while maintaining classification performance. The results indicate that SVM has promising prospects in Web text mining.
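The abstract does not say which SVM formulation or active learning strategy the thesis uses, so the following is only an illustrative sketch, not the author's method. It assumes a bag-of-words representation, a bias-free linear SVM trained with Pegasos-style stochastic subgradient descent, and margin-based uncertainty sampling (query the unlabeled text closest to the decision boundary). The toy documents, the `oracle` dictionary standing in for a human annotator, and all function names are hypothetical.

```python
import random
from collections import Counter


def build_vocab(texts):
    """Map every distinct token to a feature index (sorted for determinism)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Sparse bag-of-words vector: {feature index: term count}."""
    counts = Counter(text.lower().split())
    return {vocab[t]: float(c) for t, c in counts.items() if t in vocab}


def score(w, x):
    """Signed score w.x; its magnitude grows with distance from the boundary."""
    return sum(w[i] * v for i, v in x.items())


def train_linear_svm(samples, dim, lam=0.01, epochs=100, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for idx in order:
            x, y = samples[idx]
            t += 1
            eta = 1.0 / (lam * t)                   # decaying step size
            violated = y * score(w, x) < 1.0        # inside the margin?
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if violated:
                for i, v in x.items():
                    w[i] += eta * y * v
    return w


def predict(w, text, vocab):
    return 1 if score(w, vectorize(text, vocab)) >= 0 else -1


def active_learning(labeled, pool, oracle, vocab, rounds):
    """Repeatedly label the pool text with the smallest |score| and retrain."""
    labeled, pool, queried = list(labeled), list(pool), []

    def fit():
        samples = [(vectorize(text, vocab), y) for text, y in labeled]
        return train_linear_svm(samples, len(vocab))

    w = fit()
    for _ in range(rounds):
        if not pool:
            break
        query = min(pool, key=lambda text: abs(score(w, vectorize(text, vocab))))
        pool.remove(query)
        labeled.append((query, oracle[query]))  # ask the (hypothetical) annotator
        queried.append(query)
        w = fit()
    return w, queried


# Toy data (hypothetical): +1 = sports, -1 = finance.
labeled = [("team wins football match", 1), ("stock market prices fall", -1)]
oracle = {
    "player scores goal during match": 1,
    "coach praises football team": 1,
    "bank raises interest rates": -1,
    "investors sell bank stock": -1,
}
pool = list(oracle)
vocab = build_vocab([text for text, _ in labeled] + pool)

w, queried = active_learning(labeled, pool, oracle, vocab, rounds=2)
print("queried for labeling:", queried)
print(predict(w, "football coach scores goal", vocab))
print(predict(w, "bank stock prices", vocab))
```

The point of querying by smallest `|score|` is that a text sharing no vocabulary with the labeled set, or one that sits near the boundary, is where a new label is most likely to change the classifier. That is one way labeling effort can be reduced without giving up classification performance. A real system would add term weighting, feature selection, and a held-out evaluation of precision and recall.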
Keywords/Search Tags: Web mining, text mining, SVM, Web classification