
The Web Text Mining Based On Support Vector Machine

Posted on: 2008-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Ding
GTID: 2178360215451523
Subject: Management Science and Engineering
Abstract/Summary:
With the development and widespread use of the Internet and other information technologies, the Web has become one of the most important means by which people obtain information. The Internet provides an abundance of information, but only a small part of it is useful to any given user, so there is an urgent need for methods that search and classify documents quickly and accurately within this huge body of information. This need has led researchers to apply data mining techniques to the Web, opening up the new field of Web data mining. Support Vector Machine (SVM) technology is based on statistical learning theory, in particular VC-dimension theory and the principle of structural risk minimization. Using the limited information contained in a finite sample, it seeks the best trade-off between model complexity and learning ability in order to obtain the best generalization ability. SVM is designed for learning from limited sample sets, and its training algorithm ultimately reduces to a quadratic optimization problem. By nonlinearly mapping the data into a high-dimensional feature space, it ensures that the learning machine generalizes well, and it neatly avoids the curse of dimensionality. This thesis first studies the content of Web mining and examines the techniques used in, and the open problems of, Web content mining, Web structure mining, and Web usage mining.
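The kernel idea summarized above can be illustrated with a small sketch. The Python below is not taken from the thesis; it assumes the common polynomial kernel form K(x, z) = (x . z + c)^d (the exact parameterization used in the thesis is not given in this abstract) and checks that, for d = 2, the kernel equals an ordinary inner product in an explicit higher-dimensional feature space. It also shows the standard SVM decision function f(x) = sum_i alpha_i y_i K(x_i, x) + b, where the multipliers alpha_i and the bias b are assumed to come from solving the quadratic optimization problem; that solver is not shown, and all function names are hypothetical.

```python
import itertools
import math


def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel K(x, z) = (x . z + c)^d (assumed parameterization)."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d


def explicit_phi(x, c=1.0):
    """Explicit feature map for the degree-2 polynomial kernel.

    phi(x) . phi(z) == (x . z + c)^2, but phi(x) has
    n + n(n-1)/2 + n + 1 coordinates for an n-dimensional input.
    """
    n = len(x)
    feats = [x[i] * x[i] for i in range(n)]  # squared terms x_i^2
    feats += [math.sqrt(2) * x[i] * x[j]     # cross terms x_i x_j, i < j
              for i, j in itertools.combinations(range(n), 2)]
    feats += [math.sqrt(2 * c) * x[i] for i in range(n)]  # linear terms
    feats.append(c)                                       # constant term
    return feats


def decision_function(support_vectors, alphas, labels, b, x, kernel=poly_kernel):
    """SVM decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    alphas and b are assumed to be the output of a QP solver (not shown).
    The predicted class is the sign of the returned value.
    """
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


if __name__ == "__main__":
    x, z = [1.0, 2.0, 0.0], [0.0, 1.0, 3.0]
    via_kernel = poly_kernel(x, z)
    via_map = sum(a * b for a, b in zip(explicit_phi(x), explicit_phi(z)))
    print(via_kernel, via_map)  # equal up to floating-point rounding
    # Toy decision value with hand-picked (hypothetical) multipliers.
    print(decision_function([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], [1, -1], 0.0, [2.0, 0.0]))
```

For a vocabulary of n = 10,000 terms, the explicit degree-2 map would have n + n(n-1)/2 + n + 1 = 50,015,001 coordinates, whereas evaluating the kernel directly costs a single n-dimensional dot product. This is the sense in which the kernel lets SVM work in a high-dimensional feature space without paying for its dimensionality.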
It then discusses text classification. We summarize the text classification process, analyze in depth the algorithms used in text classification, and present a semantics-based feature selection method. Using measures commonly applied to evaluate text classification, such as precision and recall, we show that semantics-based text feature selection performs well. We then analyze the theory of SVM, discuss learning from limited sample sets, and examine the advantages of SVM in text classification. We also analyze multi-class text classification problems in depth and give methods for solving them. Finally, we use SVM to carry out text classification experiments and discuss how changes in the polynomial kernel parameter may affect text categorization performance.
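The evaluation measures mentioned above can be made concrete with a short sketch for the multi-class case. The abstract names precision and recall only; the F1 score and the macro-average (an unweighted mean over classes) are a common convention assumed here, not necessarily the thesis's exact evaluation protocol. The function names and the category labels in the example are hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for a multi-class labeling.

    For each class c: precision = TP / (TP + FP), recall = TP / (TP + FN).
    A ratio with a zero denominator is reported as 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the document was not in p
            fn[t] += 1  # document in t was missed
    scores = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical gold labels and classifier output for five documents.
    y_true = ["sports", "sports", "finance", "finance", "tech"]
    y_pred = ["sports", "finance", "finance", "finance", "sports"]
    scores = per_class_scores(y_true, y_pred)
    for label, (p, r, f) in scores.items():
        print(f"{label:8s} P={p:.3f} R={r:.3f} F1={f:.3f}")
    print("macro", macro_average(scores))
```

Macro-averaging gives every category equal weight, so a rare category that the classifier never predicts (here "tech") pulls the average down noticeably; a micro-average, which pools the counts across classes first, would instead be dominated by the large categories.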
Keywords/Search Tags: Web mining, Text categorization, Support vector machine, Semantic