Applied Research on Text Classification in Subject Navigation

Posted on: 2008-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhang
GTID: 2208360215469437
Subject: Signal and Information Processing
Abstract/Summary:
With the development of Internet technology, the amount of information on the Internet has grown exponentially. Some of this information is important, but much of it is of little value. As a key technology for organizing and processing large amounts of document data, text classification can largely solve the problem of information disorder and helps users find the information they need quickly.

Building an Internet resource subject guide is one of the important and effective ways of organizing and using online information. As the volume of online information and web pages grows, manual classification has become the main obstacle to building subject guides. Text classification can solve this problem. The thesis covers the following aspects:

1) It reviews the current state of research on text classification and the development of subject guides, and studies the basic concepts, classification capability, feasibility, and effects.
2) It introduces the process of text representation. The thesis presents a word segmentation algorithm based on the Maximum Matching method and applies it to a Geology Engineering Subject Guide. The algorithm retains the advantages of the Maximum Matching method while keeping the segmentation dictionary extensible.
3) It introduces the basic concepts of Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) and applies both to the subject guide, studying classification precision, classification speed, and algorithm extensibility. Experimental results show that SVM is faster than KNN, that KNN is more extensible than SVM, and that the precision of the two methods differs.
4) Based on a comparison of the experimental results for SVM and KNN, the thesis proposes applying a combined SVM-KNN algorithm to the subject guide. Experimental results show that SVM-KNN combines the advantages of SVM and KNN, making it the best algorithm for subject guide classification at present.
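
The abstract does not spell out the thesis's own segmentation algorithm, so the following is only a minimal sketch of plain forward maximum matching, the general technique it builds on. The function name `forward_max_match`, the `max_word_len` parameter, and the sample dictionary are illustrative assumptions, not taken from the thesis.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Segment `text` by greedily taking the longest dictionary word at each position.

    Hypothetical helper for illustration; not the thesis's actual algorithm.
    """
    if max_word_len is None:
        # Longest entry in the dictionary bounds the candidate window (at least 1).
        max_word_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Assumed toy dictionary for a geology-engineering subject guide.
words = {"地质", "工程", "地质工程", "学科", "导航"}
print(forward_max_match("地质工程学科导航", words))

# Extending the dictionary changes the segmentation without touching the algorithm.
words.add("学科导航")
print(forward_max_match("地质工程学科导航", words))
```

Because the dictionary is an ordinary set, new domain terms can be added at any time, which is one simple way to keep a segmentation dictionary extensible.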
Keywords/Search Tags: Text classification, Chinese word segmentation, Subject Guide, Support Vector Machine, K-Nearest Neighbor