Research Of The Automatic Chinese WEB Text Categorization In Search Engine

Posted on: 2008-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: H W Liu
GTID: 2178360215495802
Subject: Computer technology
Abstract/Summary:
With the rapid growth of information on the network, search engines have emerged to meet the need and now play a pivotal role in network information retrieval. Users of a search engine always hope for faster responses and higher precision, and automatic text classification can improve search engine efficiency. This thesis therefore focuses on the technologies related to automatic Chinese text classification, with the aim of promoting the development of information technology. It first introduces the principle and structure of a search engine, then studies several key technologies: Chinese word segmentation, feature extraction, and classification algorithms. After analyzing several basic existing algorithms for Chinese word segmentation, and taking the characteristics of the Chinese language into account, we propose a segmentation algorithm based on the 2-Gram model and a hash mechanism. We also compare several commonly used text classification algorithms. Building on previous research experience, we propose a design for an automatic text classification system based on the VSM and KNN algorithms. Finally, we summarize the work and discuss the outlook for technologies related to Chinese text classification.
Keywords/Search Tags: Search engine, Chinese Word Segmentation, Feature Selection and Extraction, Categorization Algorithm, VSM-KNN
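For readers unfamiliar with the VSM and KNN combination named in the abstract, the following is a minimal sketch of the general idea, not the thesis's actual system. It assumes overlapping character 2-grams as a crude stand-in for the proposed word segmentation, TF-IDF weights for the vector space model, and cosine similarity for the neighbour search. All function names, the smoothing formula, and the default k are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Overlapping character 2-grams (assumed stand-in for word segmentation)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for the training documents."""
    tokenized = [bigrams(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Smoothed IDF; this particular formula is an assumption, not from the thesis.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def vectorize(text, idf):
    """Map a new text into the same vector space, ignoring unseen terms."""
    counts = Counter(bigrams(text))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def knn_classify(text, train_vectors, train_labels, idf, k=3):
    """Label a text by similarity-weighted vote among its k nearest neighbours."""
    query = vectorize(text, idf)
    scored = sorted(
        ((cosine(query, v), label) for v, label in zip(train_vectors, train_labels)),
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Toy training set, invented purely for illustration.
    docs = ["足球比赛今晚开始", "篮球比赛结果公布", "股票市场今日上涨", "银行利率再次下调"]
    labels = ["sports", "sports", "finance", "finance"]
    vectors, idf = tfidf_vectors(docs)
    print(knn_classify("今晚足球比赛", vectors, labels, idf))
```

Storing each vector as a dictionary keeps only the non-zero terms, which matters because text vectors are very sparse. A real system would replace the 2-gram tokenizer with a proper segmenter and add a feature selection step before classification.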