Research and Implementation of a Topic-Oriented Web Page Classification Technique

Posted on: 2012-04-11 | Degree: Master | Type: Thesis
Country: China | Candidate: F Wu | Full Text: PDF
GTID: 2218330362956263 | Subject: Communication and Information System
Abstract/Summary:
Search engines are currently the most commonly used tool for finding information on the network, and people depend on them for searching information. Most search engines adopt a search strategy based on keyword matching, but as the amount of information on the Internet grows, the effectiveness of this strategy has been greatly reduced. To let a search engine locate Internet information accurately and to improve the relevance of the retrieved information to the target, web page classification technology is used to assist the search engine and thereby optimize its search results.

Automatic web page classification developed on the basis of automatic text classification. In essence, an automatic web page classification system combines natural language processing with machine learning principles, and the classifier is its core.

This thesis describes several mature and popular classification algorithms. After comparing their advantages and disadvantages and considering actual network conditions, the K-nearest neighbor algorithm is selected to construct the classifier, which is used to determine the category of a given unknown text. The classification system is designed and implemented on the basis of a study of the structure and characteristics of Chinese web pages, and its construction process is presented step by step. I study the main components of the system, namely text preprocessing, topical feature extraction, feature library construction, and category determination, and experiment in a real network environment. The system is implemented in combination with a search engine: features are extracted from the content of pages crawled by the search engine, a feature library is established, and categories are determined.

Finally, the classification accuracy of the classifier is measured with currently widely used evaluation indicators. Targeted experiments on several websites provide experimental data and evaluation criteria that demonstrate the effectiveness and feasibility of the system, and show that automatic classification technology can be used to improve the accuracy and relevance of the network information retrieved by search engines.
Keywords/Search Tags: Web Page Classification, Topical Classification, K-Nearest Neighbor Algorithm, Feature Extraction
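
The pipeline named in the abstract (text preprocessing, feature extraction, a feature library, and K-nearest neighbor category determination) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the term-frequency features, the cosine similarity measure, the similarity-weighted vote, and the toy English data are all assumptions chosen for illustration. In particular, the regular-expression tokenizer is only a stand-in; real Chinese web pages would need a proper word segmenter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters.

    Assumption: a stand-in for real preprocessing; Chinese text would
    need a word segmenter instead of this regular expression.
    """
    return [t for t in re.split(r"\W+", text.lower()) if t]


def to_vector(text):
    """Feature extraction: a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_feature_library(labeled_texts):
    """Feature library: one (vector, category) pair per labeled page."""
    return [(to_vector(text), label) for text, label in labeled_texts]


def knn_classify(text, feature_library, k=3):
    """Return the category favored by the k most similar library entries.

    Votes are weighted by similarity (an illustrative choice). Returns
    None when nothing in the library shares a term with the text.
    """
    query = to_vector(text)
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in feature_library),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] == 0.0:
        return None
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical toy data standing in for crawled page content.
library = build_feature_library([
    ("football match goal league team", "sports"),
    ("basketball team player score game", "sports"),
    ("stock market shares investor bank", "finance"),
    ("bank interest rate loan investor", "finance"),
])

print(knn_classify("the team won the football game", library, k=3))  # sports
```

K-nearest neighbor needs no training step beyond storing the labeled vectors, so the feature library can grow as new pages are crawled; the cost is that each classification compares the query against every stored entry.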