On The Design And Implementation Of Automatic Webpage Classification Algorithm

Posted on: 2013-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: S X Liu
Full Text: PDF
GTID: 2248330395467634
Subject: Software engineering
Abstract/Summary:
With the rapid growth of information technology, and of Internet technology in particular, society has entered an information-rich age. People can now reach abundant data, text, sound and images through the Internet, intranets and electronic libraries. Obtaining that information quickly, easily and effectively, however, remains difficult. Automatic classification, and automatic webpage classification in particular, is therefore increasingly important: it saves time when organizing files, makes information capture more efficient, and makes it easier to retrieve and store information.

This thesis studies the development and current state of automatic webpage classification technology and examines the strengths and weaknesses of present search engine systems. Based on an analysis of the development language Java, the Swing toolkit and the TF-IDF algorithm, the author proposes a design scheme for an automatic webpage classification algorithm. In the tests carried out, the method met the demands of large-scale automatic classification of Chinese webpages with an average accuracy above 80%, which gives it considerable practical value.
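The abstract names TF-IDF as the weighting scheme but does not give the thesis's own formulation. As a rough illustration only, the following Java sketch computes the textbook form, tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)). The class name `TfIdfSketch`, its method names, and the assumption that each page has already been stripped of markup and segmented into word tokens (a step Chinese text needs) are hypothetical and are not taken from the thesis.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of textbook TF-IDF; not the thesis's implementation.
class TfIdfSketch {

    // Term frequency: occurrences of each token divided by the document length.
    public static Map<String, Double> termFrequency(List<String> tokens) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> entry : tf.entrySet()) {
            entry.setValue(entry.getValue() / tokens.size());
        }
        return tf;
    }

    // Inverse document frequency: ln(N / df), where df counts the documents containing the token.
    public static Map<String, Double> inverseDocumentFrequency(List<List<String>> corpus) {
        Map<String, Integer> documentFrequency = new HashMap<>();
        for (List<String> document : corpus) {
            // A set ensures each token is counted once per document.
            for (String token : new HashSet<>(document)) {
                documentFrequency.merge(token, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> entry : documentFrequency.entrySet()) {
            idf.put(entry.getKey(), Math.log((double) corpus.size() / entry.getValue()));
        }
        return idf;
    }

    // TF-IDF weight of every token in one document; tokens unseen in the corpus get weight 0.
    public static Map<String, Double> tfIdf(List<String> tokens, Map<String, Double> idf) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Double> entry : termFrequency(tokens).entrySet()) {
            weights.put(entry.getKey(), entry.getValue() * idf.getOrDefault(entry.getKey(), 0.0));
        }
        return weights;
    }
}
```

A classifier would typically compare such weight vectors against per-category vectors built from labelled pages. How the thesis performs that comparison is not stated in the abstract, so it is left out here.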
Keywords/Search Tags: automatic webpage classification, webpage content extraction, automatic text classification