With the rapid development of information technology, and of Internet technology in particular, society has entered an age of abundant and diverse information. People can now access large volumes of data, text, sound, and images through the Internet, intranets, and electronic libraries. However, obtaining the needed information quickly, easily, and effectively remains difficult. Automatic classification, and automatic webpage classification in particular, has therefore become increasingly important: it saves time spent organizing files, makes information gathering more efficient, and makes it easier to retrieve and store information. This thesis studies the development and current state of automatic webpage classification technology and identifies the strengths and weaknesses of present search engine systems. Based on an analysis of the development language Java, the Swing toolkit, and the TF-IDF algorithm, the author proposes a design scheme for an automatic webpage classification algorithm. In the relevant tests, the method met the demands of large-scale automatic classification of Chinese webpages, with an average accuracy above 80%, which gives it considerable practical value.
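
For reference, the TF-IDF weighting named above can be written in its common textbook form as follows. This is a sketch of the standard formulation, not necessarily the exact variant used in this thesis. The symbols are introduced here for illustration only: f_{t,d} is the number of occurrences of term t in webpage d, N is the number of webpages in the collection, and n_t is the number of webpages that contain t.

```latex
\[
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
\]
```

Under this weighting, a term that occurs often in one webpage but in few webpages overall receives a high weight, which is what makes it useful as a classification feature.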