
The Research Of Web Text Classifier In The Digital Library

Posted on: 2006-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Guo
Full Text: PDF
GTID: 2168360182455180
Subject: Systems Engineering
Abstract/Summary:
With the rapid development of computers and the Internet, the types of online information and the usable resources are becoming ever more abundant, and this growth drives the development of the digital library. At the same time, advances in network storage and exchange technology have gradually spurred research on technologies related to the digital library. The digital library is a developing field of computer applications that involves the Internet, multimedia, data warehousing, data mining, intellectual property protection, and so on, so its application and commercial prospects are broad.

This thesis, set in the environment of a digital library system, studies how to effectively classify the web page texts stored in the resource repository. In the first stage of the study, the overall requirements of the digital library system are analyzed carefully. Through the requirements description diagram, the data flow diagram, and the system module diagram, we identify one of the crucial problems of the system: categorizing the web page texts kept in the resource repository. The thesis therefore centers its design on this problem.

Chapter three mainly introduces the essential techniques of text classification, including text preprocessing and feature extraction, which together yield the feature set of a text. We analyze the KNN algorithm and six other commonly used algorithms, compare their characteristics, and finally choose the KNN algorithm. We also introduce the evaluation metrics for classifier performance. After that, the automatic extraction of web page information is described and a flow chart is designed for it.
The fifth chapter presents the overall system design, including the system structure diagram, the module diagram, the design of the classification system, the algorithms involved, and the overall flow chart. It then points out the significance of the research work, which differs from the classifiers of commercial search engines.
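The pipeline summarized above (text preprocessing, feature extraction, KNN classification, and performance evaluation) can be illustrated with a minimal sketch. It is not the thesis's implementation: the tokenizer, the TF-IDF weighting, the cosine similarity measure, the toy training set, the value of k, and the choice of precision, recall, and F1 as metrics are all assumptions made for illustration, as are all function names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector (term -> weight), ignoring terms unseen in training."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def fit_tf_idf(token_lists):
    """Compute smoothed IDF weights from the training texts and vectorize them."""
    n = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in token_lists], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, train_vecs, train_labels, idf, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = vectorize(tokenize(text), idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(true_labels, predicted, positive):
    """Per-class precision, recall, and F1 (assumed metrics, not named in the abstract)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy training set: (page text, category).
TRAIN = [
    ("library catalog books lending archive", "library"),
    ("digital library collection books metadata", "library"),
    ("books archive reading room librarian", "library"),
    ("football match goal league score", "sports"),
    ("basketball team score season league", "sports"),
    ("tennis match player tournament score", "sports"),
]
TRAIN_LABELS = [label for _, label in TRAIN]
TRAIN_VECS, IDF = fit_tf_idf([tokenize(text) for text, _ in TRAIN])
```

Calling `knn_classify("new books added to the library archive", TRAIN_VECS, TRAIN_LABELS, IDF)` assigns the text to the category that dominates its three nearest training texts. A real web page classifier would also need HTML stripping, stop-word removal, feature selection, and, for Chinese pages, word segmentation, none of which is shown here.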
Keywords/Search Tags: digital library, web page text classification, classification algorithm, classifier