The Research And Implementation Of The Chinese Document Automatic Categorization Based On Web

Posted on: 2008-08-19; Degree: Master; Type: Thesis
Country: China; Candidate: N Zhang
GTID: 2178360218452473; Subject: Computer application technology
Abstract/Summary:
As the volume of information on the Internet grows, the Internet is becoming a vital source of knowledge. But there is too much information to find the valuable items efficiently, so organizing the information on the Internet is very important. This dissertation focuses on the automatic categorization of Chinese web documents, a core technology for Internet search engines, information filtering, information retrieval, text databases, and digital libraries.

Text categorization assigns a text to a category according to its attributes (content) under a given categorization system. In general, text categorization requires a training set: a collection of texts that have already been categorized (each given a category label). Each category should include a certain number of training texts, as set by the categorization system. The classifier must be trained with a learning method before it is used to classify unknown texts. Text categorization technology supports those who organize information and better serves information search. The quality of the technology directly influences search efficiency.

This dissertation studies the character matching and document categorization algorithms used in automatic document categorization. A multithreading-based method is proposed to achieve parallel, open automatic document categorization. Several document categorization algorithms are integrated into one system, in which they can run individually or together. An open interface is designed for all algorithms, through which new categorization algorithms can be added and outdated ones removed. The character matching methods use the same interface. As a result, the compatibility and accuracy of the document categorization system are greatly improved. Finally, a complete automatic document categorization system is developed to verify the proposed method.
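The open, multithreaded design described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the system built in the dissertation: the names `CategorizerRegistry`, `keyword_categorizer`, and `majority_vote` are hypothetical, the keyword-counting categorizers are toy stand-ins for real categorization algorithms, and majority voting is an assumed way of combining results, since the abstract does not say how the outputs of several algorithms are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical signature: a categorizer maps a document to one category label.
Categorizer = Callable[[str], str]


class CategorizerRegistry:
    """Open interface: categorizers can be added or removed at run time."""

    def __init__(self) -> None:
        self._categorizers: Dict[str, Categorizer] = {}

    def register(self, name: str, categorizer: Categorizer) -> None:
        self._categorizers[name] = categorizer

    def unregister(self, name: str) -> None:
        self._categorizers.pop(name, None)

    def run(self, document: str, names: Optional[List[str]] = None) -> Dict[str, str]:
        """Run the selected categorizers (all by default), one thread each."""
        selected = names if names is not None else list(self._categorizers)
        if not selected:
            return {}
        with ThreadPoolExecutor(max_workers=len(selected)) as pool:
            futures = {n: pool.submit(self._categorizers[n], document) for n in selected}
            return {n: f.result() for n, f in futures.items()}


def majority_vote(labels: Dict[str, str]) -> str:
    """Assumed combination rule: the most frequent label wins."""
    return Counter(labels.values()).most_common(1)[0][0]


def keyword_categorizer(keywords: Dict[str, List[str]], default: str) -> Categorizer:
    """Toy categorizer: pick the category whose keywords occur most often."""
    def categorize(document: str) -> str:
        scores = {cat: sum(document.count(w) for w in words)
                  for cat, words in keywords.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else default
    return categorize


# Illustrative setup with three toy categorizers (keyword lists are made up).
registry = CategorizerRegistry()
registry.register("a", keyword_categorizer(
    {"sports": ["足球", "比赛"], "finance": ["股票", "银行"]}, default="other"))
registry.register("b", keyword_categorizer(
    {"sports": ["球队"], "finance": ["利率"]}, default="other"))
registry.register("c", keyword_categorizer(
    {"sports": ["冠军"], "finance": ["市场"]}, default="other"))

sports_doc = "球队在足球比赛中夺得冠军"
print(majority_vote(registry.run(sports_doc)))
```

Passing `names=["a"]` to `run` exercises a single categorizer, while the default runs all registered ones together, which mirrors the "individually or together" behaviour the abstract mentions. Because every categorizer shares one call signature, removing an outdated one is a single `unregister` call and needs no change to the rest of the code.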
Keywords/Search Tags: document automatic categorization, character matching, parallel, multithreading