The implementation of dynamic document organization using the integration of text clustering and text categorization

Posted on: 2007-09-11
Degree: Ph.D.
Type: Dissertation
University: University of Ottawa (Canada)
Candidate: Jo, Taeho
Full Text: PDF
GTID: 1448390005970196
Subject: Computer Science
Abstract/Summary:
A document organization is a collection of documents arranged into labeled clusters, each containing similar documents. In any information system, the document collection changes over time as users add, delete, and update documents. Dynamic Document Organization (DDO) is a document organization that adapts automatically to such changing collections. Because access to the collection is decentralized, DDO poses two challenges. First, some clusters may hold many documents while others hold very few. Second, documents belonging to new topics may be added to the information system frequently. For both reasons, the collection must be reorganized even if it was organized previously. Neither text categorization nor text clustering is sufficient on its own to implement DDO. Text categorization requires manual preliminary work: predefining a classification system and preparing labeled sample documents. Text clustering generates only unnamed clusters, so each cluster must be labeled manually by scanning the documents it contains. Therefore, this dissertation proposes approaches to implementing DDO that combine text clustering, cluster identification, and text categorization.
Keywords/Search Tags: Document, Text clustering, DDO, Collection
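
The three stages named in the abstract (text clustering, cluster identification, text categorization) can be illustrated with a minimal sketch. This is not the dissertation's method: the abstract does not specify its algorithms, so the sketch assumes bag-of-words vectors, single-pass clustering by cosine similarity, labels taken from the most frequent terms, and nearest-centroid categorization that opens a new cluster when a document matches no existing topic. All function names, the stopword list, and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def add_document(clusters, doc, threshold=0.3):
    """Categorization step: file doc under the most similar cluster,
    or open a new cluster when nothing is similar enough (a new topic)."""
    vec = vectorize(doc)
    best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
    if best is None or cosine(vec, best["centroid"]) < threshold:
        best = {"centroid": Counter(), "docs": []}
        clusters.append(best)
    best["centroid"].update(vec)  # summed counts; cosine ignores the scale
    best["docs"].append(doc)
    return best

def cluster_documents(docs, threshold=0.3):
    """Clustering step: a single pass over the initial collection."""
    clusters = []
    for doc in docs:
        add_document(clusters, doc, threshold)
    return clusters

def label(cluster, k=2):
    """Cluster identification step: name a cluster by its k most frequent terms."""
    return [term for term, _ in cluster["centroid"].most_common(k)]

if __name__ == "__main__":
    clusters = cluster_documents([
        "neural network training",
        "neural network layers",
        "stock market prices",
        "stock market crash",
    ])
    add_document(clusters, "market prices fall")          # joins an existing topic
    add_document(clusters, "protein folding simulation")  # opens a new topic
    for c in clusters:
        print(label(c), len(c["docs"]))
```

Reusing `add_document` for both the initial pass and later arrivals is a design choice of this sketch: it lets the same similarity test decide between filing a document under an existing label and creating a new cluster, which mirrors the two challenges the abstract describes.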