Study Of Document Organization Method Based On Topic Map

Posted on: 2007-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Tian
Full Text: PDF
GTID: 2178360182483811
Subject: Systems Engineering
Abstract/Summary:
With the rapid development and wide application of computers and the Internet, the information age has arrived, and abundant electronic scientific and technical documents greatly facilitate research. At the same time, the flood of disordered, unstructured electronic documents makes document retrieval difficult, so organizing these documents effectively is an important precondition for improving retrieval efficiency.

Traditional organization methods cannot expose the underlying relationships between document contents. The Topic Map (TM), a newer organization technology sometimes called an electronic index, can be used to solve this problem. It combines the strengths of the traditional methods and organizes documents by topics and the associations between them. This thesis therefore focuses on the application of TM to document organization.

The main work of the thesis falls into three parts:

(1) Based on the TM concepts of topic, association, and occurrence, a multilayer TM-based document organization model is proposed. In this model, topics are defined on different layers and are generalized from document contents, and associations between topics represent relationships between document contents.

(2) The process of creating this model is analyzed and divided into three modules: document representation, document clustering, and TM creation. For document clustering, an algorithm for calculating the similarity between documents is proposed based on the extended Boolean model, and a multistage clustering method is proposed as an extension of the classical agglomerative clustering method in order to generalize topic concepts.

(3) The application of this model to document retrieval is discussed in two respects: navigational browsing and criteria-based querying that relies on semantic similarity calculation.

Finally, an experiment is carried out with 252 documents chosen from the information retrieval domain. A 4-layer TM is created, and a document retrieval system is developed on top of it. The results show that applying TM to document organization can substantially improve retrieval efficiency and provide convenient, flexible navigation for users.
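The three TM concepts named in the abstract (topic, association, occurrence) and the idea of topics placed on layers can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class names, the "broader-narrower" association label, the layer numbering, and the document identifiers are all hypothetical, chosen only to show how navigational browsing could collect the documents beneath a general topic.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic generalized from document contents, placed on one layer."""
    name: str
    layer: int  # assumption: 1 is the most general layer
    occurrences: list[str] = field(default_factory=list)  # document identifiers


@dataclass(frozen=True)
class Association:
    """A typed relationship between two topics."""
    kind: str    # hypothetical label, e.g. "broader-narrower"
    source: str  # name of the broader topic
    target: str  # name of the narrower topic


class TopicMap:
    """Holds topics and associations; occurrences link topics to documents."""

    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}
        self.associations: list[Association] = []

    def add_topic(self, name: str, layer: int, occurrences=()) -> None:
        self.topics[name] = Topic(name, layer, list(occurrences))

    def associate(self, kind: str, source: str, target: str) -> None:
        if source not in self.topics or target not in self.topics:
            raise KeyError("both topics must exist before they are associated")
        self.associations.append(Association(kind, source, target))

    def narrower(self, name: str) -> list[str]:
        """Topics one step below `name` via broader-narrower associations."""
        return [a.target for a in self.associations
                if a.kind == "broader-narrower" and a.source == name]

    def documents_under(self, name: str) -> list[str]:
        """Documents attached to `name` or to any topic reachable below it."""
        seen: set[str] = set()
        docs: set[str] = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            docs.update(self.topics[current].occurrences)
            stack.extend(self.narrower(current))
        return sorted(docs)


# Illustrative data only; these topics and document ids are made up.
tm = TopicMap()
tm.add_topic("information retrieval", layer=1)
tm.add_topic("document clustering", layer=2, occurrences=["doc-017", "doc-042"])
tm.add_topic("query processing", layer=2, occurrences=["doc-003"])
tm.associate("broader-narrower", "information retrieval", "document clustering")
tm.associate("broader-narrower", "information retrieval", "query processing")

print(tm.documents_under("information retrieval"))
# prints ['doc-003', 'doc-017', 'doc-042']
```

Storing associations as separate typed records, rather than as parent pointers on each topic, leaves room for other relationship types between topics on the same layer, which is the property the abstract attributes to TM over traditional organization methods.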
Keywords/Search Tags:Topic Map, TMDOM Model, Document Organization, Document Retrieval, Document Clustering