Knowledge Of The Semantics Of The Document Retrieval Method

Posted on: 2012-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Qi
GTID: 2208330332992835
Subject: Computer applications
Abstract:
With the rapid development of information technology, companies are steadily digitizing their operations and accumulating large numbers of electronic knowledge documents. However, these documents are scattered across departments and business units, and they are neither organized systematically nor used efficiently. The result is duplicated work, low efficiency, and an awkward situation: there are plenty of documents, yet it is hard to find the knowledge one actually needs. Knowledge within an enterprise is domain-dependent, specialized, and precise, so ordinary users find it hard to locate these documents with natural-language queries, and the resources remain buried and underused. Two problems have therefore become important challenges for knowledge management: (1) how to manage these documents better, annotating them with subject terms and storing them in a suitable format; and (2) how to bridge the gap between experts and ordinary users, so that users can obtain satisfactory results without having to phrase queries in strict domain terminology.

This thesis proposes a semantic search method for documents annotated with a thesaurus. The main idea is as follows. First, we build a more complete thesaurus for the domain and use it to annotate the documents. Then we build a two-level index structure: the first level maps thesaurus meta-level elements to thesaurus terms, and the second level maps thesaurus terms to documents.
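The two-level index can be illustrated with a minimal sketch, under stated assumptions. The abstract does not define the thesaurus "meta-level elements", so here they are assumed to be the descriptors themselves, their synonyms (non-preferred entry terms), and related-term links. The toy thesaurus, the annotations, the names `THESAURUS`, `ANNOTATIONS` and `build_two_level_index`, and the 0.5 weight for related terms are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: descriptor -> synonyms (non-preferred entry terms)
# and related descriptors. Real thesauri also carry broader/narrower links.
THESAURUS = {
    "transformer": {"synonyms": ["power transformer"], "related": ["substation"]},
    "substation": {"synonyms": ["transformer station"], "related": ["transformer"]},
    "circuit breaker": {"synonyms": ["breaker"], "related": ["substation"]},
}

# Hypothetical output of the document-annotation step: doc id -> descriptors.
ANNOTATIONS = {
    "doc1": ["transformer"],
    "doc2": ["substation", "circuit breaker"],
    "doc3": ["circuit breaker"],
}


def build_two_level_index(thesaurus, annotations, related_weight=0.5):
    """Build (level1, level2) inverted indexes.

    level1: meta-level element (descriptor, synonym, or related descriptor)
            -> {descriptor: weight}
    level2: descriptor -> set of document ids annotated with that descriptor
    """
    level1 = defaultdict(dict)
    for descriptor, entry in thesaurus.items():
        # The descriptor and its synonyms point to the descriptor at full weight.
        for form in [descriptor, *entry.get("synonyms", [])]:
            level1[form][descriptor] = 1.0
        # A related descriptor also points here, at an assumed lower weight,
        # keeping any stronger link that already exists.
        for rel in entry.get("related", []):
            level1[rel][descriptor] = max(level1[rel].get(descriptor, 0.0), related_weight)

    level2 = defaultdict(set)
    for doc_id, descriptors in annotations.items():
        for descriptor in descriptors:
            level2[descriptor].add(doc_id)
    return dict(level1), dict(level2)


level1, level2 = build_two_level_index(THESAURUS, ANNOTATIONS)
print(level1["breaker"])                  # {'circuit breaker': 1.0}
print(sorted(level2["circuit breaker"]))  # ['doc2', 'doc3']
```

Keeping the two levels apart means the thesaurus can be extended (new synonyms or relations) by touching only `level1`, while re-annotating documents touches only `level2`.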
For a user query, we first compute the semantic similarity between the query keywords and the thesaurus terms using the first-level index, and then retrieve the relevant documents for the matched terms using the second-level index. The thesis also proposes a method for customizing the thesaurus that separates search conditions by detecting the domain a query belongs to, which saves users from switching frequently between different search conditions. To enrich the thesaurus, the thesis uses the CRF++ tool together with a post-processing technique; experiments show that this combined approach achieves better results. The resulting knowledge-document search system has been implemented and deployed in a company.
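The query step (keyword-to-term similarity on the first level, then term-to-document lookup on the second) might look roughly like the sketch below. The abstract does not say how the semantic similarity is computed, so this sketch substitutes plain string similarity from Python's standard `difflib.SequenceMatcher` as a placeholder. The index contents, the 0.8 threshold, the max-then-sum score aggregation, and the names `string_similarity` and `semantic_search` are assumptions for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical first-level index: meta-level element -> {descriptor: weight}.
LEVEL1 = {
    "transformer": {"transformer": 1.0, "substation": 0.5},
    "power transformer": {"transformer": 1.0},
    "substation": {"substation": 1.0, "transformer": 0.5, "circuit breaker": 0.5},
    "transformer station": {"substation": 1.0},
    "circuit breaker": {"circuit breaker": 1.0},
    "breaker": {"circuit breaker": 1.0},
}
# Hypothetical second-level index: descriptor -> document ids.
LEVEL2 = {
    "transformer": {"doc1"},
    "substation": {"doc2"},
    "circuit breaker": {"doc2", "doc3"},
}


def string_similarity(a, b):
    # Placeholder for the unspecified semantic similarity measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_search(keywords, level1, level2, threshold=0.8):
    """Rank documents for query keywords via the two-level index."""
    # Level 1: score each descriptor by the best (similarity * link weight)
    # over all keywords and all sufficiently similar index elements.
    descriptor_scores = defaultdict(float)
    for kw in keywords:
        for element, targets in level1.items():
            sim = string_similarity(kw, element)
            if sim < threshold:
                continue
            for descriptor, weight in targets.items():
                descriptor_scores[descriptor] = max(descriptor_scores[descriptor], sim * weight)
    # Level 2: add each descriptor's score to every document annotated with it.
    doc_scores = defaultdict(float)
    for descriptor, score in descriptor_scores.items():
        for doc_id in level2.get(descriptor, ()):
            doc_scores[doc_id] += score
    # Highest score first; ties broken by document id.
    return sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))


print(semantic_search(["transformer"], LEVEL1, LEVEL2))  # [('doc1', 1.0), ('doc2', 0.5)]
print(semantic_search(["breakers"], LEVEL1, LEVEL2))
```

The second call shows a near-miss keyword ("breakers") still reaching the "circuit breaker" documents through the synonym "breaker". Replacing `string_similarity` with a genuinely semantic measure would leave the two-level lookup itself unchanged.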
Keywords: knowledge document, semantic search, two-level index structure, thesaurus, thesaurus construction