Knowledge Of The Semantics Of The Document Retrieval Method

Posted on:2012-08-12

Degree:Master

Type:Thesis

Country:China

Candidate:B Y Qi

Full Text:PDF

GTID:2208330332992835

Subject:Computer applications

Abstract/Summary:

With the rapid development of information technology, companies keep advancing their digitalization and continually accumulate a large number of electronic knowledge documents. However, these documents are scattered across departments and business units, and they are neither organized systematically nor used efficiently. The result is an awkward situation of duplicated work and poor efficiency: on the one hand, there are plenty of documents; on the other hand, it is difficult to find the knowledge we actually need. Knowledge within an enterprise is domain-dependent, specialized, and precise, so general users find it hard to retrieve these documents with natural-language queries. As a result, these resources stay buried and cannot be used effectively. The following two problems have therefore become important challenges for knowledge management: (1) how to manage these documents better, annotating them with subject terms and storing them in a suitable format; and (2) how to bridge the gap between experts and ordinary users, so that users can obtain satisfactory results without having to phrase their queries in strict terminology. This thesis proposes a semantic search method for documents annotated with a thesaurus. The main idea is as follows. First, we build a more complete thesaurus structure for the domain and use it to annotate the documents. We then build a two-level index structure: the first level maps the thesaurus's meta-level elements to thesaurus terms, and the second level maps thesaurus terms to documents.
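The two-level index can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say what the thesaurus meta-level elements are, so here they are taken to be the individual words of each term's labels (preferred label and synonyms). The thesaurus, term IDs such as `T1`, document IDs, and the `build_two_level_index` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical thesaurus: term ID -> labels (preferred label first, then synonyms).
THESAURUS = {
    "T1": ["heat exchanger", "heat transfer unit"],
    "T2": ["pressure vessel", "tank"],
    "T3": ["welding procedure", "weld process"],
}

# Hypothetical annotations: document ID -> thesaurus term IDs assigned to it.
ANNOTATIONS = {
    "doc-001": ["T1", "T2"],
    "doc-002": ["T3"],
    "doc-003": ["T2", "T3"],
}


def build_two_level_index(thesaurus, annotations):
    """Return (level1, level2).

    level1: meta-level element (assumed here: a lower-cased word from any label)
            -> set of term IDs whose labels contain it.
    level2: term ID -> set of document IDs annotated with that term.
    """
    level1 = defaultdict(set)
    for term_id, labels in thesaurus.items():
        for label in labels:
            for word in label.lower().split():
                level1[word].add(term_id)

    level2 = defaultdict(set)
    for doc_id, term_ids in annotations.items():
        for term_id in term_ids:
            level2[term_id].add(doc_id)
    return dict(level1), dict(level2)


level1, level2 = build_two_level_index(THESAURUS, ANNOTATIONS)
print(sorted(level1["heat"]))  # ['T1']
print(sorted(level2["T2"]))    # ['doc-001', 'doc-003']
```

Keeping the two levels separate means the thesaurus can be extended (new synonyms only touch level 1) without re-annotating documents, and re-annotating a document only touches level 2.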
For a user's query, we first calculate the semantic similarity between the query keywords and the thesaurus using the first-level index, and then retrieve the relevant documents with the matched thesaurus terms through the second-level index. This thesis also proposes a method for customizing the thesaurus that separates search conditions by detecting the domain each one belongs to, which spares users the inconvenience of switching frequently between different conditions. To enrich the thesaurus, the thesis uses the CRF++ tool aided by a post-processing technique, and experiments show that this combined approach achieves better results. The resulting knowledge-document search system has been implemented and deployed in a company.
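The query path can likewise be sketched, assuming a first-level index that maps meta-level elements to term IDs and a second-level index that maps term IDs to document IDs. The abstract does not specify the semantic similarity measure, so this sketch substitutes a plain lexical one, the Dice coefficient over character bigrams, which at least tolerates small variations such as plurals. The 0.6 threshold, the rule that a document's score is the sum of its matched terms' scores, the sample indexes, and the names `dice` and `search` are hypothetical choices for illustration.

```python
from collections import defaultdict


def bigrams(text):
    """Set of character bigrams of a lower-cased string (a single char yields itself)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def dice(a, b):
    """Dice coefficient over character-bigram sets, in [0, 1] (lexical stand-in)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def search(keywords, level1, level2, threshold=0.6):
    """Rank documents for a query via a two-level index.

    Step 1 (level 1): compare each keyword with every level-1 key and keep the
            thesaurus terms whose key scores at or above `threshold`
            (best score per term).
    Step 2 (level 2): follow each kept term to its documents, summing term scores.
    """
    term_scores = {}
    for kw in keywords:
        for key, term_ids in level1.items():
            sim = dice(kw, key)
            if sim >= threshold:
                for t in term_ids:
                    term_scores[t] = max(term_scores.get(t, 0.0), sim)

    doc_scores = defaultdict(float)
    for t, score in term_scores.items():
        for doc in level2.get(t, ()):
            doc_scores[doc] += score
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical first- and second-level indexes.
LEVEL1 = {
    "exchanger": {"T1"},
    "heat": {"T1"},
    "vessel": {"T2"},
    "pressure": {"T2"},
    "tank": {"T2"},
    "welding": {"T3"},
}
LEVEL2 = {"T1": {"doc-001"}, "T2": {"doc-001", "doc-003"}, "T3": {"doc-002", "doc-003"}}

results = search(["exchangers", "tanks"], LEVEL1, LEVEL2)
print([doc for doc, _ in results])  # ['doc-001', 'doc-003']
```

Neither query word appears verbatim in the index, yet both reach the right terms through the similarity step. Replacing `dice` with a genuinely semantic measure (for example, one that also uses the thesaurus's own relations) would leave the two-step structure unchanged.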

Keywords/Search Tags:

knowledge document, semantic search, two-level index structure, thesaurus, thesaurus construction

Related items

1	Document Clustering Based On The Semantic Network Of Forestry Thesaurus
2	Research On Automatic Construction Of Natural Language Thesaurus
3	The Thesaurus-Based Construction Of Knowledge Base For TCM Fundamental Theory
4	Research On The Construction Of Emotion Thesaurus And Algorithms Of New Cyber Words Identification
5	Automatic Supervised Thesauri Construction with 'Roget's Thesaurus'
6	The Integration Of Folksonomy And Thesaurus
7	Comparison And Research On The Thesaurus Visualization Of Relationships Among Descriptors
8	Study On Construction Method Of Agricultural Domain Ontology Based On Agricultural Thesaurus And Document
9	Research On Maritime Ontology Construction Based On Thesaurus And FCA
10	Study On SKOS-based Transformation From Thesaurus To Ontology In Semantic Web