Research On Key Technologies Of Documentary Sensitive Information Retrieval Based On Ontology

Posted on: 2014-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: H C Chen
GTID: 2268330401976853
Subject: Military communications science
Abstract/Summary:
With the development of e-government, information exchange and knowledge sharing among individuals, companies, and government agencies have become increasingly important, and with them comes a serious threat of sensitive information leakage. To limit the harm that such leakage causes to individuals, companies, and the government, it is necessary to detect and mine sensitive information in documents stored on government affairs terminals so that secrets are not disclosed. However, improving the recall and precision of information retrieval systems remains a serious problem.

To address these problems, this thesis compares and analyzes the existing methods for documentary sensitive information retrieval. Following the general process of documentary sensitive information retrieval, we study query expansion, document indexing, and the retrieval model. The main work and innovations of this thesis are as follows:

1. To address the problem that a query expression is insufficient to express the user's intention, this thesis proposes a new query expansion method based on the user intention subtree within a domain ontology, which strengthens the semantics of the query expression. We also propose a method for building the user's intention subtree within the domain ontology, which can infer the user's intention efficiently. To decide which nodes of the user's intention subtree should be added to the raw query, we develop a method for computing the similarity among the concepts of the tree. The method combines three factors: semantic distance, semantic sharing, and concept weight. The experimental results show that the proposed method achieves higher performance.

2. To address the false retrievals and missed retrievals caused by a lack of semantics in the index, this thesis proposes a documentary sensitive information indexing method based on ontology inference rules, which improves the recall and precision of the system and gives it the ability to perform inference. After analyzing the characteristics of sensitive information and building a sensitive knowledge base, we propose an instance sensitivity annotation method based on inference rules. We also modify the method for computing instance weights. Together, these steps acquire latent knowledge and produce a semantic index. To improve retrieval efficiency, we also present a hierarchical indexing scheme that is fast and tolerant. The experimental results show that the proposed method retrieves sensitive information more accurately, more completely, and more quickly.

3. Existing retrieval models cannot handle the uncertainty in retrieval and ignore the relationships between keywords, which leads to low recall and precision. This thesis therefore proposes a documentary sensitive information retrieval model based on multi-evidence combination, which improves the ability to handle both uncertainty and the relationships between keywords. We propose a method for obtaining the initial probability belief of the evidence and present the model's mathematical representation using Dempster-Shafer evidence theory, which addresses the uncertainty. We also propose an evidence combination rule based on the similarity coefficient between keywords, which accounts for the relationships between keywords. The experiments show that our method improves recall and precision compared with existing retrieval models.

4. This thesis designs and implements a documentary sensitive information retrieval system (DSIRSO) based on Lucene and an e-government scientific domain ontology. We first present the system architecture and then introduce the design of several key modules, such as ontology building, document preprocessing, and semantic indexing. The experiments verify the proposed technologies and model, and the results show that the system using both proposed methods together with the proposed model achieves the best performance.
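
For readers unfamiliar with the Dempster-Shafer evidence theory mentioned in item 3, the following is a minimal sketch of the classic Dempster rule of combination. It is not the thesis's similarity-coefficient-based combination rule, whose details the abstract does not give. The function name `combine_dempster`, the two-hypothesis frame, and the mass values are illustrative assumptions.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with the classic Dempster rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("evidence is totally conflicting; the rule is undefined")
    # Normalize by 1 - K so that the combined masses sum to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical frame of discernment: a document is sensitive or not.
SENSITIVE = frozenset({"sensitive"})
NON_SENSITIVE = frozenset({"non_sensitive"})
UNKNOWN = SENSITIVE | NON_SENSITIVE  # mass left uncommitted

# Made-up evidence from two keywords matched in the same document.
evidence_a = {SENSITIVE: 0.6, UNKNOWN: 0.4}
evidence_b = {SENSITIVE: 0.5, NON_SENSITIVE: 0.2, UNKNOWN: 0.3}

belief = combine_dempster(evidence_a, evidence_b)
for hypothesis, mass in belief.items():
    print(sorted(hypothesis), round(mass, 4))
```

Mass assigned to the whole frame (`UNKNOWN` here) represents uncommitted belief, which is how the theory expresses uncertainty instead of forcing a probability onto each single hypothesis.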
Keywords/Search Tags: Sensitive Information, Ontology, Query Expansion, Semantic Index, Dempster-Shafer Evidence Theory