Based On The Related Entity Retrieval Model Of Information Protection

Posted on: 2013-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 2248330395950198
Subject: Computer application technology
Abstract/Summary:
With the development of technologies such as natural language processing and data mining, and especially the widespread use of search engines, scattered information can be gathered easily and efficiently, so that even ordinary users can readily obtain the information they want from the Internet. However, information retrieval technology is a double-edged sword: while acquiring external knowledge has become more convenient for users, hiding their private information has become increasingly difficult. Using a search engine, attackers can infer related entities from information that users post on forums, blogs and social networks and that appears safe in isolation, which may lead to the leakage of user information. Traditional information protection is concentrated mostly in the fields of databases and information security. The former focuses mainly on protecting structured data, and the latter on securing information during transmission. This work stems from the Hi-Tech Research and Development Program of China. We study the association of sensitive entities in large-scale unstructured data and propose a Web-based sensitive information protection framework. The related research background lies mainly in the fields of information retrieval and natural language processing.
Building on the use of search engines and taking the characteristics of Internet data into account, we employ text mining and information retrieval techniques to propose a multi-perspective association model that detects potential leakage of user information and protects that information through related entity retrieval. The work of this paper consists mainly of the following parts:
● Giving a brief survey of current research on information protection, covering traditional methods in the database and information security fields and the technologies used to protect large-scale unstructured data.
● Proposing an information protection framework based on a related entity search algorithm, building a multi-perspective association model, and improving the retrieval results of the association model through deep mining of authority pages.
● Designing and implementing an information protection system based on massive Internet data. The related entity search module of the system is tested on the TREC 2010 entity search task dataset. The results show that, compared with BM25, a traditional Bayesian model and other retrieval methods, our method performs better on several evaluation indicators, which supports the accuracy and applicability of the multi-perspective association model as well as the effectiveness of the proposed method.
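For context on the BM25 baseline named in the evaluation, the following is a minimal sketch of the standard Okapi BM25 scoring function. It is not the thesis's multi-perspective association model, whose details the abstract does not give. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    `query` is a list of terms and `docs` is a non-empty list of term lists.
    The defaults for k1 and b are common textbook values, not values from the thesis.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF variant that stays non-negative for frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus: which posts are most associated with the query terms?
docs = [
    ["alice", "works", "at", "acme"],
    ["bob", "likes", "tea"],
    ["acme", "acme", "hires", "alice"],
]
scores = bm25_scores(["alice", "acme"], docs)
```

A baseline of this kind ranks documents by term overlap alone; the abstract reports that the proposed model outperforms it on several evaluation indicators in the entity search setting.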
Keywords/Search Tags:information protection, entity search, text mining, association rules