Research On The Technologies Of Information Mining And Privacy Preserving In Distributed Environment

Posted on: 2013-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Jia
GTID: 1228330374999499
Subject: Information security
Abstract/Summary:
With the rapid development of Internet and storage technologies, the number of web users and web applications has expanded rapidly, and stored data is growing explosively. Handling such huge amounts of data also requires fast computation and short response times. The traditional single-server storage model can no longer meet the performance and reliability requirements of large-scale data storage and computing. Distributed storage and parallel computing address these problems well: they not only optimize data storage performance but also resolve the scalability issues caused by continuous data growth. With the arrival of the Cloud Computing era, the Hadoop distributed computing platform has been widely adopted because of its large storage and computing capacity, simple computation model, and convenient services. More and more researchers in fields that involve massive data computation, such as data mining and data analysis, have begun to study algorithms for the distributed environment.

However, because of the open platform and shared mode of the Cloud Computing environment, it also faces many security and privacy challenges. As concerns about privacy grow rapidly, many researchers have committed themselves to privacy protection. For example, multiple parties often need to mine their joint databases, which are stored in a distributed manner in different locations. Since the data usually record a company's core technology and private information, how to mine the joint databases without disclosing private data is a real problem to be solved. In distributed outsourced database services, user permissions are usually dispersed. A very natural way to protect data security is to use an access control strategy.
How to protect the privacy of user permissions when applying an access control strategy to outsourced databases is an active research field. Based on the analysis above, we determined the research focus of this paper: the technologies of information mining and privacy preserving in distributed environment.

The main contributions of this paper are as follows:

(1) Data mining algorithm in distributed environment. Web log mining mainly analyzes log information to obtain associated web pages, user categories, hotspot clusters, access sequences and so on, in order to improve the user experience. However, with the development of the Internet, hundreds of millions of web logs are produced each day, so the analysis and mining of web logs urgently needs to be extended to the distributed environment. This paper focuses on the web log sequential pattern mining task. Based on the PrefixSpan sequential pattern mining algorithm, it proposes a rapid and efficient sequential mining algorithm built on a sliding window model for the horizontally distributed environment, and then extends the distributed algorithm to the Hadoop platform. Experimental results show that the sliding window model reduces redundant frequent sequences while still obtaining the complete set of frequent sequences.

(2) Privacy preserving data mining in distributed environment. With increasing cooperation among commercial companies, mining the joint databases of different parties is more and more common. However, data owners sometimes do not want to publish their private data and only want to obtain the mining results. Privacy preserving data mining based on Secure Multi-party Computing (SMC) can solve this problem well. Focusing on distributed databases and using basic SMC protocols, this paper studies a privacy preserving sequential pattern mining algorithm for horizontally distributed databases and a privacy preserving association mining algorithm for vertically distributed databases.
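As an illustration of what a basic SMC protocol looks like, the sketch below shows a textbook secure sum built on additive secret sharing. It is not the protocol proposed in the dissertation. The names `MODULUS`, `make_shares` and `secure_sum` and the sample support counts are assumptions made for this example. Each party splits its private count into random shares, so no single share reveals the count, yet the published subtotals add up to the true total.

```python
import secrets

# Assumed public modulus; it must exceed any total the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the protocol in one process; real parties would exchange messages."""
    n = len(private_values)
    # Row i holds the shares created by party i; entry j is sent to party j.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received and publishes only that subtotal.
    subtotals = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return sum(subtotals) % modulus


# Hypothetical local support counts of one candidate pattern at three sites.
local_support_counts = [120, 75, 310]
total = secure_sum(local_support_counts)
```

In a horizontally partitioned setting, a total of this kind could be compared with a global minimum support threshold without any site revealing its local count. The simulation assumes honest parties that follow the protocol.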
To solve the horizontal privacy preserving sequential pattern mining problem, we proposed a novel secure matrix sum protocol based on ElGamal threshold encryption and homomorphic encryption; to solve the vertical privacy preserving association mining problem, we proposed a novel secure scalar product protocol. By applying these basic protocols, we solved the above problems well. Theoretical analysis shows that the algorithms can hide data information without a Trusted Third Party. Finally, we give a modular method for designing SMC-based privacy preserving data mining algorithms, which has the advantages of flexibility and ease of use.

(3) Outsourced database service is a popular service mode in Cloud Computing. As concerns about data security increase, data owners want a safe way to entrust data storage to proxy servers and to manage users' access permissions through an access control strategy. However, out of consideration for users' privacy, users' authorization information also needs to be protected from disclosure. Therefore, this paper proposes a privacy preserving access control protocol for Database as a Service. First, secret sharing is used to store the data in a distributed manner and ensure data security; second, ElGamal encryption is used to encrypt the access control strategy and protect the security of authorization; third, we design a method that combines access control with user queries based on the homomorphic property of ElGamal, so that users obtain query results in a secure manner. Theoretical analysis shows that, without a Trusted Third Party, the model hides access control information well.
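The homomorphic property of ElGamal that the abstract relies on can be illustrated with a minimal sketch. It shows only the standard multiplicative homomorphism of textbook ElGamal, not the dissertation's protocols. The parameters `P` and `G` and all function names are assumptions chosen for this example, and they are not secure choices for real use.

```python
import secrets

# Toy parameters, assumed for illustration only.
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # assumed base; not a vetted generator choice


def keygen():
    """Return (private_key, public_key) with public_key = G^private_key mod P."""
    private_key = secrets.randbelow(P - 2) + 1  # in [1, P-2]
    return private_key, pow(G, private_key, P)


def encrypt(public_key, message):
    """Encrypt an integer message in [1, P-1] as the pair (G^r, m * h^r)."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (message * pow(public_key, r, P)) % P


def decrypt(private_key, ciphertext):
    c1, c2 = ciphertext
    # c1^(P-1-x) equals c1^(-x) mod P by Fermat's little theorem.
    return (c2 * pow(c1, P - 1 - private_key, P)) % P


def multiply(ct_a, ct_b):
    """Component-wise product of ciphertexts encrypts the product of plaintexts."""
    return (ct_a[0] * ct_b[0]) % P, (ct_a[1] * ct_b[1]) % P


private_key, public_key = keygen()
product_ct = multiply(encrypt(public_key, 6), encrypt(public_key, 7))
recovered = decrypt(private_key, product_ct)
```

Here `recovered` equals 6 * 7 mod P even though the two plaintexts were never decrypted individually. Operating on encrypted values in this way is what allows encrypted authorization data to be combined with a query without exposing it.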
Keywords/Search Tags: distributed environment, web log sequential pattern mining, privacy preserving data mining, secure multi-party computing, database as a service, privacy preserving access control
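For readers unfamiliar with the PrefixSpan algorithm on which contribution (1) is based, the following is a minimal single-machine sketch of prefix-projected sequential pattern mining. It assumes each sequence is a list of single items such as page visits. It does not implement the sliding window model or the Hadoop extension described in the abstract, and the function name `prefixspan` and the sample sessions are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for every frequent sequential pattern."""
    patterns = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start) pointers into the suffixes
        # that remain after matching the current prefix.
        extensions = {}
        for seq_id, start in projected:
            seen = set()
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Keep only the first occurrence per sequence, so the
                    # list length equals the number of supporting sequences.
                    seen.add(item)
                    extensions.setdefault(item, []).append((seq_id, pos + 1))
        for item, next_projected in sorted(extensions.items()):
            if len(next_projected) >= min_support:
                pattern = prefix + (item,)
                patterns[pattern] = len(next_projected)
                mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return patterns


# Hypothetical web sessions; each is an ordered list of visited pages.
sessions = [
    ["home", "search", "item", "cart"],
    ["home", "item", "cart"],
    ["search", "item"],
    ["home", "search", "cart"],
]
frequent = prefixspan(sessions, min_support=3)
```

With a minimum support of 3, the only frequent pattern longer than one page in these sessions is `("home", "cart")`. Patterns are grown only from prefixes that are already frequent, which is the pruning idea behind PrefixSpan.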