Data similarity computation is one of the most common operations in big data analysis. Traditional similarity models mainly include the vector space model, the topic model, the latent semantic analysis (LSA) model, and the hash model. However, because these models operate in the plaintext space, the data can be stolen, tampered with, or forged. If the similarity computation can instead be performed in the ciphertext space, these security threats are reduced. To meet users' need for text similarity computation, this article combines fully homomorphic encryption with the simhash algorithm and designs a new algorithm for computing similarity over ciphertexts. The main work of this article is as follows: 1) Fully homomorphic encryption algorithms and conventional data similarity calculation methods are studied and analyzed. The simhash algorithm is improved so that it can be used with fully homomorphic encryption, and the improved simhash is then combined with the characteristics of the fully homomorphic encryption algorithm to realize data similarity computation in the ciphertext space. 2) To demonstrate the usability of the method, an application example of text similarity computation in a cloud environment is designed. The process is as follows: the Data Owner uploads the document ID, the ciphertext of the document, and the ciphertext of the document's simhash to the Cloud Server. The cloud service provider performs the similarity computation on the ciphertexts and obtains the Hamming distances in ciphertext form. The Data Owner then decrypts the ciphertexts of the Hamming distances to obtain the document similarity ranking. The similarity computation is thus completed without the cloud learning the data content or the simhash plaintext, which protects data privacy and reduces the threat to data security. 3) A prototype system is implemented, the detailed process of the method and the relevant experimental data are given, and the feasibility of the method is verified.
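The building blocks named in the abstract can be illustrated with a minimal plaintext sketch. It is not the article's improved simhash and performs no encryption. The function names (`simhash`, `to_bits`, `hamming_arithmetic`), the whitespace tokenizer, the MD5-based token hash, and the 64-bit fingerprint length are all assumptions made for illustration. The sketch computes a standard simhash fingerprint and then evaluates the Hamming distance using only additions and multiplications on bits, which is the kind of arithmetic a fully homomorphic scheme could in principle evaluate over encrypted bits.

```python
import hashlib
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Standard simhash (assumed variant): weighted bitwise vote over token hashes."""
    weights = Counter(text.lower().split())  # assumption: whitespace tokens, term frequency as weight
    votes = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")  # keep the first `bits` bits of the digest
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 when the weighted vote for that position is positive.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def to_bits(value: int, bits: int = 64) -> list:
    """Split a fingerprint into a list of 0/1 integers, least significant bit first."""
    return [(value >> i) & 1 for i in range(bits)]


def hamming_arithmetic(a_bits: list, b_bits: list) -> int:
    """Hamming distance using only + and *: for bits, a XOR b equals a + b - 2ab."""
    return sum(a + b - 2 * a * b for a, b in zip(a_bits, b_bits))


if __name__ == "__main__":
    query = simhash("the cloud server computes similarity over encrypted data")
    docs = {
        "doc1": "the cloud server computes similarity over encrypted documents",
        "doc2": "a recipe for baking bread with sourdough starter",
    }
    distances = {
        doc_id: hamming_arithmetic(to_bits(query), to_bits(simhash(body)))
        for doc_id, body in docs.items()
    }
    # A smaller Hamming distance means a more similar document.
    for doc_id in sorted(distances, key=distances.get):
        print(doc_id, distances[doc_id])
```

In the cloud scenario described above, the bit lists would be encrypted by the Data Owner, the sum in `hamming_arithmetic` would be evaluated homomorphically by the Cloud Server, and only the Data Owner would decrypt the resulting distances to produce the ranking.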