| With the rapid development of Internet technology, many organizations and individuals choose to store massive multimedia data, such as audio and images, in the cloud to save local storage space. However, while cloud storage brings great convenience, it also leads to frequent data leakage and other problems. To protect user privacy, private speech data is encrypted before outsourcing, which inevitably makes rapid retrieval of the data more difficult. Therefore, efficient, secure, and accurate retrieval of massive speech data in the cloud is of great practical significance. To ensure the privacy and security of cloud speech data and the hash index, and to achieve efficient secure speech retrieval while maintaining retrieval accuracy, this thesis uses deep learning, fully homomorphic encryption (FHE), and speech feature extraction to study key technologies including CKKS-based speech FHE, spectrogram image feature extraction, triplet deep hash construction, and secure speech retrieval and indexing. The main research work is as follows: 1. To address the low security of cloud speech data, the low efficiency of fully homomorphic encryption and decryption of speech, and the large ciphertext expansion, a speech FHE method based on the CKKS algorithm was proposed. First, the speech data was converted from analogue to digital. Second, the digital speech data in complex space was segmented to generate a series of two-dimensional arrays. Then, batching was used to cyclically encrypt the matrix, and finally modulus switching and relinearization were used to generate ciphertext speech data with a small amount of expansion. The experimental results show that the encryption and decryption times of this method are about 16.7 s and 5.5 s respectively, that the ciphertext expansion is small, and that the method is highly secure, so it can be applied to the
secure identification and retrieval of cloud speech data. 2. Existing deep hashing methods for content-based speech retrieval do not make full use of supervised information and generate suboptimal hash codes, resulting in low retrieval precision and efficiency. To address these problems, a triplet deep hashing method for speech retrieval was proposed. This method fed spectrogram image features, in triplets, into an attention mechanism-residual network (ARN) model to effectively extract the deep information of speech features, and combined this with a new triplet cross-entropy loss function to generate efficient and compact hash codes. The experimental results show that the efficient and compact binary hash codes generated by this method make the recall, precision, and F1 score of speech retrieval reach 98.5%. Compared with the single-tag retrieval method, retrieval time is reduced by 50%, which significantly improves retrieval efficiency and accuracy while reducing the amount of computation. 3. To protect the privacy of speech data and deep binary hash codes, and to realize privacy-preserving similarity calculation, a secure speech retrieval method using deep hashing and CKKS was proposed. First, a deep hashing method based on the triple convolutional neural network (Tri-CNN) was used to extract useful information from spectrogram image features and generate efficient and compact deep binary hash codes. Then, a speech CKKS FHE method was designed to encrypt the original speech data and the deep binary hash codes, which were uploaded to the cloud together. During retrieval, the deep binary hash code of the query speech was extracted and encrypted before being sent to the cloud server, where its secure similarity to the index sequences in the secure index table was calculated. The experimental results show that the mean average precision (mAP) of the proposed method on the TIMIT and THCHS-30 data sets is more than 93%, with a loss of about
2% compared with the plaintext domain, but with higher security. |
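
To make the similarity step of the third contribution more concrete, the following is a minimal plaintext sketch of ranking binary hash codes by Hamming distance. It rests on several assumptions: the abstract does not state which similarity measure is used, so Hamming distance is only an illustrative choice; no encryption is performed here; and the names `hamming_arithmetic`, `rank_by_hamming`, and the sample index entries are hypothetical. The distance is written with additions and multiplications only (for bits x and y, x XOR y = x + y - 2xy), since these are the kinds of operations a scheme such as CKKS can evaluate on ciphertexts.

```python
from typing import Dict, List, Sequence, Tuple


def hamming_arithmetic(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance of two 0/1 vectors using only +, - and *.

    For bits x, y in {0, 1}: x XOR y == x + y - 2 * x * y.
    Assumption: this arithmetic form is chosen for illustration only;
    the source does not describe its similarity computation.
    """
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x + y - 2 * x * y for x, y in zip(a, b))


def rank_by_hamming(
    query: Sequence[int], index: Dict[str, Sequence[int]]
) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, hamming_arithmetic(query, code)) for name, code in index.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 8-bit hash codes standing in for a secure index table.
sample_index = {
    "utt_a": [1, 0, 1, 1, 0, 0, 1, 0],
    "utt_b": [1, 0, 1, 0, 0, 0, 1, 0],
    "utt_c": [0, 1, 0, 0, 1, 1, 0, 1],
}
sample_query = [1, 0, 1, 1, 0, 0, 1, 0]
ranking = rank_by_hamming(sample_query, sample_index)
```

In an encrypted setting, the per-bit products and sums would be evaluated on ciphertexts and only the data owner could decrypt the resulting distances; that part is outside the scope of this sketch.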