
The Research Of Sensitive Information Detection And Retrieval Algorithm Over Encrypted Speech For Chinese

Posted on: 2018-12-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S F He
GTID: 1318330542974510
Subject: Computer Science and Technology
Abstract/Summary:
Detecting sensitive information in instant voice communication is one application of speech information retrieval. In query-by-example spoken term detection, detection speed and precision are the main factors that limit practical use, and both need further improvement. Compared with English, Chinese Pinyin has its own particularities, and little work has addressed query-by-example spoken term detection for Mandarin speech, so improving sensitive information detection for Mandarin instant voice is a pressing problem. In addition, the instant voice generated in real-time voice communication is transmitted over untrusted channels and stored on semi-trusted cloud servers, so protecting its security and enabling retrieval over encrypted speech data have also become pressing problems in encrypted speech signal processing.

Speech feature extraction is the basic step of speech signal processing, and the quality of the features directly affects the performance of a speech information retrieval system. This dissertation studies improved speech features and a Chinese syllable segmentation algorithm. Building on them, it proposes Chinese query-by-example spoken term detection that integrates upper and lower bound estimation, dual-key speech encryption based on underdetermined blind source separation, and speech retrieval over encrypted speech data based on syllable-level perceptual hashing. The contributions fall into five areas.

First, for speech features, an improved multifractal detrended fluctuation analysis is proposed from an analysis of the multifractal characteristics of the speech signal, and a posterior probability feature based on segment models of initials and finals is introduced.

Second, for Chinese syllable segmentation, the particularities of Chinese Pinyin are taken into account: after voice extraction and a two-stage discriminant method that determines the syllable structure, syllables are segmented by searching for the extremum points of the first-order difference. This effectively reduces the number of missed detections and over-detections.

Third, for sensitive information detection in massive speech data, query-by-example spoken term detection that integrates upper and lower bound estimation with K-nearest neighbor search achieves higher speed without loss of retrieval precision. In addition, a similarity-based reordering method for relevant regions is introduced to correct the initial retrieval results. When an appropriate number of relevant regions is chosen from the initial detection results, reordering improves both retrieval speed and precision.

Fourth, to address the security of instant voice, a speech encryption algorithm with low computational complexity and high security is developed. It exploits the intractability of underdetermined blind source separation, the one-time pad, and the sensitivity of chaotic equations to initial conditions.

Finally, for speech retrieval over encrypted speech data, a perceptual hash is generated for each syllable from the posterior probabilities based on segment models of initials and finals, and the syllable hashes together form the perceptual hash sequence of a voice segment. Retrieval operates directly on the encrypted speech without decryption: the system hash table is searched for perceptual hash sequences that have the same length as the hash sequence of the query speech segment or keyword and a similar first part. The syllable-level perceptual hash derived from these posterior probabilities outperforms perceptual hashes derived from time-domain and frequency-domain features in distinctiveness and robustness. Restricting matching to hash sequences of equal length with a similar first part greatly improves detection speed. Under various speech signal processing operations, the proposed retrieval algorithm achieves high recall and precision.
Keywords/Search Tags: Speech retrieval, Speech encryption, Spoken term detection, Posterior probability features, K-nearest neighbor search, Perceptual hashing
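
The retrieval strategy described in the abstract, which compares the query only against perceptual hash sequences of equal length with a similar first part, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: each syllable hash is modelled as a 16-bit integer (`HASH_BITS`), "similar first part" is interpreted as the first syllable hash lying within a small Hamming distance (`first_tol`), candidates are ranked by bit error rate (`max_ber`), and the function names `build_index` and `search` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

HASH_BITS = 16  # assumed width of one syllable-level perceptual hash

# Index layout (assumed): number of syllables -> list of (segment id, hash sequence)
Index = Dict[int, List[Tuple[str, Tuple[int, ...]]]]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def build_index(segments: Dict[str, Sequence[int]]) -> Index:
    """Group stored hash sequences by length so a query only visits one bucket."""
    index: Index = {}
    for seg_id, seq in segments.items():
        index.setdefault(len(seq), []).append((seg_id, tuple(seq)))
    return index


def search(index: Index, query: Sequence[int],
           first_tol: int = 2, max_ber: float = 0.2) -> List[Tuple[str, float]]:
    """Return (segment id, bit error rate) pairs, best match first.

    Only sequences with the same number of syllables as the query are visited,
    and a candidate is dropped early if its first syllable hash differs from
    the query's first hash by more than `first_tol` bits.
    """
    if not query:
        return []
    hits: List[Tuple[str, float]] = []
    for seg_id, seq in index.get(len(query), []):
        if hamming(seq[0], query[0]) > first_tol:
            continue  # first part not similar enough: skip the full comparison
        differing = sum(hamming(s, q) for s, q in zip(seq, query))
        ber = differing / (HASH_BITS * len(query))
        if ber <= max_ber:
            hits.append((seg_id, ber))
    return sorted(hits, key=lambda hit: hit[1])
```

The sketch only shows why the two filters save work: the length bucket and the first-hash check discard most stored sequences before any full comparison is made. How the hashes are computed from posterior probabilities, and how they relate to the encrypted speech, is outside its scope.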