| With the rapid development of computer and Internet technology, the volume of speech data has grown explosively. How to analyze and process large amounts of speech data to extract useful information, and how to use speech technology for human-computer interaction, have attracted wide attention from researchers. Spoken term detection does not need to recognize every word in the speech; it only needs to detect predefined keywords in a continuous speech stream. It is widely used in audio information retrieval, audio monitoring, device wake-up, smart homes, and other fields. Spoken term detection is currently dominated by deep learning, which requires large-scale annotated data for training and is therefore difficult to apply in limited-data scenarios. In this paper, we propose a spoken term detection algorithm based on feature space trajectory information for limited-data scenarios. The algorithm describes the statistical and temporal characteristics of keywords in a unified way and makes full use of the discriminative information among different keywords. Experimental results show that the proposed algorithm has significant advantages over HMM and CRNN systems in limited-data scenarios. The main work of this paper is as follows: (1) A spoken term detection algorithm based on feature space trajectory information is proposed. First, the audio feature space is obtained by clustering the feature set of unlabeled speech samples. Then, the feature space distribution and trajectory information, together with the local discriminative information for easily confused objects, are constructed on the audio feature space. Finally, spoken term detection is carried out using the feature space distribution and trajectory information. (2) The feasibility and effectiveness of the algorithm are verified. The influence of several parameters of the scheme on keyword detection performance is investigated experimentally, including the representation granularity of the audio feature space, the number of classifiers, and the size of the training set. Compared with the existing CRNN, HMM, and histogram methods, the proposed algorithm has significant advantages in limited-data scenarios. When there are 10 training samples per keyword, the CRNN method cannot be applied because there is too little data. Compared with the HMM, the false rejection rate of the proposed algorithm decreases by 20.5% and the average number of false alarms per hour decreases by 8.7; compared with the histogram method, the false rejection rate decreases by 4% and the average number of false alarms per hour decreases by 5.72. (3) We study the duration and variation range of unvoiced consonants in continuous speech and analyze the unvoiced/voiced structure information of keywords. The speech segments to be detected are segmented and screened from the test samples according to this unvoiced/voiced structure information, so as to avoid unnecessary matching and speed up keyword detection. After screening the speech segments according to the unvoiced/voiced structure, the real-time factor decreases from 0.601 to 0.326, a relative decrease of 45.76%. A spoken term detection algorithm based on unvoiced/voiced subspaces is also proposed. The audio feature space is divided into an unvoiced subspace and a voiced subspace, so that keywords are modeled in finer detail. During detection, candidate results can be further discriminated using prior knowledge of the unvoiced/voiced structure of the keywords, which helps reduce the recognition error rate for easily confused words. |
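
The first two steps of contribution (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that the feature space is obtained by "clustering", so plain k-means is an assumption here, and a normalized transition matrix between consecutive frames is just one plausible way to represent "trajectory information". All function names (`build_codebook`, `quantize`, `distribution`, `trajectory`) and the parameter `k` are hypothetical.

```python
import numpy as np


def quantize(features, centroids):
    """Return the index of the nearest centroid for every frame."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def build_codebook(features, k, n_iter=20, seed=0):
    """Assumed clustering step: plain k-means over unlabeled frame features.

    Returns a (k, dim) array of centroids that partitions the feature space.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(n_iter):
        labels = quantize(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave a centroid in place if its region is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def distribution(labels, k):
    """Statistical view: normalized histogram of region occupancy."""
    counts = np.bincount(labels, minlength=k).astype(float)
    return counts / counts.sum()


def trajectory(labels, k):
    """Temporal view (assumed form): normalized counts of region-to-region
    transitions between consecutive frames."""
    trans = np.zeros((k, k))
    np.add.at(trans, (labels[:-1], labels[1:]), 1.0)
    return trans / max(trans.sum(), 1.0)
```

In such a setup, `build_codebook` would be run once on frame-level features (for example MFCC vectors) pooled from unlabeled speech, and each keyword sample would then be summarized by `distribution(quantize(x, codebook), k)` and `trajectory(quantize(x, codebook), k)`. The value of `k` plays the role of the "representation granularity" parameter studied in contribution (2).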