
The Research Of Speech Time Scale Modification And Pitch Shifting Technology

Posted on: 2016-04-13  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Lei  Full Text: PDF
GTID: 2308330464974238  Subject: Communication and Information System
Abstract/Summary:
Speech signal processing is closely tied to computer science, pattern recognition, and artificial intelligence, and it combines linguistic research with signal processing. Advances in speech signal processing promote the development of these related technologies. For example, with the continuous development of human-computer interaction, speech-based interaction is expected to become a research focus and a mainstream interface, because it can greatly improve the naturalness and efficiency of human-computer interaction. Further study of speech signal processing therefore has both theoretical and practical significance.

Speech signal processing can be divided into three categories: speech synthesis, speech coding, and speech recognition. This thesis studies the speech modification techniques within speech synthesis: time scale modification and pitch shifting. Time scale modification changes the duration of speech while keeping the speaker's fundamental characteristics, such as pitch and timbre, unchanged. Pitch shifting changes the speaker's pitch while keeping the duration of the speech unchanged. In practice, both techniques are widely applicable in fields such as speech compression, communication, foreign language teaching, video post-production, and text-to-speech systems.

The thesis first briefly explains the background and significance of the research and reviews the main methods and research progress in time scale modification and pitch shifting, both in China and abroad.
Second, it explains the physical mechanism of speech production and, on that basis, analyzes the classical digital model of speech in detail.

For time scale modification, the thesis describes the principles and implementation of the OLA, SOLA, and WSOLA algorithms and of the model-based linear prediction method, and compares them in simulation experiments. The WSOLA algorithm does not distinguish the perceptually sensitive parts of the speech and scales all segments with the same scheme, so the quality of the scaled speech declines when the sampling rate is low or the overall scaling factor is large. To address this, the thesis analyzes the basic principle of human auditory prediction and proposes an improved WSOLA algorithm based on it. The improved algorithm preserves the transition segments to which the human ear is sensitive and thereby improves the quality of the scaled speech. The thesis further proposes a dynamic time compensation algorithm that corrects the deviation in scaling factor introduced by the improved algorithm, which ensures scaling accuracy and improves the perceived quality of the scaled speech.

For pitch shifting, the thesis analyzes and derives the method that combines time-domain resampling with time scale modification, as well as the linear prediction algorithm, and compares them in simulation experiments. Combining the improved WSOLA algorithm with linear resampling yields pitch-shifted speech of satisfactory quality.
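To make the combination of time scale modification and resampling concrete, the following is a minimal sketch of baseline WSOLA followed by linear resampling. It is not the thesis's improved algorithm: there is no auditory-prediction weighting and no dynamic time compensation. The function names (`wsola`, `resample_linear`, `pitch_shift`), the frame length of 256 samples, the search tolerance of 64 samples, and the use of unnormalized cross-correlation as the similarity measure are all assumptions chosen for illustration.

```python
import math

def hann(n):
    """Periodic Hann window; copies spaced n/2 apart sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def wsola(x, rate, frame=256, tol=64):
    """Stretch x to about len(x) * rate samples (rate > 1 lengthens)."""
    hop = frame // 2
    win = hann(frame)
    n_out = int(round(len(x) * rate))
    n_frames = n_out // hop + 1
    xp = list(x) + [0.0] * (2 * frame + tol)  # zero-pad so slices stay in range
    out = [0.0] * (n_frames * hop + frame)
    prev = 0  # input start index of the previously copied frame
    for k in range(n_frames):
        ideal = int(k * hop / rate)  # where uniform scaling would read from
        best = ideal
        if k > 0:
            # The "natural continuation" of the last copied frame is the target.
            target = xp[prev + hop: prev + hop + frame]
            best_score = -math.inf
            for c in range(max(0, ideal - tol), ideal + tol + 1):
                # Assumed similarity measure: unnormalized cross-correlation.
                score = sum(a * b for a, b in zip(xp[c:c + frame], target))
                if score > best_score:
                    best, best_score = c, score
        for i in range(frame):  # windowed overlap-add at a fixed output hop
            out[k * hop + i] += win[i] * xp[best + i]
        prev = best
    return out[:n_out]

def resample_linear(x, step):
    """Read x at positions 0, step, 2*step, ... with linear interpolation."""
    n = int((len(x) - 1) / step)
    out = []
    for j in range(n):
        t = j * step
        i = int(t)
        frac = t - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out

def pitch_shift(x, factor):
    """Raise pitch by `factor` (> 1) while keeping the duration about the same."""
    stretched = wsola(x, factor)               # longer by `factor`, same pitch
    return resample_linear(stretched, factor)  # original length, shifted pitch
```

For example, `pitch_shift(x, 1.5)` first stretches `x` to 1.5 times its length and then reads the result back at 1.5 samples per output sample, so the duration returns to roughly the original while every frequency is multiplied by 1.5. Because this shifts the formants together with the pitch, it is only a baseline for the resampling-based approach.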
In addition, to simulate the glottal excitation accurately and to extract the vocal tract response precisely, the thesis proposes a processing method based on cepstrum-domain homomorphic filtering. It extracts a voiced speech segment using the cepstrum method, extends the segment periodically, and resamples the extended segment according to the required scaling to obtain the pitch-shifted glottal excitation. It uses homomorphic deconvolution to extract the vocal tract response and resamples that response by a linear factor to obtain a vocal tract response with shifted formant frequencies. The method shifts pitch while keeping the speech duration unchanged and improves the naturalness of the synthesized speech.

Finally, the thesis builds a Matlab-based GUI platform for speech signal processing. The platform integrates all of the time scale modification and pitch shifting algorithms studied and displays the processing results directly. According to the user's needs, it can change duration with pitch unchanged, change pitch with duration unchanged, or change both, for either local files or live recordings. It can also display the time-domain waveform and frequency-domain spectrum of the processed speech in real time. Shortcut keys let users quickly convert male speech to female speech and vice versa, and a save function lets users store the processed speech.
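As a rough illustration of the cepstrum-domain separation that the homomorphic method relies on, the sketch below computes the real cepstrum of one synthetic voiced frame, reads the pitch period from the high-quefrency peak, and recovers a smooth log-spectral envelope from the low-quefrency part. It covers only this separation step, not the thesis's periodic extension or resampling stages. All names (`real_cepstrum`, `pitch_period`, `log_envelope`), the synthetic frame (an 80-sample pulse train through a single 700 Hz resonance at 8 kHz), and the lifter cutoff of 30 are assumptions made for the example.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def real_cepstrum(frame):
    """Inverse transform of the log-magnitude spectrum of a real frame."""
    n = len(frame)
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(frame)]
    # log_mag is real and symmetric, so forward FFT / n equals the inverse FFT.
    return [v.real / n for v in fft(log_mag)]

def pitch_period(ceps, min_lag, max_lag):
    """Quefrency (in samples) of the largest cepstral peak in the search range."""
    return max(range(min_lag, max_lag + 1), key=lambda q: ceps[q])

def log_envelope(ceps, cutoff):
    """Keep only low quefrencies (both ends of the symmetric cepstrum) and
    transform back, giving a smoothed log-magnitude spectrum."""
    n = len(ceps)
    kept = [c if (q <= cutoff or q >= n - cutoff) else 0.0
            for q, c in enumerate(ceps)]
    return [v.real for v in fft(kept)]

# Synthetic voiced frame (assumed test signal, not data from the thesis):
# impulse train with an 80-sample period through a two-pole resonator.
fs, n, period = 8000, 1024, 80
r, theta = 0.9, 2 * math.pi * 700 / fs
y = [0.0, 0.0]
for i in range(n):
    e = 1.0 if i % period == 0 else 0.0
    y.append(e + 2 * r * math.cos(theta) * y[-1] - r * r * y[-2])
frame = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(y[2:])]  # Hamming window

ceps = real_cepstrum(frame)
period_est = pitch_period(ceps, 32, 200)  # search roughly 40 Hz to 250 Hz
envelope = log_envelope(ceps, 30)         # cutoff kept below the pitch period
```

The lifter cutoff is kept below the expected pitch period so that the excitation peak is excluded from the envelope. Here `period_est` should land close to the 80-sample period, and the largest value of `envelope` over the first `n // 2` bins should fall near the 700 Hz resonance.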
Keywords/Search Tags: Time Scale Modification, WSOLA Algorithm, Speech Pitch Shifting, Homomorphic Processing, GUI Processing Platform