| Speech signals, the acoustic manifestation of language, have long been studied by scientists and engineers. Aerodynamic and acoustic studies show that speech production is nonlinear, and some basic nonlinear features of speech, such as the fractal dimension and the Lyapunov exponent, are already known. These features are estimated under the assumptions that the speech signal is stationary and that enough data are available. Speech signals, however, are non-stationary, and it is often impossible to obtain enough samples to estimate their nonlinear features statistically. As a result, general nonlinear processing methods cannot describe the nonlinear features of speech accurately, especially for the transient parts of speech, such as the beginning and ending of a voiced sound. To understand the underlying physics of speech production in depth, more and more researchers are studying the nonlinear microstructure of speech. Such studies are also needed to make better use of nonlinear speech processing now that electronics, computing and signal processing technologies are highly developed. As the foundation for the nonlinear analysis of speech, three aspects of the physical modeling of speech production are first introduced briefly: vocal fold oscillation, the turbulent sound source, and interaction phenomena. The known nonlinear features of speech are then reviewed, and linear and nonlinear models of speech are introduced. The Local Linear Prediction (LLP) model and the second-order Volterra series model, which originate from nonlinear regression, are derived by approximating the nonlinear oscillator model, so that both models can also be understood in terms of nonlinear dynamics. 
Relations between the LP model and the LLP model are also discussed. Every phoneme has non-stationary beginning and ending parts whose amplitude varies over a short time. These parts are difficult to study with general nonlinear processing methods, with the exception of the Recurrence Plot (RP). The RP is designed for the analysis of short non-stationary signals. The transient part of voiced sounds is analyzed to find similarity in its waveform or trajectory, and experiments on vowels and nasals show that similarity exists in the trajectory. A recursive method is developed to reduce the computational cost of the recurrence plot. To reduce computational complexity, a Partially Adaptive Multi-Step Local Linear Prediction (paLLP) method is developed for the nonlinear prediction of voiced sounds, based on the evolution of the state points in phase space. A comparison with other LLP-based prediction methods shows that its prediction precision is better than that of all of them except the original LLP. A practical scheme is developed to demonstrate the feasibility of the paLLP. Based on this scheme, the computational complexity of the paLLP is analyzed; the result shows that it is faster than the LLP and comparable to the two other LLP-based methods. A comparison with LP shows that the paLLP is more accurate, but the periodicity in the paLLP residual is lower than that in the LP residual, while its computational complexity is similar to that of LP. Inspired by LD-CELP, an Analysis-by-Synthesis speech codec is designed and its working principle is introduced. As a non-stationary nonlinear analysis method, Empirical Mode Decomposition (EMD) has been applied to speech signals so that the analysis can be carried out in the subspace of the intrinsic mode functions (IMFs) of speech. In many applications, however, only some of the IMFs are selected, on an intuitive basis, for further processing. 
To choose proper IMFs for further processing, it is necessary to analyze the nonlinear features of all the IMFs. In these analyses, the windowed averaging EMD (WA-EMD) is used to decompose the speech because of its stable sifting and lower computational complexity. By presetting a group of desired frequencies, the speech can be decomposed into 8 IMFs whose frequency bands correspond to the desired frequencies. The certainty of each IMF is measured by estimating the Hurst exponent of the power spectra. The embedding dimension of the IMFs is estimated from their higher-order singular spectra. The nonlinearity of the IMFs is detected by computing their third-order spectrum and normalized third-order spectrum. The results show that IMF1 can be regarded as noise, IMF2-IMF5 are nonlinear, and the others are approximately linear. Among them, IMF5 or IMF6, which contains the most important structural information of the original speech, is approximately linear. Together, these results help us further understand the nonlinearity of speech and improve the performance of nonlinear speech processing. |
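
For readers unfamiliar with the Recurrence Plot mentioned above, the following is a minimal sketch of its textbook definition: the signal is embedded in phase space by time delays, and two time indices are marked as recurrent when their state points lie within a threshold distance. The function names, the embedding parameters `dim` and `tau`, and the threshold `eps` are illustrative assumptions rather than values from this work, and the sketch uses the brute-force pairwise distance computation, not the recursive method developed here.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this embedding")
    return np.stack([x[k * tau : k * tau + n] for k in range(dim)], axis=1)

def recurrence_plot(x, dim=3, tau=2, eps=0.1):
    """Binary recurrence matrix: R[i, j] = 1 when states i and j are within eps."""
    states = delay_embed(x, dim, tau)
    # Pairwise Euclidean distances between all state points (O(n^2) memory).
    diff = states[:, None, :] - states[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= eps).astype(np.uint8)

# Toy input (assumed for illustration): a sinusoid with a period of 20 samples.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
R = recurrence_plot(x, dim=3, tau=5, eps=0.1)
```

For a periodic signal such as this toy sinusoid, the recurrent points form lines parallel to the main diagonal, spaced one period apart. Similarity in the trajectory of a transient speech segment would appear as comparable diagonal structures.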