| Traditional discrete signal processing (DSP) mainly handles low-dimensional signals and represents them as vectors. However, a vector representation cannot describe some high-dimensional signals and their hidden structure. In other words, it does not capture the relationships between signals (such as acquisition location, degree of mutual correlation, and data distribution characteristics). Graph signal processing (GSP) emerged to address these problems. Unlike traditional DSP, GSP can handle signals whose relationships are described by a graph operator with edges and weights. Moreover, this dissertation proves that the traditional discrete Fourier transform (DFT) is a special case of the graph Fourier transform (GFT) based on the graph adjacency matrix of cycle graph signals. GSP therefore extends DSP theories and techniques to the graph domain and provides new tools for signal processing. Speech signals are non-stationary, nonlinear signals with no prior graph topology. Viewing speech signals through their intrinsic graph, we exploit the fact that adjacency matrices constructed from different edges and weights define different graph Fourier bases, so that the GFT maps speech signals into different graph frequency domains and reveals different useful characteristics of the speech. Against this background, the main task of this dissertation is to design and define new graph topologies and graph adjacency matrices for speech signals supported on graphs, to derive their useful properties, and to investigate speech graph enhancement methods that outperform classical DSP denoising algorithms. The main contributions of this dissertation are as follows: First, with the help of GSP, this dissertation proposes a single directed graph, built with a speech graph shift operator, that maps speech signals into the graph domain and thus provides a way to construct speech graph signals. Based on the 
GFT in GSP, the singular value decomposition (SVD) is applied to the graph adjacency matrix of the single directed graph topology, and its singular vectors are used to construct a novel graph Fourier basis for speech graph signals (SGSs). Speech signals and noise can then be mapped into the graph frequency domain defined by the directed graph topology, which makes it possible to study their graph spectra. On this basis, we propose an improved graph Wiener filtering method based on the minimum mean square error (MMSE) criterion to suppress noise in noisy speech. Numerical experiments show that the proposed graph Wiener filtering outperforms the baseline in terms of SNR and PESQ, which demonstrates the effectiveness of the designed single directed graph representation of speech signals. Then, because neither the directed cyclic graph of time series nor the proposed single directed graph topology of speech signals can capture the intrinsic relationships between frames, this dissertation proposes to use a graph learning method to obtain a time-vertex joint directed graph topology across frames, while the graph structure of the speech samples within a frame is obtained by applying the graph shift operation. On this basis, we define a joint graph Fourier basis from the singular vectors of the joint graph adjacency matrix of the time-vertex joint directed graph and propose a novel vertex-frequency graph Wiener filtering method, which improves speech enhancement performance. Numerical experiments show that the vertex-frequency graph Wiener filtering method outperforms the baseline in terms of SNR and PESQ. Next, the proposed single directed graph and time-vertex joint graph fail to yield an orthogonal graph Fourier basis from their graph adjacency matrices; moreover, the graph Wiener filtering based on the speech graph shift operator and the vertex-frequency graph Wiener filtering struggle to enhance noisy speech signals in a high-noise 
environment. To deal with these problems, the dissertation uses the k-graph learning method to construct weighted, connected, undirected multiple graphs. To benefit from the properties of the learned multiple graphs and to improve interpretability, we study the spectral properties of speech samples in the graph frequency domain. We then propose a graph spectral magnitude estimator based on the graph minimum mean square error (MMSE) criterion for speech signals residing on undirected multiple graphs. Numerical experiments demonstrate that the proposed method outperforms the graph Wiener methods in GSP and the classical speech enhancement methods in DSP in terms of SNR, PESQ, LLR, and STOI. Finally, in classical speech enhancement (e.g., basic Wiener filtering and basic spectral subtraction), the STFT phase of speech signals is treated as unhelpful for denoising. Recent research finds that the phase plays an important role in speech enhancement in some noisy environments. Note that end-to-end time-domain speech separation networks with a masking strategy have recently achieved remarkable success compared with their frequency-domain counterparts. Building on time-domain speech separation, the dissertation proposes a graph signal enhancement method based on time-graph domain speech separation to suppress noise and extract the speech signal, which provides a way to enhance speech in noisy environments. It is worth noting that speech separation networks employ a 1-D convolutional layer as a speech encoder, which uses a sliding window to encode the waveform into latent feature representations; these are then used to estimate a mask for each speaker in the separation stage. When the sliding window is large, it is difficult to capture the waveform details within each window, and thus hard to separate and decode high-fidelity signals. Reducing the window size alleviates the problem, but at the cost of high computational complexity. In this dissertation, we propose to build a time-graph representation for each latent feature 
by using a speech graph shift operator. We use the time-graph representation to encode the structural details with a graph convolutional network encoder, which removes the need to reduce the window size. The encoded graph feature representation complements the original latent feature representation, which improves separation and reconstruction performance. Moreover, the dissertation proposes a graph signal enhancement method based on time-graph domain single speech separation to enhance the phase and magnitude spectra of speech signals, which successfully suppresses noise. Numerical experiments show that the proposed time-graph domain speech separation method outperforms the baseline in terms of SI-SNRi and SDRi on clean and noisy datasets, improving both separation and reconstruction. Moreover, the experiments demonstrate that the proposed graph signal enhancement based on time-graph domain speech separation can suppress background noise and successfully extract the target speech. |
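
The statement that the DFT is a special case of the GFT on a cycle graph can be checked numerically. The following is a minimal sketch, not the dissertation's own implementation: it assumes an unweighted directed cycle whose adjacency matrix acts as a one-sample cyclic shift, and the function names (`directed_cycle_adjacency`, `dft_basis`, `gft`, `igft`) are hypothetical and chosen only for illustration. Under these assumptions the normalized DFT vectors are eigenvectors of the adjacency matrix, so the GFT defined by that eigenbasis coincides with the unitary DFT.

```python
import numpy as np


def directed_cycle_adjacency(n):
    """Adjacency (shift) matrix of a directed cycle: (A @ x)[i] == x[i - 1]."""
    return np.roll(np.eye(n), 1, axis=0)


def dft_basis(n):
    """Unitary matrix whose k-th column is exp(2j*pi*k*m/n) / sqrt(n)."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)


def gft(x, basis):
    """Graph Fourier transform for a unitary eigenbasis: x_hat = V^H x."""
    return basis.conj().T @ x


def igft(x_hat, basis):
    """Inverse graph Fourier transform: x = V x_hat."""
    return basis @ x_hat


if __name__ == "__main__":
    n = 8
    A = directed_cycle_adjacency(n)
    V = dft_basis(n)
    # Eigenvalues (graph frequencies) of the cyclic shift: exp(-2j*pi*k/n).
    lam = np.exp(-2j * np.pi * np.arange(n) / n)

    x = np.random.default_rng(0).standard_normal(n)

    # A V = V diag(lam): the DFT vectors diagonalize the cycle adjacency matrix.
    print("eigenbasis check:", np.allclose(A @ V, V * lam))
    # The GFT on the directed cycle equals the unitary (orthonormal) DFT.
    print("GFT equals DFT:", np.allclose(gft(x, V), np.fft.fft(x, norm="ortho")))
```

For other topologies, such as the directed and learned graphs described in the abstract, the adjacency matrix is no longer circulant, so the resulting basis generally differs from the DFT; the sketch covers only the cycle-graph case.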