
The Method Of Face Portrait Based On Speech

Posted on: 2022-11-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Wang | Full Text: PDF
GTID: 2518306752465574 | Subject: Computer Software and Computer Applications
Abstract/Summary:
Audio and image are the two signal modalities most commonly used by human beings. With the development of science and technology, a single modality can no longer meet the requirements of many applications, and cross-modal research makes speech-based face portraits possible. Because features are expressed differently across modalities, this technology still has great research potential. This thesis applies machine learning techniques, combined with basic theories of statistics and biology, to cross-modal research: it explores the relationship between speech and face, and matches the corresponding speaker's face image given any segment of that speaker's speech. A multi-angle face generation model driven by speech is then built to realize a speech portrait of the speaker. The main work and specific contributions are as follows:

Firstly, a speech feature encoding module based on a PSO-CNN network is proposed. To address the problem that traditional MFCC features cannot accurately filter out non-acoustic information, this method uses a convolutional neural network improved by particle swarm optimization (PSO) to build the speech feature encoding module, which optimizes the training speed of the network while pooling time-series features. The module extracts and encodes speech features and finally outputs high-dimensional speech feature vectors.

Secondly, a face feature encoding module based on a residual network is proposed. The face encoder extracts feature vectors from face images and refines the feature representation through iterative training. Building the face feature encoder on ResNet effectively extracts face features, strengthens the sharing of feature information, and yields better feature representations over the iterative training process. The extracted face features are then visualized with a deconvolution network to evaluate their quality.

Thirdly, a feature matching module based on a residual structure is improved to realize the speech-to-face portrait. Cross-modal feature matching connects speech and face by concatenating and fusing the two encoded features; the network is trained to learn the similarity of the concatenated features, so that the correct face can be matched from multiple candidate face features using the speech features alone.

Finally, multi-angle image generation from a single-view face image is realized with a periodic implicit generative adversarial network. Because a single-view face image is not a comprehensive representation, multi-angle face videos and images can be obtained by this network on the basis of the speech portrait. In this work, the two-dimensional image is modeled through a neural radiance field, and multi-angle faces are generated with a GAN.
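The following is a minimal PyTorch sketch of the speech feature encoding module described in the first contribution. The input shape, layer sizes, and the reading of PSO as a hyperparameter search over the CNN's architecture are assumptions for illustration and may differ from the thesis's actual PSO-CNN design.

```python
# Minimal sketch of a CNN speech encoder over MFCC frames.
# Assumed input shape: (batch, 1, n_mfcc, n_frames); layer sizes are illustrative only.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    def __init__(self, n_mfcc=40, embed_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                      # pool along the time axis
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),              # collapse frequency and time
        )
        self.fc = nn.Linear(64, embed_dim)             # high-dimensional speech feature

    def forward(self, mfcc):                           # mfcc: (B, 1, n_mfcc, T)
        h = self.conv(mfcc).flatten(1)
        return self.fc(h)

# The thesis improves the CNN with particle swarm optimization; one plausible
# reading is a PSO search over hyperparameters such as channel counts and
# kernel sizes, retraining and scoring candidate encoders -- not shown here.
```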
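For the second contribution, a ResNet-based face encoder can be sketched by reusing a standard residual backbone and replacing its classification head with an embedding projection. The backbone choice (resnet18) and embedding size are assumptions; the deconvolution-based feature visualizer mentioned in the abstract is omitted.

```python
# Minimal sketch of a residual-network face encoder: a torchvision ResNet
# backbone whose classifier is replaced by a projection into the embedding space.
import torch.nn as nn
from torchvision.models import resnet18

class FaceEncoder(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)              # residual feature extractor
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, face):                           # face: (B, 3, H, W)
        return self.backbone(face)
```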
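The third contribution concatenates the speech and face embeddings and learns their similarity. A hedged sketch of one possible form: a small residual MLP scores each (speech, face) pair and the highest-scoring candidate is selected. The layer widths and the softmax-over-candidates training objective are assumptions, not the thesis's exact module.

```python
# Minimal sketch of the cross-modal matching module: concatenate the speech
# embedding with each candidate face embedding and score the pair with a
# small residual MLP; the highest-scoring candidate is the predicted speaker.
import torch
import torch.nn as nn

class MatchHead(nn.Module):
    def __init__(self, embed_dim=512, hidden=256):
        super().__init__()
        self.inp = nn.Linear(2 * embed_dim, hidden)
        self.res = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.out = nn.Linear(hidden, 1)

    def forward(self, speech_emb, face_embs):
        # speech_emb: (B, D); face_embs: (B, N, D) candidate faces per utterance
        s = speech_emb.unsqueeze(1).expand(-1, face_embs.size(1), -1)
        h = torch.relu(self.inp(torch.cat([s, face_embs], dim=-1)))
        h = h + self.res(h)                            # residual connection
        return self.out(h).squeeze(-1)                 # (B, N) matching scores

# Training could treat the true speaker's face as the positive candidate, e.g.
# loss = nn.CrossEntropyLoss()(scores, target_index)
```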
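Finally, the multi-angle generation uses a periodic implicit GAN, i.e. a neural-radiance-field generator whose MLP has sinusoidal activations and is conditioned on a latent code. The sketch below shows only that implicit generator (3D point plus latent code mapped to color and density); the volume renderer, discriminator, and camera sampling needed for full multi-angle synthesis are omitted, conditioning is done by simple concatenation rather than FiLM modulation, and all sizes are assumptions.

```python
# Minimal sketch of a periodic implicit generator in the spirit of pi-GAN:
# a sine-activated MLP maps a 3D point and a latent code to an RGB color and
# a volume density, which a NeRF-style renderer (not shown) would integrate
# along camera rays to produce face images from arbitrary viewpoints.
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_dim, out_dim, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))     # periodic activation

class ImplicitFaceGenerator(nn.Module):
    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            SineLayer(3 + latent_dim, hidden),
            SineLayer(hidden, hidden),
            SineLayer(hidden, hidden),
        )
        self.rgb = nn.Linear(hidden, 3)                # color at the 3D point
        self.sigma = nn.Linear(hidden, 1)              # volume density

    def forward(self, xyz, z):
        # xyz: (B, P, 3) sampled points; z: (B, latent_dim) identity latent
        zp = z.unsqueeze(1).expand(-1, xyz.size(1), -1)
        h = self.net(torch.cat([xyz, zp], dim=-1))
        return torch.sigmoid(self.rgb(h)), torch.relu(self.sigma(h))
```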
Keywords/Search Tags: Speech portrait, Feature extraction, Cross-modal mapping, Multi-angle face generation