
The Method Of Face Portrait Based On Speech

Posted on: 2022-11-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Wang | Full Text: PDF
GTID: 2518306752465574 | Subject: Computer Software and Computer Applications
Abstract/Summary:
Audio and image are the two signal modalities most commonly used by human beings. With the development of science and technology, a single modality can no longer meet the requirements of many applications, and cross-modal research makes speech-based face portraits possible. Because features are expressed differently across modalities, this technology still has great research potential. This thesis applies machine learning techniques, combined with basic theories of statistics and biology, to cross-modal research: it explores the relationship between speech and face, and matches the corresponding speaker's face image given any segment of that speaker's speech. A multi-angle face generation model driven by speech is then built to realize a speech portrait of the speaker. The main work and specific contributions are as follows:

Firstly, a speech feature encoding module based on a PSO-CNN network is proposed. To address the problem that traditional MFCC features cannot accurately filter out non-acoustic information, this method uses a convolutional neural network improved by particle swarm optimization (PSO) to build the speech feature encoding module, which optimizes the training speed of the network while pooling time-series features. The module extracts and encodes speech features and finally outputs high-dimensional speech feature vectors.

Secondly, a face feature encoding module based on a residual network is proposed. The face encoder extracts feature vectors from face images and refines the feature representation through iterative training. Building the face feature encoder on ResNet effectively extracts face features, strengthens the sharing of feature information, and yields better feature representations over the iterative training process. The extracted face features are then visualized with a deconvolution network to evaluate their quality.

Thirdly, a feature matching module based on a residual structure is improved to realize the speech-to-face portrait. Cross-modal feature matching connects speech and face by concatenating and fusing the two encoded features; the network is trained to learn the similarity of the concatenated features, so that the correct face can be matched from multiple candidate face features using the speech features alone.

Finally, multi-angle image generation from a single-view face image is realized with a periodic implicit generative adversarial network. Because a single-view face image is not a comprehensive representation, multi-angle face videos and images can be obtained by this network on the basis of the speech portrait. In this work, the two-dimensional image is modeled through a neural radiance field, and multi-angle faces are generated with a GAN.
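The following is a minimal PyTorch sketch of the speech feature encoding module described in the first contribution. The input shape, layer sizes, and the reading of PSO as a hyperparameter search over the CNN's architecture are assumptions for illustration and may differ from the thesis's actual PSO-CNN design.

```python
# Minimal sketch of a CNN speech encoder over MFCC frames.
# Assumed input shape: (batch, 1, n_mfcc, n_frames); layer sizes are illustrative only.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    def __init__(self, n_mfcc=40, embed_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                      # pool along the time axis
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),              # collapse frequency and time
        )
        self.fc = nn.Linear(64, embed_dim)             # high-dimensional speech feature

    def forward(self, mfcc):                           # mfcc: (B, 1, n_mfcc, T)
        h = self.conv(mfcc).flatten(1)
        return self.fc(h)

# The thesis improves the CNN with particle swarm optimization; one plausible
# reading is a PSO search over hyperparameters such as channel counts and
# kernel sizes, retraining and scoring candidate encoders -- not shown here.
```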
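For the second contribution, a ResNet-based face encoder can be sketched by reusing a standard residual backbone and replacing its classification head with an embedding projection. The backbone choice (resnet18) and embedding size are assumptions; the deconvolution-based feature visualizer mentioned in the abstract is omitted.

```python
# Minimal sketch of a residual-network face encoder: a torchvision ResNet
# backbone whose classifier is replaced by a projection into the embedding space.
import torch.nn as nn
from torchvision.models import resnet18

class FaceEncoder(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)              # residual feature extractor
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, face):                           # face: (B, 3, H, W)
        return self.backbone(face)
```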
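The third contribution concatenates the speech and face embeddings and learns their similarity. A hedged sketch of one possible form: a small residual MLP scores each (speech, face) pair and the highest-scoring candidate is selected. The layer widths and the softmax-over-candidates training objective are assumptions, not the thesis's exact module.

```python
# Minimal sketch of the cross-modal matching module: concatenate the speech
# embedding with each candidate face embedding and score the pair with a
# small residual MLP; the highest-scoring candidate is the predicted speaker.
import torch
import torch.nn as nn

class MatchHead(nn.Module):
    def __init__(self, embed_dim=512, hidden=256):
        super().__init__()
        self.inp = nn.Linear(2 * embed_dim, hidden)
        self.res = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.out = nn.Linear(hidden, 1)

    def forward(self, speech_emb, face_embs):
        # speech_emb: (B, D); face_embs: (B, N, D) candidate faces per utterance
        s = speech_emb.unsqueeze(1).expand(-1, face_embs.size(1), -1)
        h = torch.relu(self.inp(torch.cat([s, face_embs], dim=-1)))
        h = h + self.res(h)                            # residual connection
        return self.out(h).squeeze(-1)                 # (B, N) matching scores

# Training could treat the true speaker's face as the positive candidate, e.g.
# loss = nn.CrossEntropyLoss()(scores, target_index)
```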
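Finally, the multi-angle generation uses a periodic implicit GAN, i.e. a neural-radiance-field generator whose MLP has sinusoidal activations and is conditioned on a latent code. The sketch below shows only that implicit generator (3D point plus latent code mapped to color and density); the volume renderer, discriminator, and camera sampling needed for full multi-angle synthesis are omitted, conditioning is done by simple concatenation rather than FiLM modulation, and all sizes are assumptions.

```python
# Minimal sketch of a periodic implicit generator in the spirit of pi-GAN:
# a sine-activated MLP maps a 3D point and a latent code to an RGB color and
# a volume density, which a NeRF-style renderer (not shown) would integrate
# along camera rays to produce face images from arbitrary viewpoints.
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_dim, out_dim, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))     # periodic activation

class ImplicitFaceGenerator(nn.Module):
    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            SineLayer(3 + latent_dim, hidden),
            SineLayer(hidden, hidden),
            SineLayer(hidden, hidden),
        )
        self.rgb = nn.Linear(hidden, 3)                # color at the 3D point
        self.sigma = nn.Linear(hidden, 1)              # volume density

    def forward(self, xyz, z):
        # xyz: (B, P, 3) sampled points; z: (B, latent_dim) identity latent
        zp = z.unsqueeze(1).expand(-1, xyz.size(1), -1)
        h = self.net(torch.cat([xyz, zp], dim=-1))
        return torch.sigmoid(self.rgb(h)), torch.relu(self.sigma(h))
```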
Keywords/Search Tags: Speech portrait, Feature extraction, Cross-modal mapping, Multi-angle face generation