| Head pose estimation (determining the direction of the human head relative to the camera view) is a long-standing research topic in computer vision and related fields. It has a wide range of applications in gaze estimation, attention modeling, fitting 3D models to video, face localization, and so on. Over the past two decades, research on head pose has promoted the development of 3D reconstruction, vision technology, and multimedia content manipulation, and has played an important role in interaction with users. It is widely used in attention detection models, driver monitoring, intelligent classroom management, and face recognition. The main problem studied in this paper is head pose estimation: given a face image or video stream containing a certain head pose, we first process the image and then train a neural network to output the head pose in the image, namely the pitch angle (rotation about the X axis), the yaw angle (rotation about the Y axis), and the roll angle (rotation about the Z axis). The problem can be divided into two stages: the image processing stage and the neural network stage. In the image processing stage, a series of matrix operations on the image extracts more feature information from the face image. In the neural network stage, the extracted information is sent to the three stages of the neural network to predict the pose. In this paper, three new head pose estimation methods based on image processing and neural networks are proposed, and the neural network architecture is re-analyzed and redesigned. These three methods use wavelet transforms, optical flow, and neural networks. In the first method, we combine wavelet-transformed images with a neural network. Wavelet processing retains the main information in the image, removes noise and redundant information, and reduces the amount of computation. We concatenate the result with the RGB image, adding the wavelet-transformed information to the input as an additional channel 
to help the neural network estimate the pose better and converge. In the second method, we design a deep, coarse-to-fine neural network architecture. We first analyze the data distribution in the head pose database and, according to its characteristics, design a new and effective network architecture. The architecture is deep in both the vertical and horizontal directions. It also includes top-down and lateral connections for effective feature mapping, and it transforms the regression problem into a classification problem. The image is first coarsely classified, and the coarsely classified head pose image is then sent to a subsequent fine-grained network model for more accurate prediction. This framework helps alleviate the influence of the biased sample distribution and, combined with piecewise mapping, forms a better global fit. In the third method, we extend the study from single-frame images to video streams. We study head pose estimation in video frames: optical flow is used to extract the relationship between video frames, the optical flow information is concatenated with the RGB image information, and training the neural network on this input gives better results. Although we discuss only head pose estimation, we believe that the methods proposed in this paper can also be extended to other regression and classification problems. |
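
The two input-side ideas in the abstract, appending wavelet information to the RGB image as an extra channel and turning angle regression into coarse classification, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single-level Haar wavelet, the grayscale conversion by channel mean, the nearest-neighbour upsampling, the function names, and the angle range of -90 to 90 degrees split into 6 bins.

```python
import numpy as np


def haar_approximation(gray):
    """Single-level 2D Haar approximation (LL band) of an even-sized image.

    Assumption: the abstract does not name a wavelet; Haar is used only
    because it is the simplest one to write down.
    """
    g = gray.astype(np.float32)
    # Orthonormal Haar LL coefficient: (a + b + c + d) / 2 over each 2x2 block.
    return (g[0::2, 0::2] + g[0::2, 1::2] + g[1::2, 0::2] + g[1::2, 1::2]) / 2.0


def add_wavelet_channel(rgb):
    """Return an H x W x 4 array: the RGB image plus an upsampled LL band."""
    rgb = rgb.astype(np.float32)
    gray = rgb.mean(axis=2)  # assumed grayscale conversion
    ll = haar_approximation(gray)
    # Nearest-neighbour upsampling so the band aligns pixel-wise with RGB.
    ll_up = np.repeat(np.repeat(ll, 2, axis=0), 2, axis=1)
    return np.concatenate([rgb, ll_up[..., None]], axis=2)


def angle_to_coarse_bin(angle_deg, low=-90.0, high=90.0, n_bins=6):
    """Map a continuous angle to a coarse class index.

    The range and bin count are hypothetical; out-of-range angles are
    clamped to the first or last bin.
    """
    width = (high - low) / n_bins
    idx = int((angle_deg - low) // width)
    return min(max(idx, 0), n_bins - 1)


if __name__ == "__main__":
    image = np.full((4, 4, 3), 10, dtype=np.uint8)
    stacked = add_wavelet_channel(image)
    print(stacked.shape)
    print(angle_to_coarse_bin(45.0))
```

A network consuming `stacked` would need a first layer with four input channels instead of three. The same concatenation pattern would apply to the third method, with optical-flow channels taking the place of the wavelet band.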