The human visual system is highly efficient and robust. Studying computational models of visual analysis based on the neural activity of the brain's visual cortex is important for exploring the visual mechanism and promoting the development of brain-like intelligence technology. Building visual analysis models from functional magnetic resonance imaging (fMRI) data is an important approach in this field. The application of deep neural networks (DNNs) has effectively improved the performance of visual analysis models. However, existing DNN-based visual analysis models mainly borrow model-construction methods from machine vision; there is little research on how to exploit the visual perception characteristics identified in neuroscience when building deep-network models of visual information processing. It is therefore essential to explore how to design deep-network analysis models that conform to the perceptual characteristics of the human brain, starting from its different visual perception properties, so as to improve model performance. Focusing on the key problem of "how to build a deep-network visual analysis model according to the characteristics of visual perception", this paper constructs visual encoding models for low-level and middle and high-level visual areas, respectively, as well as a visual classification model, from three aspects: a deep-network visual encoding model based on the perceptual properties of the primary visual cortex (V1), a deep-network visual encoding model based on visual shape perception, and a visual classification model embedded with an attention mechanism. The main work is as follows:

(1) A low-level visual area encoding model based on VOneBlock is proposed. Encoding models for low-level visual areas are mainly built in an "end-to-end" fashion, learning image representations and linear regression weights directly from the stimulus images to the voxel responses. However, existing end-to-end visual encoding models fail to effectively characterize the perceptual properties of simple cells in the low-level visual cortex, and the models themselves lack interpretability. To address this problem, this paper uses the VOneBlock convolution module, which simulates the primary visual cortex (V1), to construct a visual encoding model for low-level visual areas. The model consists of one VOneBlock convolutional layer simulating V1, several ordinary convolutional layers, and one sparse fully connected layer. The experimental results show that the VOneBlock-based encoding model achieves higher encoding accuracy than common end-to-end models, indicating that a deep neural network simulating V1 can improve encoding performance for low-level visual areas.

(2) A middle and high-level visual area encoding model based on Shape-ResNet is proposed. Most current encoding models for middle and high-level visual areas use ImageNet-pretrained DNNs and are constructed in a "two-stage" manner. However, pretrained DNNs have a strong texture bias, and the extracted image features cannot effectively characterize the shape-perception properties of the brain's middle and high-level visual areas. To address this problem, this paper introduces a deep network that learns shape features into the construction of the visual encoding model. First, the standard ResNet and Shape-ResNet are used to extract deep features of the stimulus images; a linear regression model is then trained from the deep features to the voxel responses. The experimental results show that the Shape-ResNet-based encoding model outperforms the model based on the standard ResNet, indicating that a deep network that learns shape features can improve the encoding accuracy of middle and high-level visual areas.

(3) A visual classification model based on LSTM-SENet is proposed. Existing visual classification models neglect the differences between voxels when extracting features, which reduces classification efficiency and accuracy. To address this problem, this paper embeds the channel attention mechanism SENet into an LSTM network to construct a category decoding model based on LSTM-SENet, so that the model learns the weights of different voxels, realizes information interaction among the visual areas, and attends to important features when predicting natural image categories. Compared with multiple baseline models, the LSTM-SENet-based classification model significantly improves accuracy at all three category granularities (5-way, 10-way, and 23-way classification). The results show that a deep network embedded with an attention mechanism can improve the classification accuracy of a visual classification model.
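To make the V1-simulating front end in contribution (1) concrete: a VOneBlock-style layer is built around a fixed Gabor filter bank, since Gabor functions are the classical model of V1 simple-cell receptive fields. The following NumPy sketch of a tiny Gabor bank is purely illustrative; the kernel size, frequency, and orientations are arbitrary choices, not parameters from the thesis.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """2-D Gabor filter: an oriented sinusoid under a Gaussian envelope,
    the classical model of a V1 simple-cell receptive field."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    x_rot = xx * np.cos(theta) + yy * np.sin(theta)      # rotate to orientation theta
    envelope = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * freq * x_rot)

# A tiny bank: 4 orientations at one spatial frequency (illustrative values)
bank = [gabor_kernel(size=9, theta=t, freq=0.25, sigma=2.0)
        for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
```

Convolving an input image with such a bank (followed by a nonlinearity and noise, in the full VOneBlock design) yields orientation-tuned feature maps that the subsequent ordinary convolutional layers build on.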
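The "two-stage" construction in contribution (2) reduces to extracting deep features and then fitting a linear map from features to voxel responses. A minimal sketch of the second stage with ridge regression is shown below; the synthetic data stands in for Shape-ResNet features and fMRI voxels, and all dimensions and names are illustrative assumptions, not the thesis's actual setup.

```python
import numpy as np

def fit_ridge(features, voxels, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^{-1} X^T Y."""
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + alpha * np.eye(d),
                           features.T @ voxels)

# Toy data: 100 stimuli, 32-dim "deep features", 10 voxels (all synthetic)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))
true_W = rng.standard_normal((32, 10))
Y = X @ true_W + 0.01 * rng.standard_normal((100, 10))   # voxel responses + noise

W = fit_ridge(X, Y, alpha=0.1)
pred = X @ W
# Voxel-wise Pearson correlation is a common encoding-accuracy measure
corr = [np.corrcoef(pred[:, v], Y[:, v])[0, 1] for v in range(10)]
```

In practice the regression is fit on held-out training stimuli and the correlation is evaluated on test stimuli, per voxel.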
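The SENet channel attention embedded in contribution (3) can be sketched independently of the LSTM: squeeze the feature map by global average pooling, excite through a small bottleneck MLP, and rescale each channel by a learned weight in (0, 1). Treating voxel groups as "channels", the shapes and weight initialization below are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_reweight(x, w1, w2):
    """Squeeze-and-Excitation on a (channels, length) feature map:
    squeeze by global average pooling, excite via a ReLU bottleneck MLP,
    then rescale each channel by its attention weight in (0, 1)."""
    squeezed = x.mean(axis=1)                  # (C,)   squeeze: global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)    # (C/r,) excitation bottleneck, ReLU
    weights = sigmoid(w2 @ hidden)             # (C,)   per-channel attention weights
    return x * weights[:, None], weights

rng = np.random.default_rng(1)
C, L, r = 8, 16, 2                             # channels, length, reduction ratio
x = rng.standard_normal((C, L))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
y, weights = se_reweight(x, w1, w2)
```

In the thesis's model, such reweighting would let the network emphasize informative voxels before the recurrent layers aggregate information across visual areas; here w1 and w2 are random stand-ins for learned parameters.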