| Visual perception is one of the most important channels through which the human brain receives external input, and it underlies many advanced cognitive functions. More than 50% of the human brain is involved in processing visual information. Exploring the information processing mechanisms of the visual system is an active research topic in neuroscience and can also inspire solutions to related problems in computer vision. In recent years, researchers have modeled the mapping between visual stimuli and cortical responses through neural encoding and decoding methods, aiming to investigate how the visual cortex represents visual information and whether perceived visual content can be inferred from cortical activity. However, constructing physiologically interpretable encoding models and developing high-performance decoding models remain open challenges. Using functional magnetic resonance imaging (fMRI), this dissertation recorded the cortical activity of subjects performing visual tasks and developed a series of encoding and decoding methods to model the cortical representation of visual information in both the forward and reverse directions. For encoding, this dissertation proposed two models, one based on an unsupervised shallow spiking convolutional neural network (SCNN) and one based on a supervised deep SCNN. These models predicted fMRI responses in the early visual cortex and in hierarchical visual areas along the ventral visual pathway, respectively. For decoding, with image and video stimuli as targets and recognition and reconstruction as objectives, this dissertation proposed a recurrent neural network-based decoding model for identifying natural image sequences and a generative adversarial network-based decoding model for reconstructing dynamic videos, both of which aim to infer the visual information perceived by subjects from the recorded fMRI activity. The specific content is as
follows: 1. To encode the early visual cortex, this dissertation built an unsupervised shallow SCNN to extract visual features from the input stimuli and predicted the fMRI responses of the early visual cortex from those features. The experimental results showed that the proposed method outperformed traditional Gabor filter-based and convolutional neural network-based encoding models. This result indicates that the information processing mechanism of SCNNs is closer to that of the real human visual system. Moreover, this dissertation used the encoding model to accomplish the downstream decoding tasks of image reconstruction and image recognition, and its decoding performance exceeded that of the other benchmark methods. 2. To encode the hierarchical ventral visual pathway, this dissertation constructed an encoding model based on a supervised deep SCNN to predict the fMRI responses of visual areas at different hierarchical levels along the ventral visual pathway. The experimental results showed that, compared with the shallow SCNN, the deep network structure enhances the model's ability to represent visual stimuli and improves encoding performance. In addition, this dissertation found a correspondence between the hierarchical structure of the visual cortex and that of the SCNN, which indicates that the SCNN and the visual cortex process visual stimuli in a similar hierarchical manner. 3. To decode temporal and categorical information in natural image sequences, this dissertation proposed a recurrent neural network-based dual decoding framework containing two decoding modules that identify the presentation time and the semantic category of the image stimuli, respectively. The framework uses the recorded fMRI signals to determine what subjects saw and when they saw it. The accuracy of the proposed model on experimental data reached a maximum of 61.6%, significantly higher than
the chance level (2.45%). The comparison of decoding performance among different classifiers showed that RNNs achieved significantly higher decoding accuracies than the non-temporal models, underscoring the importance of time-series modeling in analyzing fMRI data. In addition, this dissertation found that the early visual cortex (eVC) and the high-level visual cortex (hVC) achieved comparable accuracy when decoding stimulus onset, whereas hVC was significantly more accurate than eVC when decoding stimulus category. This result indicates that both eVC and hVC are involved in visual information processing, while the semantic information of visual stimuli is represented mainly in hVC. 4. To reconstruct dynamic video stimuli, this dissertation proposed a generative adversarial network-based video reconstruction model that uses the fMRI response at each time point as a conditional vector to generate multiple video frames, thereby reconstructing fast video stimuli. The decoding model consists of three modules: a generator and two discriminators (temporal and spatial). The generator produces video clips corresponding to the input fMRI response, the discriminators distinguish generated from real video clips from the spatial and temporal perspectives, and the model is optimized through adversarial learning. Experimental results showed that the decoder can reconstruct video stimuli from fMRI activity while preserving the temporal and spatial information of the original stimuli. In addition, this dissertation found that the low-level visual cortex (V1/V2/V3/V4) contributed the most to the reconstruction, while some high-level visual areas (such as TPOJ) also contributed significantly. |
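
The encoding step summarized in items 1 and 2 (extract features from the stimuli, then predict fMRI responses from those features) can be illustrated with a generic voxel-wise linear readout. The sketch below is only an illustration under stated assumptions: the abstract does not specify the regression method, so ridge regression is assumed here; random vectors stand in for SCNN features; and all names (`fit_ridge`, `voxelwise_correlation`, the array sizes, `alpha`) are hypothetical. Prediction quality is scored per voxel with the Pearson correlation between predicted and measured responses, a common choice that the abstract does not confirm.

```python
import numpy as np


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    features:  (n_samples, n_features) stimulus features (assumed input)
    responses: (n_samples, n_voxels) measured voxel responses
    """
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)


def voxelwise_correlation(predicted, measured):
    """Pearson correlation between predicted and measured responses, per voxel."""
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return (p * m).sum(axis=0) / denom


# Synthetic stand-in data (hypothetical sizes, not taken from the dissertation).
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 20, 5
true_weights = rng.normal(size=(n_features, n_voxels))
train_features = rng.normal(size=(n_train, n_features))
test_features = rng.normal(size=(n_test, n_features))
train_responses = train_features @ true_weights + 0.1 * rng.normal(size=(n_train, n_voxels))
test_responses = test_features @ true_weights + 0.1 * rng.normal(size=(n_test, n_voxels))

# Fit on the training stimuli, then score predictions on held-out stimuli.
weights = fit_ridge(train_features, train_responses, alpha=1.0)
scores = voxelwise_correlation(test_features @ weights, test_responses)
print(scores.shape)
```

In practice the regularization strength would be chosen per voxel by cross-validation, and the features would come from the network layers rather than a random generator.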