Research On The Model, Method And Application Of Receptive Field Learning

Posted on: 2018-01-26; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Q Zhao
GTID: 1318330512483152; Subject: Computer application technology
Abstract/Summary:
The performance of image detection and recognition depends largely on the quality of features. Good features should discard irrelevant information and abstract the elements that matter for the task. Traditional image descriptors have limited representational power, so detection and recognition methods built on them have struggled to make major breakthroughs. In recent years, advances in feature learning have made it possible to learn features directly from image data, which has substantially improved image detection and recognition algorithms. Within feature learning, pooling produces features that are more compact and representative: irrelevant details are discarded from the representation, and pooled features can acquire more complex properties. For this reason, some researchers regard the role of pooling in feature learning as similar to that of complex cells in the mammalian visual system. In neuroscience, the responses of complex visual cells to input signals are spatially localized, oriented, and bandpass; these properties characterize the receptive fields of complex visual cells. This dissertation focuses on feature learning for image detection and recognition by modifying the pooling model. A modified pooling model can improve the representational ability of features, an approach that some researchers call receptive field learning. The research builds on the bag-of-features (BoF) model and the convolutional neural network (CNN) model. The contributions fall into four parts:
1) Local spatial statistics of feature maps are used to improve the representation of the BoF model. First, based on an analysis of existing schemes, a receptive field scheme with low spatial similarity is proposed, which reduces the redundancy in the feature representation caused by similar spatial regions. Second, pooling regions are distinguished by the number of features they contain.
Pooling regions that contain more features are summarized into multiple pooled features using several aggregation operations. The Fisher kernel method is further applied to the pooled features to exploit distribution information in the feature space. The resulting features have low redundancy and capture rich local statistics as well as the feature-space distribution, which improves their representational ability.
2) Redundant visual words and redundant pooled features in the BoF model are eliminated through learning on the classification task. Two solutions are proposed. The first updates the scoring function in the receptive field learning algorithm of Jia et al. so that features from already selected feature maps receive higher scores during learning. This limits the growth in the number of feature maps, so that after learning the feature maps and pooling regions contain little redundancy. The second scheme builds on the first and divides learning into two stages: it first determines the importance of the visual words, and then selects receptive fields on the feature maps generated by the retained visual words.
3) A face saliency learning method based on global average pooling is proposed. Using the structural information in the saliency feature maps, a face detection model is established as one part of a CNN-based multi-task model for face recognition and detection. Negative samples are used to suppress the responses of non-face regions, so that the feature responses become salient for human faces. Saliency learning suppresses background responses and improves adaptation to different datasets. In addition, structural information exists across the different feature maps; to exploit it for locating human faces, the detection scheme is designed following the idea of part-based models.
Saliency learning and face detection are integrated for joint training, so that the feature learning and face detection models can be optimized further.
4) The effectiveness of region proposals in the Faster R-CNN model is investigated. First, saliency learning is applied in the region-based approach: the effectiveness of each region proposal is evaluated against saliency maps, which provide contextual and structural information, and only high-scoring proposals are kept for detection. In addition, when the objects to be detected contain parts, the region proposal network (RPN) often produces inappropriate proposals because of ambiguity, since parts frequently co-occur and overlap. To avoid this ambiguity, multi-task branches compute region proposals for objects and for parts separately. Finally, a parameter pruning scheme is proposed for the model: contribution scores computed from the parameters determine the importance of each feature map or node to the model, and only the feature maps and nodes of high importance are retained.
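The multi-aggregation pooling step of contribution 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: local features are `(x, y, code)` tuples with length-`k` code vectors, pooling regions are half-open boxes, and a region counts as "dense" when its average feature count over training images reaches a hypothetical `threshold`. Dense regions get both max and average pooling; all other regions get max pooling only. The Fisher kernel step and the low-similarity receptive field layout are not shown.

```python
from statistics import mean


def features_in(features, box):
    """Return the codes of local features whose (x, y) lies in a half-open box."""
    x0, y0, x1, y1 = box
    return [code for (x, y, code) in features if x0 <= x < x1 and y0 <= y < y1]


def dense_flags(training_images, regions, threshold):
    """Flag a region as dense if it holds at least `threshold` local features
    on average over the training images (hypothetical criterion)."""
    return [
        mean(len(features_in(img, box)) for img in training_images) >= threshold
        for box in regions
    ]


def pool_image(features, regions, flags, k):
    """Concatenate pooled vectors over all regions.

    Every region gets element-wise max pooling; dense regions additionally
    get average pooling. Because the flags are fixed per region, the output
    length is k * (len(regions) + sum(flags)) for every image.
    """
    pooled = []
    for box, dense in zip(regions, flags):
        codes = features_in(features, box)
        if codes:
            max_vec = [max(c[j] for c in codes) for j in range(k)]
            avg_vec = [sum(c[j] for c in codes) / len(codes) for j in range(k)]
        else:
            # Empty region: fall back to zero vectors.
            max_vec = [0.0] * k
            avg_vec = [0.0] * k
        pooled.extend(max_vec)
        if dense:
            pooled.extend(avg_vec)
    return pooled


regions = [(0, 0, 2, 2), (2, 0, 4, 2)]
image = [(0, 0, [1.0, 0.0]), (1, 1, [0.0, 1.0]), (1, 0, [0.5, 0.5]), (3, 1, [0.2, 0.0])]
flags = dense_flags([image], regions, threshold=3)  # [True, False]
descriptor = pool_image(image, regions, flags, k=2)  # [1.0, 1.0, 0.5, 0.5, 0.2, 0.0]
```

Deciding density once per region from training statistics, rather than per image, keeps the descriptor length fixed, which a linear classifier on top of the BoF representation needs.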
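The two selection schemes of contribution 2) can be illustrated with a greedy loop over candidate pooled features, each identified by a `(codeword, region)` pair. The sketch below rests on loud assumptions: the candidate scores are fixed numbers standing in for the gradient-based scores of Jia et al. (whose algorithm recomputes them after each retraining step), and the reward for reusing an already selected feature map is a hypothetical additive `bonus`. The second function mimics the two-stage scheme by first keeping the most important visual words and then selecting only among their receptive fields.

```python
def select_receptive_fields(base_scores, budget, bonus):
    """Greedily pick (codeword, region) pooled features.

    base_scores: dict mapping (codeword, region) -> relevance score (a fixed
        stand-in for a gradient-based score; an assumption of this sketch).
    bonus: hypothetical additive reward for candidates whose codeword
        (feature map) is already selected, which discourages opening new maps.
    """
    remaining = dict(base_scores)
    selected, used_maps = [], set()
    for _ in range(min(budget, len(remaining))):
        best = max(
            remaining,
            key=lambda cand: remaining[cand] + (bonus if cand[0] in used_maps else 0.0),
        )
        selected.append(best)
        used_maps.add(best[0])
        del remaining[best]
    return selected, used_maps


def two_stage_selection(word_importance, base_scores, n_words, budget, bonus):
    """Stage 1: keep the n_words most important visual words.
    Stage 2: select receptive fields only on the retained words' feature maps."""
    ranked_words = sorted(word_importance, key=word_importance.get, reverse=True)
    kept_words = set(ranked_words[:n_words])
    candidates = {c: s for c, s in base_scores.items() if c[0] in kept_words}
    return select_receptive_fields(candidates, budget, bonus)


scores = {(0, "a"): 0.9, (1, "a"): 0.8, (0, "b"): 0.7, (2, "b"): 0.75}
print(select_receptive_fields(scores, budget=2, bonus=0.0))  # ([(0, 'a'), (1, 'a')], {0, 1})
print(select_receptive_fields(scores, budget=2, bonus=0.2))  # ([(0, 'a'), (0, 'b')], {0})
```

With the bonus, the second pick reuses feature map 0 instead of opening map 1, which is the effect the modified scoring function is described as having: fewer feature maps and less redundancy among the selected pooling regions.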
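For contribution 3), one way to see how global average pooling (GAP) can yield a saliency map is the class-activation-map construction: a linear face score on GAP features equals the spatial mean of the same weighted sum of feature maps taken per location. The sketch assumes that construction, a single linear face/non-face head, and feature maps given as H x W nested lists; the abstract does not state that the dissertation uses exactly this weighting, and the part-based detection stage is omitted.

```python
def global_average_pool(feature_maps):
    """Average each H x W feature map to a single value."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]


def face_score(feature_maps, weights, bias=0.0):
    """Linear face/non-face score on the GAP features (assumed classifier head)."""
    pooled = global_average_pool(feature_maps)
    return sum(wk * g for wk, g in zip(weights, pooled)) + bias


def saliency_map(feature_maps, weights):
    """Per-location weighted sum of the feature maps (CAM-style assumption)."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(wk * fm[i][j] for wk, fm in zip(weights, feature_maps)) for j in range(width)]
        for i in range(height)
    ]


maps = [
    [[1.0, 0.0], [0.0, 0.0]],  # feature map 1
    [[0.0, 0.0], [0.0, 2.0]],  # feature map 2
]
weights = [2.0, 0.5]
sal = saliency_map(maps, weights)    # [[2.0, 0.0], [0.0, 1.0]]
score = face_score(maps, weights)    # 0.75, the mean of sal
```

Because the score is the mean of the saliency map, training the score with non-face (negative) samples pushes the weights toward low saliency on background locations, which matches the suppression effect the abstract describes.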
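The first step of contribution 4), keeping only region proposals that score well on a saliency map, might look like the sketch below. The scoring rule (mean saliency inside the box) and the fixed top-`keep` cut are hypothetical choices for illustration; the sketch also assumes proposals are already projected onto the saliency map's grid as half-open `(x0, y0, x1, y1)` boxes, with the map indexed as `saliency[y][x]`.

```python
def mean_saliency(saliency, box):
    """Average saliency inside a half-open box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    values = [saliency[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values) if values else 0.0


def filter_proposals(proposals, saliency, keep):
    """Keep the `keep` proposals with the highest mean saliency (hypothetical rule)."""
    ranked = sorted(proposals, key=lambda b: mean_saliency(saliency, b), reverse=True)
    return ranked[:keep]


saliency = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
proposals = [(0, 0, 2, 2), (2, 0, 4, 2), (1, 0, 3, 3)]
kept = filter_proposals(proposals, saliency, keep=2)  # [(2, 0, 4, 2), (1, 0, 3, 3)]
```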
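The pruning scheme at the end of contribution 4) derives importance scores for feature maps from the model parameters, but the abstract does not give the formula. The sketch below assumes one common criterion, the L1 norm of each output channel's filter weights, and keeps a fixed fraction of the highest-scoring channels; both the criterion and the `keep_ratio` parameter are assumptions, not the dissertation's stated method.

```python
def l1_importance(filters):
    """Importance of each output channel: L1 norm of its flattened weights."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_channels(filters, keep_ratio):
    """Keep the highest-scoring fraction of channels (at least one)."""
    scores = l1_importance(filters)
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original channel order
    return kept, [filters[i] for i in kept]


filters = [[0.1, -0.1], [1.0, 0.5], [-0.2, 0.0], [0.3, -0.9]]
kept, pruned = prune_channels(filters, keep_ratio=0.5)  # kept == [1, 3]
```

In a real CNN, removing output channel i of one layer also means deleting input channel i from every filter of the following layer, so the pruned model has to be rebuilt layer by layer rather than edited in one place.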
Keywords/Search Tags: receptive field learning, pooling, feature learning, region proposal, bag-of-features (BoF), convolutional neural networks (CNN)