
Design And Implementation Of Scene Understanding Technology Based On One-Shot Learning

Posted on: 2020-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Duan
Full Text: PDF
GTID: 2428330596975498
Subject: Communication and Information System

Abstract/Summary:
Vision-based scene understanding is an essential technology for future intelligent perception and is now widely used in fields such as autonomous driving, intelligent manufacturing, and intelligent interaction. However, most traditional autonomous driving systems depend heavily on multiple sensor devices to obtain scene information and cannot be controlled reliably with vision information alone. With the development of machine learning, computer vision, and artificial intelligence, scene understanding technology has improved rapidly. Nevertheless, current scene understanding approaches are point-to-point task mappings that require massive training data to handle even a single scene. In this dissertation, we therefore propose an anthropomorphic, brain-like deep learning architecture for scene understanding that can be applied rapidly to unfamiliar scenes without any additional training.

The main research objects of this dissertation are vision-based depth estimation and vision-based geo-localization in scene understanding. An adaptive depth estimation network and a memory segment network are proposed based on One-Shot Learning (OSL). Both networks take visual information as input and output bounding boxes, categories, depth predictions, and image geo-localization information. Compared with existing methods, our model is built on metacognitive knowledge and metacognitive strategies, inspired by metacognition in the human brain. The presented method exhibits satisfying performance, with much better accuracy and higher transfer capacity in new scenes without any training.

The contributions and innovations of this dissertation are summarized as follows:

1. The proposed adaptive depth estimation network based on One-Shot Learning outputs bounding boxes, categories, and depth predictions through parallel detection. A memory module and a meta-controller module are designed to achieve metacognition, like human learners, within the depth estimation network. With this human-like architecture, our work significantly reduces the test error compared with traditional methods. At the same time, thanks to the transfer capacity of metacognition, the depth estimation architecture performs well in new scenes without any training.

2. The proposed memory segment network for image geo-localization shows better transfer capacity than existing vision geo-localization methods. The memory segment network is inspired by biological memory retrieval mechanisms: the long short-term memory (LSTM) architecture exhibits navigational and localization abilities similar to those of the mammalian hippocampus. We therefore employ an LSTM in the proposed memory segment network to extract visual features and output location information through a match network. A Hidden Markov Model (HMM) is designed to further improve localization accuracy. Because of the metacognitive transfer capacity of this hippocampus-like design, the model works well in unfamiliar scenes without any training.

3. We train and test the proposed adaptive depth estimation network on the KITTI2012 and CityScapes datasets. The mean absolute error is only 2 m within a 100-meter visual range on the KITTI2012 test set. The absolute relative error is 8.8%, a 22.8% improvement over traditional methods. On the CityScapes validation set, the mean absolute error is only 4.5 m at a maximum depth of 100 m, while the error of the existing state-of-the-art method is 7.5 m. Furthermore, when the model is trained on the KITTI training set and tested on CityScapes without any retraining, the mean error within a 100-meter visual range is only 8.7 m, which is better than several published methods.

4. The proposed memory segment network is tested on three different datasets. The testing accuracy reaches 96.6% within an error threshold of 40 meters on the Oxford Robot Car dataset. On the Google Street View test set, the accuracy increases to 97.3% within a 50-meter error range, owing to the availability of view information from different orientations. To verify the transfer capacity in new scenes, we use the model trained on the Oxford Robot Car dataset to perform localization tests on a campus dataset. According to the test results, the proposed memory segment network shows high transfer capacity, with 91.9% localization accuracy within a 10-meter visual range and only a 1.3% performance loss compared with the training results.
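The match network in contribution 2 retrieves the stored memory segment closest to a query image's feature embedding. A minimal, framework-free sketch of that retrieval step is shown below; it assumes embeddings have already been extracted (e.g., by the LSTM feature extractor), and the function names, memory contents, and cosine-similarity choice are illustrative assumptions, not the thesis implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_location(query, memory):
    """Return the location label whose stored embedding best matches `query`.

    `memory` maps a location label to its reference embedding.
    """
    return max(memory, key=lambda loc: cosine_similarity(query, memory[loc]))

# Hypothetical 3-dimensional embeddings for three campus locations.
memory = {
    "gate":    [0.9, 0.1, 0.0],
    "library": [0.1, 0.8, 0.3],
    "lab":     [0.0, 0.2, 0.9],
}
print(match_location([0.2, 0.7, 0.4], memory))  # prints "library"
```

Because retrieval compares against stored exemplars rather than relying on class-specific weights, new locations can be recognized by simply adding one embedding per location, which is the one-shot property the dissertation relies on.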
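Contribution 2 also uses an HMM to improve localization accuracy over a sequence of frames. One standard way such a model corrects noisy per-frame matches is Viterbi decoding, sketched below under illustrative assumptions (two locations, hand-picked transition and observation probabilities; the thesis's actual HMM parameters are not given in this abstract):

```python
def viterbi(obs_scores, transition, states):
    """Most likely location sequence given per-frame match scores.

    obs_scores: list of dicts, frame t -> {state: P(observation_t | state)}
    transition: transition[s1][s2] = P(next state is s2 | current state is s1)
    states:     list of possible locations (uniform prior assumed)
    """
    # Initialise with the first frame's scores.
    prob = {s: obs_scores[0][s] for s in states}
    backpointers = []
    for scores in obs_scores[1:]:
        new_prob, pointers = {}, {}
        for s in states:
            # Pick the best predecessor state for s at this step.
            best = max(states, key=lambda p: prob[p] * transition[p][s])
            new_prob[s] = prob[best] * transition[best][s] * scores[s]
            pointers[s] = best
        prob = new_prob
        backpointers.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=prob.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))

states = ["A", "B"]
# Transitions favour staying put, as a vehicle moves smoothly between frames.
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
# Frame 2 is a noisy match (raw best guess would be "B").
observations = [{"A": 0.8, "B": 0.2}, {"A": 0.4, "B": 0.6}, {"A": 0.9, "B": 0.1}]
print(viterbi(observations, transition, states))  # prints ['A', 'A', 'A']
```

Note how the temporal smoothing overrides the noisy middle frame: the per-frame match favoured "B", but the motion model makes an A→B→A sequence unlikely.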
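Contribution 3 reports results as mean absolute error (in metres) and absolute relative error within a 100-meter visual range. These are the standard monocular-depth evaluation metrics; the following minimal sketch (illustrative, not the thesis evaluation code) shows how they are computed per pixel and averaged:

```python
def depth_metrics(pred, gt, max_depth=100.0):
    """Mean absolute error (metres) and absolute relative error.

    Evaluated only on pixels whose ground-truth depth lies within
    the visual range (0, max_depth], as in the reported experiments.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if 0 < g <= max_depth]
    mae = sum(abs(p - g) for p, g in pairs) / len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)
    return mae, abs_rel

# Toy example: three pixels, each predicted 2 m off its ground truth.
mae, abs_rel = depth_metrics([10.0, 52.0, 98.0], [12.0, 50.0, 100.0])
print(mae)  # prints 2.0
```

In this toy case the mean absolute error is 2.0 m, matching the scale of the KITTI2012 result quoted above, while the absolute relative error weights the same 2 m mistake more heavily at near range than at far range.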
Keywords/Search Tags:Metacognition, Depth Estimation, Adaptive, Image Geo-Localization, Matching Network