
Research on the Method of Robotic Object Detection Based on Natural Language Expression

Posted on: 2019-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: H P Liu
Full Text: PDF
GTID: 2428330545453634
Subject: Control Science and Engineering
Abstract/Summary:
For robots, video and images are the main channels for acquiring information, while natural language is the most natural way to communicate with humans. Service robots in particular, which work in home environments and exist to serve people, communicate with humans frequently. To make this communication more convenient, a service robot should be able to localize visual targets based on a natural language expression, so that it can provide services in response to natural language orders (e.g., "please give me the bottle under the desk").

Recent years have witnessed significant progress in image processing and natural language processing thanks to deep learning, which learns features from data automatically and is gradually displacing hand-crafted features. In contrast, object detection based on natural language has made little progress, for three main reasons: 1) Datasets annotated with natural language are relatively small, which makes it hard to learn effective feature representations from them. 2) The relationship between natural language and images is hard to model: visual entities have different attributes such as color, category, and shape, and natural language can describe any of them. 3) In a home environment, the visual entities in an image are not isolated but have all sorts of relations with other objects, yet little research focuses on modeling the relationships between visual entities.

To address these problems, we study object detection based on natural language: given an image and an arbitrary natural language expression, how can the matching object be localized within the image using the expression as a prior? The main contributions of this thesis are as follows: 1) To obtain more effective features from a relatively small dataset, we adopt transfer learning, pre-training an image feature extractor and word vector representations on large-scale datasets. 2) We propose two methods to model the relationship between image and natural language. The first computes the distance between the natural language feature and pre-extracted image region features and returns the best-matching region. The second recasts the task as binary classification, building an expert system that observes an image region and a natural language expression and answers whether they match. Experiments on the open-source datasets RefCOCO and G-Ref show advantages in efficiency and speed. 3) To understand the relationships between visual entities more effectively, we introduce an attention mechanism that jointly predicts attention weights from the natural language and image information, making the model pay more "attention" to the relevant visual entities. Experiments on the G-Ref dataset demonstrate the effectiveness of our model.
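The first matching method in the abstract (compare a language feature against pre-extracted region features and return the closest region) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which distance is used, so cosine distance is swapped in, the feature vectors are toy values, and the function names are hypothetical. The `attention_weights` helper only shows how per-region scores can be normalized into weights with a softmax; in the thesis the attention weights are predicted jointly by a learned model, which this sketch does not reproduce.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def best_matching_region(expression_feature, region_features):
    """Return (index of the closest region, list of all distances).

    The metric is an assumption; the abstract only says "distance".
    """
    distances = [cosine_distance(expression_feature, r) for r in region_features]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances


def attention_weights(scores):
    """Numerically stable softmax over per-region relevance scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy features standing in for a sentence embedding and three region embeddings.
expression = [1.0, 0.0, 1.0]
regions = [[0.0, 1.0, 0.0], [0.9, 0.1, 1.1], [1.0, 1.0, 0.0]]

best, distances = best_matching_region(expression, regions)
# Smaller distance means more relevant, so negate before the softmax.
weights = attention_weights([-d for d in distances])
print(best, [round(w, 3) for w in weights])
```

With these toy vectors the second region (index 1) is the closest to the expression and also receives the largest weight. In a real system the features would come from the pre-trained image feature extractor and word vectors mentioned in the abstract.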
Keywords/Search Tags: Human-machine interaction, Convolutional neural network, Object detection, Natural language processing, Relationship learning