Research on a Referring Expression Comprehension (REC) Method Based on Expression-Image Matching Detection

Posted on: 2021-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: E J Cui
Full Text: PDF
GTID: 2438330626964206
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, artificial intelligence has developed rapidly, and research at the intersection of computer vision and natural language processing has attracted growing interest. Referring Expression Comprehension (REC) is one such research topic. A referring expression is a natural language expression that describes a particular object in a scene, such as “the man in the blue shirt” or “the book on the table”. The task of REC is to locate the object described by a referring expression in a given image. Existing REC methods assume that the object described by the referring expression must exist in the image, and they do not judge whether the referring expression matches the image. In some real-world scenarios, however, this assumption does not hold. For example, a visually impaired user might tell a robot “please take the laptop on the table to me” when the laptop is no longer on the table. If a mismatched referring expression and image are fed into an existing REC method, the method will output an incorrect localization result.

To address this problem, the main research contents of this thesis are as follows.

In the third and fourth chapters, this thesis proposes a modular REC method that can determine whether the referring expression matches the image. If the expression matches the image, the method outputs the region of the object described by the expression; if it does not, the method returns linguistic feedback that explains the expression-image mismatch. The modular REC method consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. The expression parsing module parses the expression into three parts: subject, object, and relationship. The entity detection module detects all the entities in the image and collects them into an entity dictionary. The relationship detection module detects the visual relationship between two entities in the image. The matching detection module uses the information obtained by the other three modules to determine whether the expression matches the image, and then either outputs linguistic feedback or locates the object described by the referring expression.

In the fifth chapter, this thesis builds the NP-RefCOCO+ dataset on top of the public RefCOCO+ dataset and analyzes it experimentally. Experiments on the NP-RefCOCO+ dataset are then used to evaluate the performance of the modular REC method. The experimental results show that the modular REC method can effectively judge whether the referring expression matches the image and locate the object described by the referring expression.
Keywords/Search Tags: Computer vision, Referring expression comprehension, Modular network, Matching detection
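
The matching detection step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `ParsedExpression`, `MatchResult`, and `match_expression` are hypothetical, the box coordinates are invented, and the outputs of the parsing, entity detection, and relationship detection modules are assumed to be available already as plain Python data rather than produced by trained networks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical bounding box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class ParsedExpression:
    """Assumed output of the expression parsing module."""
    subject: str
    relationship: Optional[str] = None
    obj: Optional[str] = None


@dataclass
class MatchResult:
    """Either a located region (matched) or linguistic feedback (mismatched)."""
    matched: bool
    box: Optional[Box] = None
    feedback: Optional[str] = None


def match_expression(
    parsed: ParsedExpression,
    entities: Dict[str, List[Box]],           # assumed entity dictionary
    relations: Set[Tuple[Box, str, Box]],     # assumed detected relationships
) -> MatchResult:
    """Decide whether the parsed expression matches the detected image content."""
    subject_boxes = entities.get(parsed.subject, [])
    if not subject_boxes:
        return MatchResult(False, feedback=f"There is no {parsed.subject} in the image.")

    # Expression without a relationship: any detected subject is a match.
    if parsed.relationship is None or parsed.obj is None:
        return MatchResult(True, box=subject_boxes[0])

    object_boxes = entities.get(parsed.obj, [])
    if not object_boxes:
        return MatchResult(False, feedback=f"There is no {parsed.obj} in the image.")

    # Look for a subject/object pair linked by the stated relationship.
    for s in subject_boxes:
        for o in object_boxes:
            if (s, parsed.relationship, o) in relations:
                return MatchResult(True, box=s)

    return MatchResult(
        False,
        feedback=f"The {parsed.subject} is not {parsed.relationship} the {parsed.obj}.",
    )


# Toy scene for the abstract's example: the laptop has been moved to the sofa.
laptop, table, sofa = (40, 60, 120, 110), (0, 100, 300, 200), (310, 80, 500, 220)
entities = {"laptop": [laptop], "table": [table], "sofa": [sofa]}
relations = {(laptop, "on", sofa)}

mismatch = match_expression(ParsedExpression("laptop", "on", "table"), entities, relations)
match = match_expression(ParsedExpression("laptop", "on", "sofa"), entities, relations)
print(mismatch.feedback)
print(match.box)
```

In this toy scene, the request for “the laptop on the table” yields feedback instead of a wrong box, while “the laptop on the sofa” returns the laptop's region. A real system would replace the dictionary and set lookups with detector outputs and would need some policy for ambiguous or low-confidence detections, which this sketch does not attempt.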