Research On Referential Expression Understanding (REC) Method Based On Expression-image Matching Detection

Posted on:2021-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:E J Cui

Full Text:PDF

GTID:2438330626964206

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, artificial intelligence has developed rapidly, and research at the intersection of computer vision and natural language processing has attracted growing interest. Referring Expression Comprehension (REC) is one such research topic. A referring expression is a natural language expression that describes a particular object in a scene, such as “the man in the blue shirt” or “the book on the table”. The task of REC is to locate the object described by the referring expression in a given image. Existing REC methods assume that the object described by the referring expression must exist in the image, and they do not judge whether the referring expression matches the image. In some real application scenarios, however, this assumption may not hold. For example, a visually impaired user might tell a robot “please take the laptop on the table to me” when the laptop is in fact no longer on the table. If a mismatched referring expression and image are input into an existing REC method, the method will therefore output a wrong localization result. To address this problem, the main research contents of this thesis are as follows.

In the third and fourth chapters, this thesis proposes a modular REC method that can determine whether the referring expression matches the image. If the referring expression matches the image, the method outputs the region of the object described by the referring expression; if the referring expression does not match the image, the method outputs linguistic feedback that explains the expression-image mismatch. The modular REC method consists of four modules: the expression parsing module, the entity detection module, the relationship detection module and the matching detection module. The expression parsing module parses the expression into three parts: subject, object and relationship. The entity detection module detects all the entities in the image and collects them into an entity dictionary. The relationship detection module detects the visual relationship between two entities in the image. The matching detection module determines whether the expression matches the image according to the information obtained by the other three modules, and either outputs linguistic feedback or locates the object described by the referring expression.

In the fifth chapter, this thesis builds the NP-RefCOCO+ dataset based on the public dataset RefCOCO+ and analyzes the dataset through experiments. Experiments on the NP-RefCOCO+ dataset are then designed to evaluate the performance of the modular REC method. The experimental results show that the modular REC method can effectively judge whether the referring expression matches the image and locate the object described by the referring expression.
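
The decision logic of the matching detection module can be illustrated with a small sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the names `ParsedExpression`, `MatchResult` and `detect_matching` are hypothetical, and the outputs of the expression parsing, entity detection and relationship detection modules are replaced by hand-written inputs (a parsed triple, an entity dictionary keyed by category name, and a set of relation triples). It is not the thesis implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]      # (x, y, width, height); format is an assumption
Relation = Tuple[Box, str, Box]      # (subject box, relationship, object box)


@dataclass
class ParsedExpression:
    """Stand-in for the output of the expression parsing module."""
    subject: str
    relationship: Optional[str] = None
    obj: Optional[str] = None


@dataclass
class MatchResult:
    matched: bool
    box: Optional[Box] = None        # region of the described object when matched
    feedback: Optional[str] = None   # linguistic feedback when mismatched


def detect_matching(parsed: ParsedExpression,
                    entities: Dict[str, List[Box]],
                    relations: Set[Relation]) -> MatchResult:
    """Hypothetical matching step: return a box, or explain the mismatch."""
    subject_boxes = entities.get(parsed.subject, [])
    if not subject_boxes:
        return MatchResult(False, feedback=f"there is no {parsed.subject} in the image")

    # Expression without a relationship part: any detected subject is accepted.
    if parsed.relationship is None or parsed.obj is None:
        return MatchResult(True, box=subject_boxes[0])

    object_boxes = entities.get(parsed.obj, [])
    if not object_boxes:
        return MatchResult(False, feedback=f"there is no {parsed.obj} in the image")

    # Look for a subject/object pair that satisfies the stated relationship.
    for s in subject_boxes:
        for o in object_boxes:
            if (s, parsed.relationship, o) in relations:
                return MatchResult(True, box=s)

    return MatchResult(
        False,
        feedback=f"the {parsed.subject} is not {parsed.relationship} the {parsed.obj}",
    )


# The laptop example from the abstract: both entities are detected,
# but no "on" relationship holds between them.
laptop, table = (200, 10, 30, 20), (0, 50, 100, 40)
entities = {"laptop": [laptop], "table": [table]}
parsed = ParsedExpression("laptop", "on", "table")
print(detect_matching(parsed, entities, set()).feedback)
# prints "the laptop is not on the table"
```

Passing precomputed detections into the matching step keeps it independent of how the other three modules are built, which is one possible reading of the modular design described above.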

Keywords/Search Tags:

Computer vision, Referring expression comprehension, Modular network, Matching detection

Related items

1	RETR:End-to-end Referring Expression Comprehension With Transformers
2	Research On Referring Expression Comprehension Based On Semantic Context
3	Research Of Referring Expression Comprehension Based On Visual-Language Cross-Modal Joint Learning
4	Research And Application Of Relation-Driven Referring Expression Understanding
5	Research On Referring Expression Based On Multi Module Attention
6	Research And Applications Of Referring Expression Generation Technologies Via Visual Dialogue
7	Research On Referring Expression Segmentation Based On Multi-Modal Multi-Scaled Feature Fusion
8	Binocular Stereo Matching Research In Computer Vision
9	Application Research Of Computer Vision Technology In Precision Detection Of PCB
10	A Number Of Issues. Computer Vision Techniques And Algorithms