Research On Visual Content Recognition And Analysis Based On Deep Learning And Context Semantics

Posted on:2018-12-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Y Ou

Full Text:PDF

GTID:1318330515983378

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

With the rapid development of Internet technology and the great performance of deep learning,image and video applications also got significant evolution.However,accompanied with the convenience these applications,it has brought some negative influence on the society.Therefore,how to identify useful information and filer the harmful information effectively and exactly from the massive and complex data sea,are realistic problem that needs to be solved in big-data environment.With the development of deep learning,the application fields of computer vision has been expanding rapidly,including image classification,object recognition,object detection,image segmentation,tracking,etc.This paper will aim at four typical applications,such as,adult content identification,scene parsing,makeup transfer and content based image retrieval.These works are based on the deep learning framework and integrate the hierarchical context and multi-context semantic information.To solve the hard problem of classification caused by the diverse samples,this paper proposes a high-level semantics based fine-to-coarse strategy and multi-context semantics based joint decision strategy.The adult context recognition is usually a binary classification problem,but complex samples will result in the intra-class distance maybe larger than the inter-class distance for some images,which increase the difficulty of training a classifier.The fine-to-coarse strategy improves the performance of the classifier by refining the categories in training.In addition,the diversity problem can be relaxed by multi-contexf modeling,which consists of global-context modeling,local-context modeling and cross-context modeling.Different from traditional feature fusion,policy fusion not combines the features directly.It is designed to ensure the accuracy that is produced by classification based global-context modeling,and uses detection based local-context modeling to fix wrongly discriminating samples.This strategy can improve the recall and precision simultaneously.Moreover,the modular design allows to improve overall system by upgrading the individual global-context modeling component or local-context modeling component.To solve the difficulty of scene parsing caused by the hard object that involves object scale(too small),interactive(occlusion),and hiddenness(easy obliterated in complex background region),this paper proposes a deep objectness region enhancement network(OENet).This network includes two core components:objectness region enhancement network and black-hole filling.The former uses the high confidence proposal regions to weight the areas of the specific channel of convolutional feature maps.The latter is used to avoid pixels are judged to nonexistent category by shielding extra background.In addition,the modular design makes the two modules not only can be updated by replacing a high performance one,but also can be applied to other existing scene-parsing network.Makeup transfer is a very interesting work.It starts with face parsing,and uses generative network to produce a natural-looking makeup.To solve two challenge problems:(1)how to get a precise face parsing,(2)how to keep(facial sharp and features)and transfer(lip-gloss and eye shadow)the feature of human on demand,this paper proposes a weighted cross-entropy loss and a deep localized makeup transfer network.The former is used for weighting specific local-context regions,and enforces the symmetric prior on some special areas,such as eyes and lips.The latter uses different feature to describe sharp-sensitive region and texture-sensitive region respectively.This generation network not only produces natural-looling makeup,but also controls the lightness of the makeup.To solve the problems of precision and efficiency in large-scale image retrieval,this paper proposes a hierarchical deep semantic hashing scheme.This network can produce high-level semantic and hash codes simultaneously.With probability-based semantic-level similarity and hashing-level similarity,the unrelated samples are filtered in advance by zero-cost high-level semantic information,and then the retrieval is achieved in a small candidate proposal set with hash codes.This scheme can achieve accelerating about 150 times with similar accuracy in imagenet dataset.In summary,the multi-context semantic fusion strategy and the deep learning methods are discussed in this paper.They not only have reference value,but also have reference meaning in design,development a practicable and robust application system.

Keywords/Search Tags:

Deep learning, Hierarchical semantic, Multi-context semantic, Image recognition, Image retrieval, Scene parsing

PDF Full Text Request

Related items

1	Research On Semantic Scene Parsing
2	Scene Content Parsing For Visual Image Data
3	Semantic Image Segmentation And Object Detection In Autonomous-Driving System
4	Research On Food Image Semantic Fusion Classification Algorithm Based On Hierarchical Semantic Graph Embedding
5	Image-Text Retrieval Based On Hierarchical Interaction Network
6	A Research On Image Retrieval Based On Label Semantics
7	The Research On Semantic-driven Image Mining Using Statistical Learning
8	The Research Of Semantic Image Retrieval Basedon The Deep Convolution Neural Network
9	Deep Learning Based Scene Text Recognition
10	Research On Semantic Segmentation Of Scene Image Based On Deep Learning