Research On Semantic Understanding Of Freehand Sketches

Posted on: 2020-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zheng
Full Text: PDF
GTID: 1368330590972976
Subject: Computer application technology
Abstract/Summary:
As a common means of representation, freehand sketches play an important role in human life: they are used to depict objects, sketch storylines, and design buildings. Their abstract expressiveness and flexibility make them easy to integrate with practical applications such as image retrieval and scene generation, which has created a growing demand for processing and understanding freehand sketches. However, existing work on image understanding focuses mainly on real images rather than freehand sketches. Compared with a real image, a freehand sketch contains only a few hand-drawn strokes and lacks color and texture information. Training data for freehand sketches are also scarce. Furthermore, because of subjective factors such as drawing skill, freehand sketches differ greatly in vividness, appearance, and detail. These properties make the semantic understanding of freehand sketches a very challenging task.

Combining automatic computing techniques with freehand sketches is a valuable and challenging research topic. It requires a deep understanding and effective mining of freehand sketch data, which can in turn improve work efficiency and meet diverse application demands. There are two main challenges in research on the semantic understanding of freehand sketches. First, there is a semantic gap between low-level features and high-level semantics, a gap that also exists between real images and freehand sketches. Second, freehand sketch data are lacking, and the intra-class variance of freehand sketches is large. In addition, existing methods have not fully considered the relation between the expressive ability of freehand sketches and the semantic understanding of videos. To address these challenges, this dissertation starts from the characteristics of freehand sketches and explores sketch-related theories and methods in a semantics-driven way at different levels. The main contents and contributions can be summarized in the following four aspects.

Firstly, this dissertation proposes a weakly supervised approach to discriminative patch mining, which aims to reduce the semantic gap between low-level features and high-level semantics through a mid-level patch representation. To find the most discriminative patches for different categories of sketches, randomly sampled patches are first clustered by K-means, and an iterative detection process then updates the similar patches in each cluster. Cluster merging and discriminative ranking are also applied to further improve the results. Experimental results on the TU-Berlin dataset validate the effectiveness of the proposed method and demonstrate its practical value for freehand sketch recognition.

Secondly, this dissertation proposes a CNN-based framework for part-level semantic parsing of freehand sketches, which makes three main contributions. 1) It proposes a homogeneous transformation method to address domain adaptation. Domain adaptation between real images and freehand sketches is an unavoidable problem in sketch parsing. Unlike existing methods that use the edge maps of real images to approximate freehand sketches, the proposed method transforms the data from the two domains into a homogeneous space to minimize the semantic gap. 2) It designs a soft-weighted loss function to better guide network training. Compared with the standard cross-entropy loss, the proposed soft-weighted loss addresses the problems of ambiguous label boundaries and class imbalance. 3) It presents a staged learning strategy to improve the parsing performance of the trained model, which takes advantage of both the information shared across sketch categories and the characteristics specific to each category. Experimental results show that the proposed deep semantic sketch parsing model achieves state-of-the-art performance on the public SketchParse dataset.

Thirdly, for the object-level understanding of freehand sketches, this dissertation proposes a sketch-specific data augmentation method, which aims to enhance semantic understanding in terms of both quantity and quality. With respect to quantity, a Bezier pivot based deformation method (BPD) is presented to generate a substantial number of new freehand sketches. BPD applies directly to the original single-image sketch without requiring temporal cues about the sketch lines. Because it is not restricted to a particular type of input sketch data, BPD supports a broader range of applications. To improve the quality of sketches, a novel method called mean-stroke reconstruction (MSR) is introduced to produce a new form of sketch. MSR uses the mean strokes computed on the training set to reconstruct the original sketches, which effectively decreases the intra-class variance among freehand sketches. Since it does not demand a large number of same-class real images or rely on any additional cues, it keeps the computational complexity of training the CNN model low and reduces the cost of data collection. Experimental results on the TU-Berlin and Sketchy-R datasets demonstrate the practical value of the proposed method.

Finally, this dissertation brings the abstract property of freehand sketches into the action understanding of videos. It proposes an action sketch based spatio-temporal representation method for human action recognition, a representative task in video semantic matching. By exploring the characteristics that action sketches should have, this dissertation builds a system that discovers the most distinctive action sketches automatically. For videos containing human action, action sketches for each clip can be generated in real time. Based on these sketches, a distinctiveness ranking method for action sketches is proposed. The top-ranking sketches typically represent the action classes to which they belong. Several sketch pooling methods are applied to the top-ranking sketches to generate a new representation for an action video. This new representation is then combined with local-feature-based representations such as improved dense trajectories to improve the performance of action recognition.

Through the above studies, this dissertation explores different levels of semantic understanding for freehand sketches and presents feasible and effective solutions to the key issues. The experimental results show that the semantic gap is a common problem in the field of freehand sketch understanding. At the four levels considered (patch level, part level, object level, and spatio-temporal object level), the proposed semantic understanding models can effectively construct the relations between the abstract semantics of freehand sketches and visual media. Furthermore, the proposed methods can solve practical problems in sketch-related real-world applications such as sketch recognition, parsing, retrieval, and action recognition in videos.
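
The abstract does not spell out how the Bezier pivot based deformation (BPD) works, so the following is only a minimal sketch of the general idea of augmenting a sketch by shifting Bezier control points, not the dissertation's method. It assumes a stroke is already available as four control points of a cubic Bezier curve (BPD itself is described as working directly on the sketch image), and the names `cubic_bezier`, `deform_stroke`, and `max_shift` are hypothetical.

```python
import random


def cubic_bezier(p0, p1, p2, p3, n=20):
    """Sample n points on the cubic Bezier curve with control points p0..p3."""
    points = []
    for i in range(n):
        t = i / (n - 1)
        s = 1.0 - t
        # Bernstein basis weights of a cubic Bezier curve.
        b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points


def deform_stroke(control_points, max_shift=5.0, rng=None):
    """Return new control points with the two interior points randomly shifted.

    Assumption: the endpoints stay fixed so the deformed stroke still
    connects to its neighbours; only the interior points are jittered.
    """
    rng = rng or random.Random()
    p0, p1, p2, p3 = control_points

    def jitter(p):
        return (p[0] + rng.uniform(-max_shift, max_shift),
                p[1] + rng.uniform(-max_shift, max_shift))

    return (p0, jitter(p1), jitter(p2), p3)


# One hypothetical stroke and one deformed variant of it.
stroke = ((0.0, 0.0), (10.0, 20.0), (30.0, 20.0), (40.0, 0.0))
variant = deform_stroke(stroke, max_shift=5.0, rng=random.Random(0))
original_pts = cubic_bezier(*stroke)
deformed_pts = cubic_bezier(*variant)
```

Calling `deform_stroke` repeatedly with different random seeds would yield many variants of the same stroke, which is the sense in which such a deformation increases the quantity of training sketches.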
Keywords/Search Tags: Freehand Sketch, Discriminative Patch, Part-level Semantic Parsing, Freehand Sketch Recognition, Action Representation