
Deep Learning Based Scene Adaptive Target Tracking And Recognition

Posted on: 2022-10-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W H Zhang
GTID: 1488306605989079
Subject: Circuits and Systems
Abstract/Summary:
Target tracking and recognition can be roughly divided into two steps: target feature representation and feature discrimination. Effective and efficient feature representation and a robust discriminator are therefore essential for both tasks. In recent years, target tracking and recognition have attracted increasing attention and have been developed extensively in computer vision. In practical applications, however, two problems remain: spurious crests disturb the discriminator's response, and dynamic, texture, and prior features are insufficiently represented. This dissertation studies these issues in depth. Its main theoretical and applied contributions are as follows:

1. To suppress unexpected crests in the response of the correlation filter framework, an l2-norm based sparse response regularization term is proposed. Correlation filter trackers learn online to regress the region of interest onto a Gaussian-shaped response. Because the tracked object undergoes unpredictable transformations, however, the response map often contains many unexpected crests, and when the object's response is corrupted by these crests the tracker loses the object. A sparse response is therefore used to increase robustness to transformations of the tracked object. Since the new term is incorporated directly into the objective function of the correlation filter framework, it can improve the performance of many methods built on that framework. Moreover, the derived solution shows that the new term does not increase the computational complexity. Experimental results demonstrate that the proposed regularization term improves the tracking performance of various correlation filter trackers.

2. To automatically assign suitable features for describing the specific target in each frame, a novel dynamic feature-adaptive tracking framework is proposed. A repository consisting of numerous advisors is constructed, and tracking in each frame proceeds in three steps. First, several executive advisors are selected from the repository according to their past performance, which is reflected in a selection rate. Second, the executive advisors generate tracking results, and their performance is evaluated by a novel evaluation mechanism to produce the final tracking result. Finally, the selection rate of each advisor in the repository is updated according to this evaluation. Compared with traditional post-event ensemble trackers, whose member trackers are designed manually, the proposed method learns to assign appropriate advisors and thus adapts to different cases. Dynamic tracking also mitigates the overfitting caused by a fixed set of trackers. Experiments on several publicly available data sets demonstrate the superiority of the proposed method.

3. Satellite videos usually have a lower resolution, which causes the following difficulties compared with tracking in natural and drone videos: first, the width and height of a target usually span only a few pixels; second, targets often have similar appearances. Similar targets lower the discriminability among targets and, when closely arranged, can easily interfere with the tracking result. Under these circumstances, the performance of general tracking methods is limited. However, the satellite camera is approximately fixed and the frame rate is relatively high, so the movement of a target between two adjacent frames is relatively stable. A prediction network based on a fully convolutional network is therefore proposed to predict, for each pixel, the probability that the target is located there in the next frame. In addition, since vehicles generally drive on roads, a segmentation method is adopted to generate the feasible region of the target in each frame, and high probability is assigned to the mask of this region in the next frame. Experimental results on several representative targets demonstrate the superiority of the proposed method.

4. In the forward pass of convolutional neural networks, high-frequency (texture) features are gradually blurred by hierarchical down-sampling and convolution operations. In image scene classification, high-frequency features are important for handling intra-class diversity and inter-class similarity. For example, line features are crucial for distinguishing a tennis court from a basketball court, and for tennis courts in different scenes, emphasizing line features effectively suppresses the influence of the background. To learn appropriate representations of these blurred high-frequency features, a novel texture-learnable convolutional neural network is proposed. The network has two pathways: the original convolutional neural network architecture serves as the low-frequency pathway, and a new high-frequency pathway propagates the high-frequency features generated in each layer. However, high-frequency information usually shows large variance among images of the same class, so a new objective is proposed for the high-frequency pathway to enhance the intra-class similarity of high-frequency features. Extensive experiments on three publicly available data sets demonstrate the superior performance of the proposed network.

5. Multi-scale features and high-frequency features are both crucial for object detection; for example, texture features are critical for distinguishing basketball courts from tennis courts. Feature pyramid based structures represent multi-scale features efficiently, but general feature pyramid based structures do not specifically consider high-frequency features. Since each level of a Laplacian pyramid consists of high-frequency information, a Laplacian feature pyramid network is proposed to learn appropriate high-frequency feature representations of targets. The network consists of a bottom-up pathway, a Laplacian pathway, and a fusion pathway, which generate a low-frequency pyramid, a high-frequency pyramid, and a compound pyramid, respectively. The bottom-up pathway follows the computation flow of the backbone convolutional neural network, similar to general feature pyramid based structures. The Laplacian pathway extracts the high-frequency features of objects through a trainable Laplacian operator. Finally, the low-frequency and high-frequency feature pyramids are fused efficiently to generate the compound pyramid, for which two fusion methods, linear and nonlinear, are proposed. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method.
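For contribution 1, the following Python (NumPy) sketch shows where a response-side l2 term could enter the closed-form solution of a single-channel correlation filter. It is an illustration under stated assumptions, not the dissertation's formulation: the objective is written per frequency, and the penalty weight mu and the frequency weighting d (computed by the hypothetical helper frequency_radius) are choices made only for this example. Because the extra term stays diagonal in the Fourier domain, training still costs one element-wise division per frequency, which is consistent with the abstract's statement that complexity does not increase.

```python
import numpy as np


def gaussian_label(h, w, sigma=2.0):
    """Gaussian-shaped regression target with its peak moved to index (0, 0)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    # Circular correlation reports a zero-shift match at index (0, 0).
    return np.roll(g, (-cy, -cx), axis=(0, 1))


def frequency_radius(h, w):
    """Hypothetical weight d: normalized frequency magnitude (large for sharp detail)."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2)


def train_filter(x, y, lam=1e-2, mu=0.0, d=None):
    """Closed-form minimizer, per frequency, of
    |H X - Y|^2 + mu * d^2 * |H X|^2 + lam * |H|^2  (assumed objective)."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    if d is None:
        d = np.ones(x.shape)
    power = (X * np.conj(X)).real
    # Still a single element-wise division: the response penalty adds no extra cost.
    return Y * np.conj(X) / ((1.0 + mu * d ** 2) * power + lam)


def respond(H, z):
    """Response map of filter H on a search patch z (same size as the training patch)."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(z)))
```

With mu = 0 this reduces to the ordinary ridge-regression correlation filter; with mu > 0 the response energy at high frequencies, where sharp secondary crests live, is shrunk while the peak location is preserved.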
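The three per-frame steps of contribution 2 (select executive advisors, evaluate their results, update selection rates) can be sketched in Python as follows. Only the three-step structure comes from the abstract; everything else is an assumption for illustration: AdvisorRepository and run_advisor are hypothetical names, selection rates are used as sampling probabilities, and the evaluation (IoU against the coordinate-wise median box) and the moving-average rate update stand in for the dissertation's own mechanisms, which the abstract does not specify.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class AdvisorRepository:
    """Hypothetical repository in which selection rates act as sampling probabilities."""

    def __init__(self, n_advisors, k, momentum=0.9, seed=0):
        self.rates = np.full(n_advisors, 1.0 / n_advisors)
        self.k = k
        self.momentum = momentum
        self.rng = np.random.default_rng(seed)

    def select(self):
        # Step 1: sample k distinct executive advisors, favouring high selection rates.
        p = self.rates + 1e-6  # small floor so no advisor is excluded forever
        return self.rng.choice(len(self.rates), size=self.k, replace=False, p=p / p.sum())

    def track_frame(self, run_advisor):
        """run_advisor(i) returns the box predicted by advisor i for the current frame."""
        ids = self.select()
        boxes = np.array([run_advisor(i) for i in ids], dtype=float)
        # Step 2 (assumed evaluation): score each box by IoU with the median box.
        consensus = np.median(boxes, axis=0)
        scores = np.array([iou(b, consensus) for b in boxes])
        if scores.sum() > 0:
            weights = scores / scores.sum()
        else:
            weights = np.full(len(ids), 1.0 / len(ids))
        final_box = weights @ boxes
        # Step 3: move each executive advisor's rate toward its latest score.
        self.rates[ids] = self.momentum * self.rates[ids] + (1 - self.momentum) * scores
        return final_box, ids
```

Only the selected advisors are run in each frame, so the cost per frame grows with k rather than with the size of the repository.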
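For contribution 3, a minimal Python sketch of how a next-frame location probability map could be combined with a feasible-region (road) mask. The fully convolutional prediction network is replaced here by a constant-velocity Gaussian prior purely to keep the example self-contained; the helper names motion_prior and apply_road_mask and the inside/outside weights are hypothetical.

```python
import numpy as np


def motion_prior(shape, prev, curr, sigma=2.0):
    """Stand-in for the prediction network: a Gaussian centred on a
    constant-velocity guess, next = curr + (curr - prev), in (row, col) pixels."""
    cy, cx = 2 * np.asarray(curr) - np.asarray(prev)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    p = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    return p / p.sum()


def apply_road_mask(prob, road, inside=1.0, outside=0.05):
    """Keep probability on the feasible region, damp it elsewhere, renormalize."""
    weights = np.where(road, inside, outside)
    q = prob * weights
    return q / q.sum()
```

When the motion guess drifts just off the road, the mask pulls the most likely location back onto the feasible region, which is the role the abstract gives to the segmentation step.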
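The two ingredients of contribution 4, a high-frequency signal that down-sampling would otherwise discard and an objective that pulls same-class high-frequency features together, can be illustrated with NumPy. This is a hedged sketch: defining the high-frequency map as the residual between a feature map and its pooled-then-upsampled copy, and using a center-loss-style compactness term, are assumptions made here; the abstract does not give the network's actual operators or objective.

```python
import numpy as np


def avg_pool2(x):
    """2x2 average pooling of a (C, H, W) feature map with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)


def high_frequency(x):
    """Return (detail removed by down-sampling, down-sampled map)."""
    low = avg_pool2(x)
    return x - upsample2(low), low


def intra_class_compactness(features, labels):
    """Center-loss style term: mean squared distance of each sample's flattened
    high-frequency feature to the mean of its class within the batch."""
    f = features.reshape(len(features), -1)
    loss = 0.0
    for cls in np.unique(labels):
        members = f[labels == cls]
        loss += np.sum((members - members.mean(axis=0)) ** 2)
    return loss / len(f)
```

A flat map has no high-frequency content, while a one-pixel checkerboard (a crude stand-in for court lines) is removed entirely by 2x2 pooling and therefore survives only in the high-frequency pathway.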
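For contribution 5, a simplified NumPy sketch of the low-frequency and high-frequency pyramids and the two fusion styles. Assumptions: 2x2 average pooling and nearest-neighbour upsampling stand in for both the backbone's bottom-up stages and the trainable Laplacian operator, and fuse_gated is only one hypothetical example of a nonlinear fusion. The reconstruction check shows the defining property of a Laplacian pyramid: the coarsest level plus all high-frequency levels recovers the input exactly.

```python
import numpy as np


def down(x):
    """2x2 average pooling of a (C, H, W) map (stand-in for a backbone stage)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def up(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)


def laplacian_pyramid(x, levels):
    """Return (low-frequency pyramid, high-frequency pyramid) of a (C, H, W) map."""
    lows, highs = [x], []
    for _ in range(levels - 1):
        nxt = down(lows[-1])
        highs.append(lows[-1] - up(nxt))  # detail lost when moving to the next level
        lows.append(nxt)
    return lows, highs


def reconstruct(coarsest, highs):
    """Invert the pyramid: upsample and add back each level's detail."""
    x = coarsest
    for h in reversed(highs):
        x = up(x) + h
    return x


def fuse_linear(low, high, a=1.0, b=1.0):
    """Linear fusion of one pyramid level (a, b would be learned in practice)."""
    return a * low + b * high


def fuse_gated(low, high):
    """Hypothetical nonlinear fusion: detail gated by a sigmoid of the low-frequency map."""
    return low + high / (1.0 + np.exp(-low))
```

Fusing level by level, for example `[fuse_linear(l, h) for l, h in zip(lows, highs)]`, yields one compound map per pyramid level that carries both low-frequency and high-frequency information.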
Keywords/Search Tags: sparse learning, feature representation, representation learning, priori learning