| The concept of saliency originates from the mechanism of human visual perception. Using computer algorithms to predict the regions that attract visual attention is known as saliency detection. The detection result is usually expressed as a saliency map, in which the degree of saliency of each pixel is given by a value between 0 and 1. Compared with image saliency detection, video saliency detection must also exploit correlations in the temporal domain in addition to the spatial domain. Saliency detection is widely used in intelligent transportation, video surveillance, video summarization, video compression, and many other applications. In recent years, research on video saliency detection has addressed two major problems: the detection of eye fixations and the detection of discriminative regions. This thesis focuses on the latter, which aims to capture the region of interest covering entire moving objects. As the mechanism of human visual cognition has become better understood, more and more visual models have been proposed. However, detecting salient objects in complex scenes still poses many challenges. This thesis exploits effective visual characteristics at multiple time scales to tackle the difficulties caused by complicated natural scenes, such as illumination changes, target rotation, and scale variations. For video salient object detection, we construct a global optimization model with good generality and flexibility. The main contents and contributions of this thesis are as follows: 1. We present a hybrid energy feature extraction algorithm based on information at two time scales. The goal of this algorithm is the convergence of the motion energy. To obtain the overall motion energy characteristics of the moving objects, we design and compute three local motion characteristics: motion history energy between frames, motion region energy, and motion edge energy. At the same
time, the motion energy is further combined with the saliency information in a single frame to obtain more reliable hybrid energy characteristics. The final salient object region is obtained by normalizing the hybrid energy map. The algorithm describes the salient object from different levels and perspectives at two scales, and is hierarchical, simple, and intuitive. It suppresses noise from the background and from motion, and thus effectively improves detection accuracy. 2. We present a salient object detection algorithm that works under weak-contrast conditions by incorporating historical trajectory clustering information. In a video, an object often moves only slightly or even remains static for a short time. Moreover, if the color difference between the object and the background is not evident, both the static and the dynamic contrast will be weak. In that situation, useful features cannot be extracted from the moving objects. The loss of short-term motion information can usually be mitigated by tracking and matching local feature points. However, when the video quality is low, feature point matching itself is easily disturbed by noise. To address weak contrast in a video, the algorithm is designed at a time scale of generally 5 to 15 frames and incorporates the features of historical trajectory clustering. Experiments show that the algorithm can exploit the trajectory information of the tracked feature points and accurately locate objects even when they are intermittently static. 3. We propose a salient object detection algorithm based on the cross fusion of multiple visual features. Different contextual information can be obtained at different levels or time scales, from which different visual characteristics can be extracted. Each visual feature has its own emphasis in representation, some highlighting the edge of
the target, some being dense or sparse, and some defined at the pixel or superpixel level. With so many visual features, it is difficult to determine reasonable weights if a linear superposition fusion scheme is adopted. This thesis uses a similarity network based on nonlinear cross fusion to generate a similarity matrix that takes into account the information shared among the various features. On this basis, a robust salient object detection algorithm is proposed. Experiments show that the algorithm helps the multiple feature views complement and reinforce one another, and that it is robust and reliable. 4. We propose a saliency optimization model constrained by local features. In mathematical programming, global optimization models are reliable, general, and robust. If video salient object detection can be formulated as a multi-dimensional global optimization model, the subjectivity in choosing fusion parameters is avoided and the robustness and accuracy of detection are improved. After acquiring reliable regions, this thesis proposes a global optimization model constrained by local features, which consists of foreground terms, background terms, smoothness terms, and constraint conditions. Computing the globally optimal saliency is finally reduced to solving a system of linear equations. Depending on the application requirements, every term in the model, including the foreground terms, background terms, and constraint terms, can be defined or designed according to the best available foreground (or background) prior. This improves detection accuracy while retaining good versatility and flexibility. |
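
The reduction of the optimization model to a linear system can be illustrated with a minimal sketch. The abstract does not give the exact energy terms, so the quadratic form below is an assumption: a background term `bg_i * s_i^2`, a foreground term `fg_i * (s_i - 1)^2`, and a smoothness term `0.5 * sum_ij W_ij * (s_i - s_j)^2` over regions (for example superpixels). The additional constraint conditions mentioned above are omitted. The function name `solve_saliency` and its parameters are hypothetical. Setting the gradient of this energy to zero gives `(diag(fg + bg) + L) s = fg`, where `L` is the graph Laplacian of `W`.

```python
import numpy as np

def solve_saliency(fg, bg, W):
    """Minimise an assumed quadratic saliency energy over n regions.

    E(s) = sum_i bg_i * s_i^2
         + sum_i fg_i * (s_i - 1)^2
         + 0.5 * sum_ij W_ij * (s_i - s_j)^2

    fg, bg : non-negative foreground / background prior weights, shape (n,)
    W      : symmetric non-negative affinity matrix, shape (n, n)
    """
    fg = np.asarray(fg, dtype=float)
    bg = np.asarray(bg, dtype=float)
    W = np.asarray(W, dtype=float)

    # Graph Laplacian of the smoothness term: L = D - W.
    laplacian = np.diag(W.sum(axis=1)) - W

    # Zero gradient of E(s) yields (diag(fg + bg) + L) s = fg.
    A = np.diag(fg + bg) + laplacian
    return np.linalg.solve(A, fg)

# Toy example: three regions in a chain, region 0 likely foreground,
# region 2 likely background, region 1 ambiguous.
fg_prior = [0.9, 0.1, 0.0]
bg_prior = [0.0, 0.1, 0.9]
affinity = [[0.0, 0.5, 0.0],
            [0.5, 0.0, 0.5],
            [0.0, 0.5, 0.0]]
saliency = solve_saliency(fg_prior, bg_prior, affinity)
print(saliency)
```

In this toy example the solved saliency decreases from region 0 to region 2 and stays within [0, 1]. Under this assumed form, swapping in a different foreground or background prior only changes the vectors `fg` and `bg`, which is one way to read the flexibility claimed for the model.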