Visual Detection, Tracking and Recognition of Human Motion in Video

Posted on: 2018-11-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Wang
GTID: 1368330590455261
Subject: Control Science and Engineering
Abstract/Summary:
Human action recognition is an important research topic in computer vision. It draws on pattern recognition, probability theory, statistical learning, cognitive psychology, and other disciplines. Its applications include intelligent surveillance, autonomous driving, human-computer interaction and collaboration, and smart homes. Since the release of the KTH action database, action recognition has developed enormously. The field has moved from actions captured from a fixed viewpoint to, more recently, actions in complex scenes. Human actions are nevertheless highly diverse: even the same action can differ in speed and duration. In addition, actions are affected by self-occlusion of the human body and by the viewing angle, which makes action recognition challenging. Most action recognition algorithms are data-driven: they learn action features from large amounts of data and design a classifier to distinguish videos containing different actions. However, existing methods do not fully consider the structure and semantic information of an action. An action is a cohesive whole of temporal and spatial connections, involving human movement, interacting objects, and the scene in which it takes place. These temporal and spatial connections and their dynamic changes play a key role in action recognition. In this dissertation, the structure of actions and the associated action elements are modelled. The action elements include the position of the human body, the locations of objects, and the pose of the human body.

Humans in video are located by a human detector. Information from foreground segmentation benefits human detection, but the segmentation is often noisy. In this dissertation, a foreground probability model is constructed from statistics of each pixel's neighborhood, and a threshold-free human detector is proposed based on this model. The detector outputs its results directly, without a detection threshold having to be set, and the optimal parameters of the model are learned from data. A typical search mechanism for detection is the sliding-window method, which evaluates every window exhaustively and is therefore computationally expensive. The foreground probability model can be used to generate candidate windows efficiently, so that the same detection recall is achieved while evaluating fewer windows.

Human actions involve many types of objects, and there is no guarantee that every object has enough samples to train an individual detector. General objects are therefore located with an object tracking algorithm. A method is proposed that uses both local and global object representations. The trajectories of local keypoints are used to predict the object location, and global features are then matched in the neighborhood of the predicted location to find the accurate location. For human detection in crowds, a scanning range finder is used together with cameras to track multiple people under occlusion. Tracking is based on matching detections: it is formulated and solved as a matching problem, and the results are further improved with a Kalman filter.

Human pose is an important cue for action recognition. Most methods for human pose estimation in images combine body-part detectors with spatial constraints between parts, but part detectors generalize poorly because body parts deform. In video, tracking the parts over time helps pose estimation. A temporal-spatial human model is proposed in which both the temporal consistency and the spatial constraints of the body parts are modelled. The model has a tree structure, with body parts as nodes and the relations between parts as edges. A dynamic programming algorithm can therefore find the optimal solution, which is taken as the pose estimate.

A human action can be represented as a temporal sequence of sub-actions. Each sub-action is a short segment involving human motion, object locations, the interaction between objects and the human, and so on. A hierarchical structure is used to organize the elements of the action and the associations between them, so that the action is expressed as a hierarchical structure of action contexts. Similarity functions are defined for the action structure and for its elements. During learning, the structured abstractions of actions and their labels are stored in a memory unit. For recognition, a newly input action is first organized into this structure and then compared with the actions in memory. The nearest-neighbor or k-nearest-neighbor method is used to obtain its label. To cope with massive amounts of data, degradation and reinforcement are applied to maintain the data in memory.
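The abstract says only that the foreground probability model is built from "statistics of the pixel neighborhood" and does not give its form. The following is a minimal sketch under one assumption: the probability at a pixel is the fraction of foreground pixels in a square window around it in a noisy binary segmentation mask. The function name `foreground_probability` and the `radius` parameter are hypothetical, and the dissertation's learned parameters are not reproduced. With this model, an isolated noise pixel receives a low probability, while a pixel inside a solid silhouette stays near 1.

```python
from typing import List

Mask = List[List[int]]       # binary foreground segmentation: 1 = foreground
ProbMap = List[List[float]]  # per-pixel foreground probability in [0, 1]


def foreground_probability(mask: Mask, radius: int = 1) -> ProbMap:
    """Estimate P(foreground) at each pixel from its neighborhood.

    Hypothetical model: the fraction of foreground pixels in the
    (2*radius + 1) x (2*radius + 1) window centered on the pixel,
    with the window clipped at the image border.
    """
    h, w = len(mask), len(mask[0])
    prob: ProbMap = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += mask[yy][xx]
                    count += 1
            prob[y][x] = total / count
    return prob


# A lone noisy pixel versus a solid 3x3 blob.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1
print(foreground_probability(noisy)[2][2])  # 1/9, about 0.111
blob = [[1] * 3 for _ in range(3)]
print(foreground_probability(blob)[1][1])   # 1.0
```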
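Candidate generation from a foreground probability map can be illustrated with a summed-area table (integral image). The table gives the mean probability of any window in constant time, which is far cheaper than running a classifier on it. This is a sketch under assumptions. Ranking windows by mean foreground probability and keeping the `top_k` best is a hypothetical proposal rule, not the dissertation's. In such a scheme, only the returned windows would be passed to the human classifier instead of every sliding-window position.

```python
from typing import List, Tuple

ProbMap = List[List[float]]


def integral_image(prob: ProbMap) -> List[List[float]]:
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(prob), len(prob[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += prob[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_mean(ii: List[List[float]], x: int, y: int, bw: int, bh: int) -> float:
    """Mean probability inside the bw x bh window whose top-left corner is (x, y)."""
    s = ii[y + bh][x + bw] - ii[y][x + bw] - ii[y + bh][x] + ii[y][x]
    return s / (bw * bh)


def candidate_windows(prob: ProbMap, bw: int, bh: int,
                      stride: int = 1, top_k: int = 5) -> List[Tuple[int, int]]:
    """Top-left corners of the top_k windows with the highest mean probability."""
    ii = integral_image(prob)
    h, w = len(prob), len(prob[0])
    scored = [
        (window_mean(ii, x, y, bw, bh), x, y)
        for y in range(0, h - bh + 1, stride)
        for x in range(0, w - bw + 1, stride)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(x, y) for _, x, y in scored[:top_k]]
```

Scoring every window stays linear in the number of positions, but each score costs four lookups. The expensive detector then runs only on the shortlist.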
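The local-plus-global tracking idea can be sketched as follows. The specific choices here are illustrative assumptions, not the dissertation's. The local prediction shifts the previous box by the median displacement of matched keypoints, because the median tolerates a few bad keypoint tracks. The "global feature" is a raw intensity template compared by sum of squared differences (SSD) within a small search radius around the prediction. Keypoint detection and matching are assumed to have been done already. `predict_from_keypoints`, `ssd` and `refine` are hypothetical names.

```python
from statistics import median
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Image = List[List[float]]


def predict_from_keypoints(prev_xy: Tuple[int, int],
                           prev_pts: List[Point],
                           curr_pts: List[Point]) -> Tuple[int, int]:
    """Local step: shift the box's top-left corner by the median keypoint motion."""
    dx = median(c[0] - p[0] for p, c in zip(prev_pts, curr_pts))
    dy = median(c[1] - p[1] for p, c in zip(prev_pts, curr_pts))
    return round(prev_xy[0] + dx), round(prev_xy[1] + dy)


def ssd(frame: Image, template: Image, x: int, y: int) -> float:
    """Sum of squared differences between the template and the frame patch at (x, y)."""
    return sum((frame[y + j][x + i] - template[j][i]) ** 2
               for j in range(len(template))
               for i in range(len(template[0])))


def refine(frame: Image, template: Image,
           pred: Tuple[int, int], radius: int = 2) -> Tuple[int, int]:
    """Global step: best SSD match within `radius` pixels of the prediction."""
    th, tw = len(template), len(template[0])
    h, w = len(frame), len(frame[0])
    best: Optional[Tuple[float, int, int]] = None
    for y in range(pred[1] - radius, pred[1] + radius + 1):
        for x in range(pred[0] - radius, pred[0] + radius + 1):
            if 0 <= x <= w - tw and 0 <= y <= h - th:  # patch must lie inside the frame
                cost = ssd(frame, template, x, y)
                if best is None or cost < best[0]:
                    best = (cost, x, y)
    return (best[1], best[2]) if best is not None else pred
```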
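The abstract formulates multi-person tracking as a matching problem between existing tracks and new detections, such as person positions from the range finder, but does not name a solver. The sketch below uses an illustrative stand-in. It finds the assignment with minimum total Euclidean distance by brute force over permutations. A hypothetical gating distance `gate` sets the cost of leaving a track unmatched. Brute force is exact but only practical for a handful of people; the Hungarian algorithm is the usual choice for larger problems.

```python
from itertools import permutations
from math import dist
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def associate(tracks: List[Point], detections: List[Point],
              gate: float = 1.0) -> Dict[int, int]:
    """Map track index -> detection index, minimizing total distance.

    A track assigned no detection, or a detection farther than `gate`,
    is left unmatched at a cost of `gate`, so a match is kept only when
    it is cheaper than leaving the track unassigned.
    """
    n = len(tracks)
    # Pad with None so that every track can also be assigned "no detection".
    slots: List[Optional[int]] = list(range(len(detections))) + [None] * n
    best_cost, best_pairs = float("inf"), {}
    for perm in permutations(slots, n):
        cost, pairs = 0.0, {}
        for t, d in enumerate(perm):
            if d is not None and dist(tracks[t], detections[d]) <= gate:
                cost += dist(tracks[t], detections[d])
                pairs[t] = d
            else:
                cost += gate
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_pairs
```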
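The Kalman-filter refinement can be illustrated with the textbook constant-velocity model, applied independently to each coordinate of a tracked person. The model choice, the process noise `q` added to both state variances, and the measurement noise `r` are assumptions for illustration. In a tracker, `predict()` would supply the positions passed to the matching step, and `update()` would be called with each matched detection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CVKalman1D:
    """Constant-velocity Kalman filter for one coordinate of a tracked person.

    State is (position p, velocity v); only the position is measured.
    """
    p: float                  # position estimate
    v: float = 0.0            # velocity estimate
    q: float = 0.01           # assumed process noise, added to both state variances
    r: float = 0.25           # assumed measurement noise variance
    P: List[List[float]] = field(
        default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])  # state covariance

    def predict(self, dt: float = 1.0) -> float:
        """x <- F x, P <- F P F^T + Q with F = [[1, dt], [0, 1]], Q = q * I."""
        P = self.P
        self.p += dt * self.v
        self.P = [
            [P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q,
             P[0][1] + dt * P[1][1]],
            [P[1][0] + dt * P[1][1],
             P[1][1] + self.q],
        ]
        return self.p

    def update(self, z: float) -> float:
        """Correct the state with a measured position z (H = [1, 0])."""
        P = self.P
        s = P[0][0] + self.r                 # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        self.P = [                           # P <- (I - K H) P
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]
        return self.p
```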
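Because the part model is a tree, the best configuration can be found exactly by max-sum dynamic programming. Each part passes its children's best scores up to its parent, and the arg-max choices are then read back from the root. In this sketch, each part has a few discrete candidate locations, and the spatial term is a user-supplied `pairwise` function. Temporal consistency is folded into the unary scores through a hypothetical `temporal_unary` helper, which penalizes distance from the part's location in the previous frame. These scoring choices are assumptions, not the dissertation's model.

```python
from math import dist
from typing import Callable, Dict, List, Tuple

Loc = Tuple[float, float]
Pairwise = Callable[[str, Loc, str, Loc], float]


def temporal_unary(detector_scores: List[float], cands: List[Loc],
                   prev_loc: Loc, weight: float = 0.5) -> List[float]:
    """Hypothetical temporal term: penalize moving far from last frame's location."""
    return [s - weight * dist(c, prev_loc) for s, c in zip(detector_scores, cands)]


def tree_max_sum(children: Dict[str, List[str]], root: str,
                 candidates: Dict[str, List[Loc]],
                 unary: Dict[str, List[float]],
                 pairwise: Pairwise) -> Dict[str, Loc]:
    """Exact MAP assignment of one candidate location per part on a tree."""
    choice: Dict[Tuple[str, int], Dict[str, int]] = {}  # (part, cand) -> best child cands

    def upward(part: str) -> List[float]:
        # Best total score of the subtree rooted at `part`, per candidate of `part`.
        totals = list(unary[part])
        for child in children.get(part, []):
            child_totals = upward(child)
            for i, loc in enumerate(candidates[part]):
                options = [child_totals[j] + pairwise(part, loc, child, c_loc)
                           for j, c_loc in enumerate(candidates[child])]
                j_best = max(range(len(options)), key=options.__getitem__)
                totals[i] += options[j_best]
                choice.setdefault((part, i), {})[child] = j_best
        return totals

    root_totals = upward(root)
    # Backtrack from the root's best candidate down to the leaves.
    pose: Dict[str, Loc] = {}
    stack = [(root, max(range(len(root_totals)), key=root_totals.__getitem__))]
    while stack:
        part, i = stack.pop()
        pose[part] = candidates[part][i]
        stack.extend(choice.get((part, i), {}).items())
    return pose
```

Running time is linear in the number of parts and quadratic in the number of candidates per part. The tree structure is what makes an exact solution this cheap.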
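The similarity functions are defined on the action hierarchy: element, sub-action, and whole action. The sketch below is one possible instantiation under stated assumptions:

- Each sub-action is a dictionary of hypothetical element feature vectors (`motion`, `object`, `interaction`).
- Element similarity is `exp(-distance)`.
- Sub-action similarity is a weighted sum with assumed `WEIGHTS`.
- Whole actions are compared by dynamic time warping (DTW).

DTW is a swapped-in technique, chosen here because the same action can be performed at different speeds and durations.

```python
from math import dist, exp
from typing import Dict, List, Sequence

SubAction = Dict[str, Sequence[float]]  # element name -> feature vector (hypothetical)
Action = List[SubAction]                # temporal sequence of sub-actions

WEIGHTS = {"motion": 0.5, "object": 0.3, "interaction": 0.2}  # assumed weights


def element_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in (0, 1] that decays with Euclidean distance."""
    return exp(-dist(a, b))


def subaction_similarity(a: SubAction, b: SubAction) -> float:
    """Weighted combination of the element similarities."""
    return sum(w * element_similarity(a[k], b[k]) for k, w in WEIGHTS.items())


def action_similarity(x: Action, y: Action) -> float:
    """DTW over sub-action sequences; returns mean similarity along the best path."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]  # path length to each cell
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 1.0 - subaction_similarity(x[i - 1], y[j - 1])
            prev = min((cost[i - 1][j - 1], i - 1, j - 1),
                       (cost[i - 1][j], i - 1, j),
                       (cost[i][j - 1], i, j - 1))
            cost[i][j] = prev[0] + d
            steps[i][j] = steps[prev[1]][prev[2]] + 1
    return 1.0 - cost[n][m] / steps[n][m]
```

In this sketch, a slower performance that repeats a sub-action still scores close to 1 against the original sequence.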
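A memory unit with nearest-neighbor recall, degradation and reinforcement can be sketched as below. Every concrete detail is an assumption for illustration:

- Actions are reduced to plain feature vectors compared by Euclidean distance. The dissertation instead compares structured actions with its own similarity functions.
- Every stored item decays by a factor `decay` at each recognition.
- Among the `k` nearest neighbors, items that agree with the decision are reinforced by `boost`.
- Items whose strength falls below `forget_below` are forgotten.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Sequence


@dataclass
class MemoryItem:
    feature: List[float]   # stand-in for a structured action abstraction
    label: str
    strength: float = 1.0


class ActionMemory:
    """Hypothetical memory unit: k-NN recall with degradation and reinforcement."""

    def __init__(self, k: int = 3, decay: float = 0.9,
                 boost: float = 0.5, forget_below: float = 0.1) -> None:
        self.items: List[MemoryItem] = []
        self.k, self.decay = k, decay
        self.boost, self.forget_below = boost, forget_below

    def learn(self, feature: Sequence[float], label: str) -> None:
        """Store a labelled action."""
        self.items.append(MemoryItem(list(feature), label))

    def recognize(self, feature: Sequence[float]) -> Optional[str]:
        """Label a new action by majority vote of its k nearest stored actions."""
        if not self.items:
            return None
        neighbors = sorted(self.items, key=lambda it: dist(it.feature, feature))[: self.k]
        label = Counter(it.label for it in neighbors).most_common(1)[0][0]
        for it in self.items:       # degradation: all memories fade a little
            it.strength *= self.decay
        for it in neighbors:        # reinforcement: agreeing neighbors are refreshed
            if it.label == label:
                it.strength += self.boost
        # Forget memories that have faded below the threshold.
        self.items = [it for it in self.items if it.strength >= self.forget_below]
        return label
```

Frequently recalled actions stay strong in this scheme, while actions that are never retrieved gradually fade and are dropped. This keeps the memory bounded as data accumulates.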
Keywords/Search Tags: human detection, object tracking, human pose estimation, memory model, action recognition