Research On Action Recognition Algorithms Based On Spatio-Temporal Binary Features

Posted on: 2019-10-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Fangbemi Abassin Sourou
Full Text: PDF
GTID: 1368330551456943
Subject: Software engineering
Abstract/Summary:
Human Action Recognition (HAR) in videos has a broad range of applications in video surveillance, video retrieval, human-computer interaction, and related fields. In recent years, it has become a hot research topic in Computer Vision (CV). Although considerable progress has been made over the last decades, many problems in HAR remain unsolved because of non-ideal imaging conditions, large intra-class differences, and similar factors. In particular, designing and developing real-time HAR applications that combine high accuracy, fast speed, and low memory usage is very challenging. To provide HAR systems that perform well on these three metrics, or strike a good tradeoff between them, this thesis investigates the design and implementation of novel binary components, focusing mainly on two binary motion descriptors and 3D convolutional filters. The main research work and innovations of the thesis are as follows:

1. We propose two new binary motion descriptors (BPPEM and PPSM) computed between two consecutive frames, and their extended versions (eBPPEM and ePPSM) computed between three consecutive frames. The proposed descriptors are evaluated on the Weizmann and KTH datasets, and the results are analyzed using multiple performance metrics: accuracy, confusion matrix, recognition speed, and descriptor size. Experimental results show that the proposed descriptors achieve a good tradeoff between accuracy, speed, and memory consumption.

2. We evaluate the impact of binary appearance descriptors such as FREAK, BinBoost, and LATCH on the performance of the proposed binary motion descriptors (BPPEM+PPSM and eBPPEM+ePPSM) by fusing the two types of descriptors. As above, the fused descriptors are evaluated on the Weizmann and KTH datasets using the same metrics. Experimental results show that the proposed binary motion descriptors alone achieve better accuracy than the descriptors obtained after fusion, because the BPPEM and eBPPEM descriptors, aside from capturing motion information, are also able to capture some appearance features.

3. We propose a new 3D Spatio-Temporal Binary Convolutional Network (3D ST-BCNN) by extending the existing 2D binary CNN to the time dimension. Experiments were conducted on a subset of actions from the UCF101 dataset. Because the original XNOR-based CNNs require binary data as input, we also convert the original RGB video data into XCSLBP dynamic textures that serve as binary input for the network. Experimental results show that the proposed framework trains the model faster than traditional 3D CNNs. Moreover, the 3D ST-BCNN does not easily overfit, and its accuracy can still be improved by increasing the depth or complexity of the network.
Keywords/Search Tags:Action Recognition, Proximity Patches Pattern, Binary Proximity Patches Ensemble Motion, Proximity Patches Similarity Motion, 3D Spatio-Temporal Binary Convolutional Network
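
Item 3 of the abstract refers to XNOR-based CNNs that operate on binary data. As a minimal sketch of the general idea behind such networks, and not the thesis's implementation, the snippet below shows how a dot product between two vectors with entries in {+1, -1} can be computed with a bitwise XNOR followed by a bit count. The function names `pack_signs` and `xnor_dot` are hypothetical and chosen only for illustration.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int bit mask.

    Bit i is set when values[i] is +1 and cleared when it is -1.
    (Hypothetical helper, for illustration only.)
    """
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given as bit masks.

    XNOR marks the positions where the two signs agree. Each agreement
    contributes +1 and each disagreement -1, so the dot product equals
    2 * (number of agreements) - n.
    """
    mask = (1 << n) - 1                    # keep only the n valid bits
    agree = ~(a_bits ^ b_bits) & mask      # XNOR restricted to n bits
    return 2 * bin(agree).count("1") - n   # popcount via bin()


if __name__ == "__main__":
    a = [1, -1, 1, 1, -1, -1, 1, -1]
    b = [1, 1, -1, 1, -1, 1, 1, -1]
    # The bitwise result matches the ordinary arithmetic dot product.
    print(xnor_dot(pack_signs(a), pack_signs(b), len(a)))
    print(sum(x * y for x, y in zip(a, b)))
```

Replacing multiply-accumulate operations with XNOR and bit counting is what typically makes binary networks cheaper in compute and memory. A 3D binary convolution would apply the same operation over spatio-temporal patches, though the exact layer design in the thesis is not described in this abstract.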