Feature selection in large dataset processing, especially in the video domain

Posted on: 2006-10-16
Degree: Ph.D.
Type: Dissertation
University: Columbia University
Candidate: Liu, Yan
Full Text: PDF
GTID: 1458390005993331
Subject: Computer Science
Abstract/Summary:
The rapid growth and wide application of digital video data have created a significant need for video classification. Unfortunately, two obstacles stand in the way of efficient video data management: the gap between high-level concepts and low-level features, and the high time cost of video analysis. Feature selection, which can improve prediction performance, yield faster and more cost-effective predictors, and provide a better understanding of the data, is introduced to address these two problems. But applying existing automatic feature selection algorithms to video data is impractical because of the amount of computer time they require. So far, most feature selection in video applications has relied on researchers' intuition, although human interaction cannot keep pace with the dramatic increase in video data and the varied requirements of different users.

The first automatic feature selection algorithm we proposed is the Basic Sort-Merge Tree (BSMT), which is well adapted to the characteristics of video data. The linear time cost of BSMT makes video frame categorization practical. To address the problem of sparse and noisy training data in video retrieval, we proposed the Complement Sort-Merge Tree (CSMT). CSMT detects complementary relationships shown in the results of the outer wrapper model, in order to reduce the influence of more coarsely quantized prediction error. We validate this method empirically on instructional video retrieval. The Fast-converging Sort-Merge Tree (FSMT) speeds up BSMT further by building only a selected portion of the feature selection tree, using two evaluation metrics, in order to meet the stricter time requirements of on-line video retrieval. We demonstrate it on sports video shot classification. Multi-Level Feature Selection (MLFS), based on the hierarchical structure of BSMT, permits coarse-to-fine scene segmentation. The basic idea is to apportion different classification costs according to the classification difficulty of the data. On instructional video scene segmentation, we demonstrate that MLFS detects video segment boundaries more efficiently than BSMT. Building on these feature selection algorithms, we proposed a fast video retrieval system that uses different feature selection algorithms and lazy evaluation.

To show the generality of our feature selection algorithms, we also provide some theoretical analysis. We simulate different feature selection algorithms on common synthetic datasets and compare their performance in terms of accuracy, efficiency, and robustness. Finally, we propose future work in two directions: how to improve the feature selection algorithms, and how to better apply them to different applications.
Keywords/Search Tags:Feature selection, Video, Data, Applications, Different, BSMT, Classification