Feature selection in large dataset processing, especially in the video domain

Posted on:2006-10-16

Degree:Ph.D

Type:Dissertation

University:Columbia University

Candidate:Liu, Yan

GTID:1458390005993331

Subject:Computer Science

Abstract/Summary:

The rapid growth and wide application of digital video data have created a significant need for video classification. Unfortunately, two obstacles stand in the way of efficient video data management: the gap between high-level concepts and low-level features, and the high time cost of video analysis. Feature selection, which can improve prediction performance, yield faster and more cost-effective predictors, and provide a better understanding of the data, is introduced to address both problems. However, applying existing automatic feature selection algorithms to video data is impractical because they require an unrealistic amount of computation time. So far, most feature selection in video applications has relied on researchers' intuition, although human interaction cannot keep pace with the dramatic increase in video data or with the varied requirements of different users.

The first automatic feature selection algorithm we propose is the Basic Sort-Merge Tree (BSMT), which is well adapted to the characteristics of video data. The linear time cost of BSMT makes video frame categorization practical to implement. To address the problem of sparse and noisy training data in video retrieval, we propose the Complement Sort-Merge Tree (CSMT). CSMT detects complementary relationships in the results of the outer wrapper model in order to reduce the influence of more coarsely quantized prediction error. We validate this method empirically on instructional video retrieval. The Fast-converging Sort-Merge Tree (FSMT) speeds up BSMT further by constructing only a selected portion of the feature selection tree, using two evaluation metrics, in order to meet the stricter time requirements of on-line video retrieval. We demonstrate it on sports video shot classification. Multi-Level Feature Selection (MLFS), based on the hierarchical structure of BSMT, permits coarse-to-fine scene segmentation. The basic idea is to apportion classification cost according to the classification difficulty of the data. On instructional video scene segmentation, we demonstrate that MLFS detects video segment boundaries more efficiently than BSMT. Based on these feature selection algorithms, we propose a fast video retrieval system that uses different feature selection algorithms and lazy evaluation.

To show the generality of our feature selection algorithms, we also provide some theoretical analysis. We simulate different feature selection algorithms on common synthetic datasets and compare their performance in terms of accuracy, efficiency, and robustness. Finally, we propose future work in two directions: how to improve the feature selection algorithms, and how to better apply them to different applications.
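
The abstract names a sort-merge tree driven by an outer wrapper model but does not spell out its steps. The following is only a rough illustration of what a bottom-up sort-and-merge wrapper search could look like, not the dissertation's BSMT. All names here (`build_sort_merge_tree`, `select_best`, `toy_score`) and the toy scoring rule are assumptions made for the example. Under this scheme the wrapper is evaluated on fewer than 2n subsets for n features, which is one way a linear evaluation cost could arise.

```python
from typing import Callable, Dict, FrozenSet, List

FeatureSet = FrozenSet[int]


def build_sort_merge_tree(
    n_features: int, score: Callable[[FeatureSet], float]
) -> Dict[FeatureSet, float]:
    """Return every subset visited, mapped to its wrapper score.

    Hypothetical scheme: start from singleton subsets, then repeatedly
    sort the current level by score and merge neighbouring subsets.
    """
    scores: Dict[FeatureSet, float] = {}

    def cached(subset: FeatureSet) -> float:
        # Evaluate the (assumed expensive) wrapper once per distinct subset.
        if subset not in scores:
            scores[subset] = score(subset)
        return scores[subset]

    level: List[FeatureSet] = [frozenset([i]) for i in range(n_features)]
    while level:
        # Sort step: rank this level's subsets, best first.
        level.sort(key=cached, reverse=True)
        if len(level) == 1:
            break
        # Merge step: union neighbouring subsets in the ranked order.
        merged = [a | b for a, b in zip(level[0::2], level[1::2])]
        if len(level) % 2:
            merged.append(level[-1])  # an unpaired subset moves up unchanged
        level = merged
    return scores


def select_best(scores: Dict[FeatureSet, float], max_size: int) -> FeatureSet:
    """Pick the best-scoring visited subset with at most max_size features."""
    candidates = [s for s in scores if len(s) <= max_size]
    return max(candidates, key=scores.__getitem__)


# Toy wrapper (an assumption): features 0 and 3 are informative, and every
# extra feature carries a small penalty standing in for added cost.
RELEVANT = frozenset({0, 3})


def toy_score(subset: FeatureSet) -> float:
    return len(subset & RELEVANT) - 0.01 * len(subset)


tree = build_sort_merge_tree(6, toy_score)
best = select_best(tree, max_size=2)
print(sorted(best), len(tree))
```

In practice `toy_score` would be replaced by a cross-validated classifier accuracy on the chosen features; the caching matters there, because each wrapper call trains a model.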

Keywords/Search Tags:

Feature selection, Video, Data, Applications, Different, BSMT, Classification

Related items

1	Research About Feature Selection And Classification For Interactive Feature Of High-dimensional Data
2	Classification Of Web Browsing And Video Services Based On Novel Feature Selection Algorithm
3	Network Video Service Feature Selection Based On Particle Swarm Optimization And Gravitational Search Algorithm
4	Research On Chinese Text Classification And Its Applications
5	The Classification Research Of Network Games And Video Service Based On Feature Selection
6	Feature Selection And Classification Of Internet Video Traffic Based On Genetic Search Algorithm
7	Applications Of Data Mining Techniques To Text Classification And Bioinformatics
8	Feature Selection And Classification Of Internet Video Traffic Based On Particle Swarm Optimization
9	Algorithms for enhancing pattern separability, feature selection and incremental learning with applications to gas-sensing electronic nose systems
10	Research On The Improvement Of Association Classification Algorithm And Feature Selection Of Multi-label Classification