Image patches can be factorized into ‘shapelets’, which describe segmentation patterns, and palettes, which describe how to paint the segments. This gives a flexible factorization of local shape (segmentation patterns) and appearance (palettes), which we argue is useful for tasks such as object and scene recognition. Here, we introduce the ‘shapelet’ model, a framework that learns a library of ‘shapelet’ segmentation patterns to capture local shape and hierarchical color palettes to capture appearance. Using a learned shapelet library, image patches can be analyzed with a variational technique to produce descriptors that describe local shape and local appearance separately. These descriptors can then be used for high-level vision tasks such as object and scene recognition. We show that the shapelet model is competitive with SIFT-based methods and structure element (stel) model variants on the object recognition datasets Caltech28 and Caltech101 and on the scene recognition dataset All-I-Have-Seen.
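To illustrate the factorization of a patch into a shape part and an appearance part, here is a minimal toy sketch. It is not the shapelet model's actual learning or variational inference procedure. It rests on several simplifying assumptions:

- Pixels are grayscale.
- A shapelet is a fixed per-pixel prior over K segments.
- The palette is flat rather than hierarchical, with one color per segment.
- Pixel colors follow Gaussian noise with a fixed sigma.
- The palette is fitted by plain EM.

The shape descriptor is the index of the best-fitting shapelet and the appearance descriptor is its fitted palette. The function names `fit_palette` and `describe` are hypothetical.

```python
import math


def fit_palette(patch, shapelet, sigma=0.1, iters=20):
    """Fit a flat K-color palette to a grayscale patch under a fixed shapelet (toy EM).

    patch:    list of N pixel intensities in [0, 1].
    shapelet: list of N rows; row i holds K prior probabilities that pixel i
              belongs to segment k (hypothetical format, not the paper's).
    Returns (palette, log_likelihood).
    """
    n, k = len(patch), len(shapelet[0])
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)

    def seg_lik(x, prior, color):
        # Prior for the segment times a Gaussian likelihood of the pixel color.
        return prior * norm * math.exp(-((x - color) ** 2) / (2 * sigma ** 2))

    # Initialize each segment color as the prior-weighted mean intensity.
    palette = []
    for j in range(k):
        w = sum(shapelet[i][j] for i in range(n))
        palette.append(sum(shapelet[i][j] * patch[i] for i in range(n)) / w if w > 0 else 0.5)

    for _ in range(iters):
        # E-step: posterior responsibility of segment j for pixel i.
        resp = []
        for i in range(n):
            lik = [seg_lik(patch[i], shapelet[i][j], palette[j]) for j in range(k)]
            total = sum(lik) or 1e-300
            resp.append([l / total for l in lik])
        # M-step: each palette color becomes the responsibility-weighted mean.
        for j in range(k):
            w = sum(resp[i][j] for i in range(n))
            if w > 0:
                palette[j] = sum(resp[i][j] * patch[i] for i in range(n)) / w

    # Log-likelihood of the patch under the shapelet prior and fitted palette.
    ll = 0.0
    for i in range(n):
        p = sum(seg_lik(patch[i], shapelet[i][j], palette[j]) for j in range(k))
        ll += math.log(max(p, 1e-300))
    return palette, ll


def describe(patch, library, sigma=0.1):
    """Return (shape descriptor, appearance descriptor) = (best shapelet index, its palette)."""
    best = None
    for idx, shapelet in enumerate(library):
        palette, ll = fit_palette(patch, shapelet, sigma)
        if best is None or ll > best[2]:
            best = (idx, palette, ll)
    return best[0], best[1]


if __name__ == "__main__":
    # A 2x2 patch flattened row-major: bright left column, dark right column.
    patch = [0.9, 0.1, 0.9, 0.1]
    left_right = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
    top_bottom = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
    idx, pal = describe(patch, [left_right, top_bottom])
    print(idx, [round(p, 3) for p in pal])  # -> 0 [0.9, 0.1]
```

Because the shape descriptor is only the shapelet index, two patches with the same edge layout but different colors get the same shape descriptor and differ only in their palettes. This mirrors the shape/appearance separation described above.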