
Research On Audio-Visual Media Content Processing And Analyzing For Scene Understanding

Posted on: 2017-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Weng
Full Text: PDF
GTID: 2428330485460835
Subject: Computer technology
Abstract/Summary:
The analysis and understanding of natural scenes is closely related to human life. It has two main components: auditory scene analysis and visual scene analysis. On the one hand, auditory summarization is one of the most important problems, because it provides reliable clues for auditory understanding. On the other hand, text is an important carrier of content information for visual scene understanding, so text extraction and understanding has become one of the most active research topics.

In this thesis, we propose a novel auditory movie summarization algorithm based on detecting sound events and scene changes. We first detect auditory changes in the intrinsic audio space to roughly segment each audio stream, then extract the MFCC-LDB feature and apply a scoring algorithm to adaptively refine the segments. Next, audio events in movies are identified hierarchically, from background detection through foreground event recognition to key event identification. Moreover, a sound context model is proposed to discover the correlations between audio events and to detect scene changes accordingly, which yields the final audio summaries of a movie. Experiments on different auditory categories from movies and TV programs demonstrate the effectiveness of the proposed approach.

Text detection and recognition in degraded videos is complex and challenging because of lighting effects and sensor and motion blur. We therefore present a new method that derives multi-spectral images from each input video frame by studying non-linear intensity values in the Gray, R, G and B color spaces to increase the contrast of text pixels, which results in four respective multi-spectral images. We then propose multiple fusion criteria for the four multi-spectral images to enhance text information in degraded video frames. A median operation combines the results of the multiple fusion criteria into a single image, which we name fusion-1. We further apply k-means clustering to the fused images obtained by the multiple fusion criteria to classify text clusters, which results in binary images. The same median operation then fuses the binary images into a single image, which we name fusion-2. We evaluate the enhanced images at fusion-1 and fusion-2 using quality measures. Furthermore, the enhanced images are validated through text detection and recognition accuracies on video frames to show the effectiveness of the enhancement.
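The fusion-1 and fusion-2 steps can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the abstract does not specify the fusion criteria, so the three small input images are hypothetical stand-ins for their outputs. The helper names `median_fuse` and `two_means_binarize` are also hypothetical, and treating the brighter cluster as text is an assumption.

```python
from statistics import median


def median_fuse(images):
    """Pixel-wise median over equally sized 2-D images (lists of rows)."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[median(img[r][c] for img in images) for c in range(cols)]
            for r in range(rows)]


def two_means_binarize(image, iterations=20):
    """k-means with k=2 on pixel intensities.

    Assumption: the brighter cluster is text (1), the darker is background (0).
    """
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)  # initial cluster centres
    if lo == hi:  # flat image: nothing to separate
        return [[0 for _ in row] for row in image]
    for _ in range(iterations):
        mid = (lo + hi) / 2  # decision boundary between the two centres
        low = [p for p in pixels if p <= mid]
        high = [p for p in pixels if p > mid]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centres stopped moving
            break
        lo, hi = new_lo, new_hi
    mid = (lo + hi) / 2
    return [[1 if p > mid else 0 for p in row] for row in image]


# Hypothetical outputs of three fusion criteria for a 2x2 frame region.
fused_images = [
    [[10, 200], [12, 190]],
    [[20, 220], [15, 180]],
    [[30, 210], [90, 185]],
]

# fusion-1: median of the fused images.
fusion_1 = median_fuse(fused_images)

# fusion-2: cluster each fused image into a binary image, then take the median.
binary_images = [two_means_binarize(img) for img in fused_images]
fusion_2 = median_fuse(binary_images)
```

The sketch uses an odd number of input images so that the pixel-wise median of the binary images stays binary. With an even number of images, a tie-breaking rule would be needed.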
Keywords/Search Tags: Audio Summary, Audio Segmentation, Event Detection, Text Enhancement, Multi-spectral Fusion