Research On Audio-Visual Media Content Processing And Analyzing For Scene Understanding

Posted on:2017-03-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y B Weng

GTID:2428330485460835

Subject:Computer technology

Abstract/Summary:

Analysis and understanding of natural scenes is closely related to human life. It has two main components: auditory scene analysis and visual scene analysis. On the one hand, audio summarization is one of the most important problems in auditory scene analysis, because a summary provides reliable clues for understanding auditory content. On the other hand, text is an important carrier of content information for visual scene understanding. As a result, text extraction and understanding has become one of the most active research topics.

In this thesis, we propose a novel auditory movie summarization algorithm based on detecting sound events and scene changes. We first detect auditory changes in the intrinsic audio space to roughly segment each audio stream. We then extract MFCC-LDB features and apply a scoring algorithm to adaptively refine the segments. Next, audio events in movies are identified hierarchically, proceeding from background detection through foreground event recognition to key event identification. Moreover, a sound context model is proposed to discover correlations between audio events and detect scene changes accordingly, which yields the final audio summary of a movie. Experiments on different auditory categories from movies and TV programs demonstrate the effectiveness of the proposed approach.

Text detection and recognition in degraded videos is complex and challenging because of lighting effects, and sensor and motion blur. We therefore present a new method that derives multi-spectral images from each input video frame by studying non-linear intensity values in the Gray, R, G and B color channels to increase the contrast of text pixels, which yields four multi-spectral images. We then propose multiple fusion criteria for the four multi-spectral images to enhance text information in degraded video frames, and apply a median operation to obtain a single image from the results of the multiple fusion criteria; we call this result fusion-1. We further apply k-means clustering to the fused images obtained by the multiple fusion criteria to classify text clusters, which produces binary images, and the same median operation then combines these binary images into a single image, which we call fusion-2. We evaluate the enhanced images at fusion-1 and fusion-2 using quality measures. Furthermore, the enhanced images are validated through text detection and recognition accuracy on video frames to show the effectiveness of the enhancement.
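The rough segmentation step (detecting auditory changes before refinement) can be illustrated with a minimal sketch. The abstract does not specify the thesis's "intrinsic audio space", distance measure or thresholds, so everything below is an assumption. Each frame is taken to be an already-computed feature vector (for example MFCCs). A change is flagged where the mean vectors of the windows just before and just after a frame are far apart in Euclidean distance and that distance is a local peak. The names `detect_change_points` and `mean_vector` and the parameters `window` and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(frames: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def detect_change_points(features: Sequence[Vector], window: int = 3,
                         threshold: float = 1.0) -> List[int]:
    """Hypothetical change detector for rough audio segmentation.

    For every candidate boundary t, compare the mean feature vector of the
    `window` frames before t with that of the `window` frames starting at t.
    A boundary is reported where this Euclidean distance exceeds `threshold`
    and is a local peak, so one transition yields one boundary.
    """
    candidates = list(range(window, len(features) - window + 1))
    dists = [math.dist(mean_vector(features[t - window:t]),
                       mean_vector(features[t:t + window]))
             for t in candidates]
    changes = []
    for i, (t, d) in enumerate(zip(candidates, dists)):
        prev_d = dists[i - 1] if i > 0 else -math.inf
        next_d = dists[i + 1] if i + 1 < len(dists) else -math.inf
        # Strict on the left, non-strict on the right: a flat peak is kept once.
        if d > threshold and d > prev_d and d >= next_d:
            changes.append(t)
    return changes


if __name__ == "__main__":
    stream = [[0.0, 0.0]] * 6 + [[5.0, 5.0]] * 6
    print(detect_change_points(stream, window=3, threshold=1.0))  # [6]
```

On a toy stream of six `[0, 0]` frames followed by six `[5, 5]` frames, the sketch reports a single boundary at frame 6. The peak-picking step keeps only that frame, not every frame whose window overlaps the transition. A real system would follow this with the refinement stage the abstract describes.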
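The first enhancement step derives four images (Gray, R, G and B) with non-linear intensity values from each frame. The abstract does not state which non-linear mapping the thesis uses. The sketch below therefore assumes a logistic (sigmoid) contrast stretch centred on each channel's mean intensity, together with the standard ITU-R BT.601 luma weights for the Gray image. Frames are plain nested lists of `(R, G, B)` tuples in [0, 255]. The names `sigmoid_stretch` and `multi_spectral_images` and the `gain` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def sigmoid_stretch(channel: Image, gain: float = 10.0) -> Image:
    """Assumed non-linear mapping: a logistic curve centred on the channel mean.

    Intensities close to the mean are spread apart and the output stays within
    [0, 255]; a larger `gain` gives a steeper curve. For inputs in [0, 255] the
    exponent is at most `gain` in magnitude, so keep gain below about 700 to
    avoid OverflowError in math.exp.
    """
    pixels = [v for row in channel for v in row]
    mean = sum(pixels) / len(pixels)
    return [[255.0 / (1.0 + math.exp(-gain * (v - mean) / 255.0)) for v in row]
            for row in channel]


def multi_spectral_images(rgb: Sequence[Sequence[Tuple[int, int, int]]],
                          gain: float = 10.0) -> Dict[str, Image]:
    """Return contrast-stretched Gray, R, G and B images for one RGB frame."""
    channels = {
        # ITU-R BT.601 luma weights for the Gray image.
        "Gray": [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
                 for row in rgb],
        "R": [[float(r) for r, _, _ in row] for row in rgb],
        "G": [[float(g) for _, g, _ in row] for row in rgb],
        "B": [[float(b) for _, _, b in row] for row in rgb],
    }
    # Apply the same (assumed) non-linear stretch to every channel image.
    return {name: sigmoid_stretch(ch, gain) for name, ch in channels.items()}
```

For a two-pixel frame with intensities 100 and 150, the stretched R values are roughly 70 and 185. The 50-level gap therefore grows to more than 100 levels, which is the kind of contrast gain the step is after. Any other monotone non-linear curve (a gamma correction, for instance) could be substituted without changing the structure.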
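Both fusion-1 and fusion-2 use the same median operation to collapse several images into one. The sketch below assumes a pixel-wise median, which is a natural reading of that step. The fusion criteria that produce the input images are not described in the abstract and are not reproduced here. `median_fuse` is a hypothetical name, and images are equally sized nested lists.

```python
from statistics import median
from typing import List, Sequence

Image = List[List[float]]


def median_fuse(images: Sequence[Image]) -> Image:
    """Fuse equally sized images by taking the median at each pixel position."""
    if not images:
        raise ValueError("need at least one image")
    h, w = len(images[0]), len(images[0][0])
    if any(len(img) != h or any(len(row) != w for row in img) for img in images):
        raise ValueError("all images must have the same height and width")
    # For every (y, x), gather that pixel from every image and keep the median.
    return [[median(img[y][x] for img in images) for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    stack = [[[1, 5]], [[3, 2]], [[2, 9]]]
    print(median_fuse(stack))  # [[2, 5]]
```

With an even number of inputs, `statistics.median` averages the two middle values, so fusing four binary masks can produce 0.5 at a pixel. Swapping in `statistics.median_low` or `statistics.median_high` keeps a fusion-2 result strictly binary.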
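The fusion-2 branch applies k-means clustering to each fused image to separate text from background. The abstract gives neither the number of clusters nor the clustering features. The sketch therefore assumes the simplest case: two clusters on grey-level intensities using Lloyd's algorithm, with the brighter cluster treated as text. Dark text on a light background would need the opposite choice. The names `kmeans_two_clusters` and `binarize_by_kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def kmeans_two_clusters(values: Sequence[float],
                        iters: int = 20) -> Tuple[float, float]:
    """Lloyd's algorithm with k = 2 on scalar values.

    Centroids start at the minimum and maximum, so the first returned centroid
    is never larger than the second (dark cluster, bright cluster).
    """
    c0, c1 = float(min(values)), float(max(values))
    for _ in range(iters):
        # Assignment step: ties go to the dark cluster.
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        # Update step: keep a centroid unchanged if its cluster is empty.
        new0 = sum(g0) / len(g0) if g0 else c0
        new1 = sum(g1) / len(g1) if g1 else c1
        if (new0, new1) == (c0, c1):
            break
        c0, c1 = new0, new1
    return c0, c1


def binarize_by_kmeans(image: Image, iters: int = 20) -> List[List[int]]:
    """Return a 0/1 mask where 1 marks pixels in the bright (assumed text) cluster."""
    values = [v for row in image for v in row]
    c0, c1 = kmeans_two_clusters(values, iters)
    # Same assignment rule as above: strictly closer to c1 means text.
    return [[1 if abs(v - c1) < abs(v - c0) else 0 for v in row] for row in image]


if __name__ == "__main__":
    print(binarize_by_kmeans([[10, 12, 200], [11, 210, 205]]))
    # [[0, 0, 1], [0, 1, 1]]
```

Each resulting mask could then be passed through the same pixel-wise median to form a single binary image, matching the structure of fusion-2. Clustering on intensity alone is a deliberate simplification; colour or gradient features would fit the same loop.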

Keywords/Search Tags:

Audio Summary, Audio Segmentation, Event Detection, Text Enhancement, Multi-spectral Fusion

Related items

1	Audio Analysis Based On Content And Scene Recognition
2	New Audio Event Detection Based On Atomic Model
3	The Realization Of The Multi_type Audio Events Detection
4	Research On Abnormal Audio Event Surveillance In Real-world Scenes
5	Research On Algorithm Of Audio-visual Event Recognition And Sound Source Localization Based On Audio-visual Fusion
6	Research On Key Issues Of Audio Event Detection And Classification For Complex Audio Documents
7	Research On Audio Event Detection For Audio Surveillance
8	Research On Audio Event Recognition Based On Deep Learning
9	Detection Of Audio Events With Scene Dependence
10	Research On Detection And Enhancement Of Abnormal Audio Event Based On ICRNN-GRU