
Research On Content Based Video Structure Analysis And Abstraction

Posted on: 2008-05-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z H Sun    Full Text: PDF
GTID: 1118360212497821    Subject: Communication and Information System
Abstract/Summary:
Content-based video analysis and retrieval is a very active research topic in the area of multimedia information processing. It aims at extracting the most descriptive features from unstructured video data, detecting the basic video content units, fusing these units, and abstracting the original video. In this way, access points to video content are built at multiple levels, making it convenient for users to retrieve and browse video content. Typical applications of video content analysis include large-scale video databases, medical care, remote education, and home entertainment. The process of content-based video analysis and retrieval can be divided into four parts: video stream segmentation, video abstraction, video indexing, and video retrieval. This thesis presents in-depth research on video shot boundary detection and video content abstraction, and recommends directions for further work.

Because video data lacks structural information corresponding to its content, the basic video content units must first be detected before the content can be indexed and retrieved. A shot is a sequence of frames recorded during a single camera operation. Different shooting styles and post-editing methods produce various types of shot transitions, which fall into two categories: cuts and gradual transitions. At a cut, there is an obvious change between the last frame of one shot and the first frame of the next; during a gradual transition, there is no such obvious change between these two frames. Generally, a threshold is set and compared with the change between frame pairs; once the change exceeds the threshold, a shot boundary is declared. However, the threshold usually depends on the video content, and a different video style may require resetting it. To solve this problem, we propose an unsupervised shot boundary detection method based on color and object outline features. The color difference between a frame pair reflects the global change across a shot, while the object outline difference reflects the local change. We project these two differences into a feature space and determine the occurrence of shot boundaries by clustering their distribution.

After segmentation of the video stream, the basic processing units are known. To further abstract the video, it is necessary to select the most representative key frames from every shot. The selected key frames should carry as much information about the shot as possible while being as few as possible. When little changes within a static shot, key frame selection is simple: the first, last, or middle frame of the shot can be chosen as the key frame. However, this kind of selection may fail when gradual change occurs within the shot. To solve this problem, we propose a key frame selection method based on global color statistics. First, a temporally maximum occurrence frame (TMOF) is constructed from the shot frames: at each pixel position, the TMOF stores the color with the highest occurrence frequency over the whole shot, so it reflects the temporal and spatial distribution of color within the shot. Then a weighted distance is computed between each frame of the shot and the TMOF, and the frames at the local peaks of the resulting curve are selected as key frames. To evaluate the performance of the key frame selection method, we choose fidelity and compression ratio as the criteria.
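The following sketch illustrates the unsupervised shot boundary clustering step described above. It is only a minimal illustration: it assumes per-frame color histograms and binary edge maps have already been extracted, and it uses k-means with two clusters as a stand-in for the unspecified clustering procedure; these choices are assumptions for illustration, not the thesis's exact features or algorithm.

    # Sketch: cluster (color difference, outline difference) pairs of consecutive
    # frames and treat the high-change cluster as shot boundaries.
    import numpy as np
    from sklearn.cluster import KMeans

    def frame_pair_features(color_hists, edge_maps):
        """color_hists: (N, B) histograms; edge_maps: (N, H, W) binary edge images."""
        d_color = np.linalg.norm(np.diff(color_hists, axis=0), axis=1)               # global change
        d_edge = np.abs(np.diff(edge_maps.astype(float), axis=0)).mean(axis=(1, 2))  # local change
        return np.stack([d_color, d_edge], axis=1)                                   # (N-1, 2)

    def detect_boundaries(color_hists, edge_maps):
        feats = frame_pair_features(color_hists, edge_maps)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
        # The cluster with the larger mean difference is taken as the boundary cluster.
        boundary_cluster = np.argmax([feats[labels == k].mean() for k in (0, 1)])
        # Index i describes the pair (frame i, frame i+1), so i+1 starts a new shot.
        return np.where(labels == boundary_cluster)[0] + 1

In practice the two difference measures would likely be normalized before clustering so that neither dominates the feature space.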
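A second sketch covers the TMOF-based key frame selection, under two simplifying assumptions: frames are already quantized to integer color indices so "the most frequent color at each pixel" is well defined, and a plain pixel-mismatch rate replaces the thesis's weighted metric.

    import numpy as np
    from scipy.signal import argrelmax

    def build_tmof(frames_q):
        """frames_q: (N, H, W) frames quantized to integer color indices."""
        n_colors = int(frames_q.max()) + 1
        # For each pixel, count color occurrences over the shot and keep the most frequent one.
        counts = np.apply_along_axis(np.bincount, 0, frames_q, minlength=n_colors)  # (n_colors, H, W)
        return counts.argmax(axis=0)                                                # (H, W) TMOF

    def select_key_frames(frames_q):
        tmof = build_tmof(frames_q)
        # Per-frame distance to the TMOF: fraction of pixels whose color differs
        # (a simplified stand-in for the weighted metric used in the thesis).
        dist = (frames_q != tmof).mean(axis=(1, 2))
        peaks = argrelmax(dist)[0]                  # frames at local peaks of the curve
        return peaks if peaks.size else np.array([int(dist.argmax())])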
Fidelity is computed from the semi-Hausdorff distance between the shot and its key frame set: the higher the fidelity, the more representative the key frame set. At the same time, we compute the ratio of the number of key frames to the total number of frames in the shot; this compression ratio measures the compactness of the key frame set. Experimental results show that the proposed method achieves both high fidelity and a high compression ratio.

Another kind of video abstraction is the video skim, which aims at extracting the video clips with the strongest content-describing ability and fusing them into a final skim for users. Compared with static abstraction in the form of key frames, a video skim not only captures the salient visual content but also keeps the audio information, which is important for users to understand and enjoy the video. To generate a video skim, we first classify the audio data into different categories: features that characterize each category are extracted from short-time audio frames and audio clips and fed into a Support Vector Machine based ensemble classifier to obtain audio labels. To generate the candidate skim clips, we first detect silence in the audio; if a detected silence is longer than a threshold, its position is recorded, and the segment between two silence positions is taken as a candidate skim clip. For each candidate clip we compute the significant audio energy, music ratio, speech ratio, video content density, and motion density. According to the user's preferences, weight coefficients are set, and each clip is assigned a score through an arithmetic combination of these quantities. Finally, the clips are ranked by score and selected from high to low until the user's preferred skim length is reached; the selected clips are combined into the final video skim. To evaluate the skim generation performance, we built a user questionnaire with informativeness and enjoyability as the two evaluation items. In the final test, our video skim method achieved high subjective scores.

Based on the above research contributions, we design and implement a video content analysis and retrieval system. The system adopts a distributed architecture, enables users to access the video database remotely, and supports querying for similar video clips by keywords or example images.

At the end of the thesis, we conclude the work and give recommendations for future research.
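For the fidelity and compression-ratio criteria described above, a minimal sketch follows. The per-frame feature vectors, the Euclidean frame distance, and the monotone mapping from the semi-Hausdorff distance to a fidelity value are all assumptions made for illustration.

    import numpy as np

    def semi_hausdorff(shot_feats, key_feats):
        """Largest distance from any shot frame to its nearest key frame."""
        d = np.linalg.norm(shot_feats[:, None, :] - key_feats[None, :, :], axis=2)
        return d.min(axis=1).max()

    def fidelity(shot_feats, key_feats):
        # Higher fidelity <=> smaller semi-Hausdorff distance; 1/(1+d) is just one
        # possible monotone mapping, not necessarily the one used in the thesis.
        return 1.0 / (1.0 + semi_hausdorff(shot_feats, key_feats))

    def compression_ratio(n_key_frames, n_shot_frames):
        # Ratio of key frames to total frames in the shot (smaller = more compact).
        return n_key_frames / n_shot_frames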
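Finally, a schematic of the skim clip scoring and ranking step described in the abstract. The SkimClip structure, the feature names, and the linear weighted score are hypothetical stand-ins for the thesis's arithmetic combination of the per-clip quantities.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class SkimClip:
        start: float                    # seconds
        end: float
        features: Dict[str, float]      # e.g. {"audio_energy": ..., "music_ratio": ..., ...}
        score: float = 0.0

    def score_clips(clips: List[SkimClip], weights: Dict[str, float]) -> None:
        # Weighted linear combination of the per-clip quantities; a dot product is
        # used here purely as an illustration of "an arithmetic combination".
        for c in clips:
            c.score = sum(weights.get(k, 0.0) * v for k, v in c.features.items())

    def build_skim(clips: List[SkimClip], weights: Dict[str, float],
                   target_length: float) -> List[SkimClip]:
        score_clips(clips, weights)
        selected, total = [], 0.0
        for c in sorted(clips, key=lambda c: c.score, reverse=True):
            if total + (c.end - c.start) > target_length:
                continue
            selected.append(c)
            total += c.end - c.start
        # Play the chosen clips back in their original temporal order.
        return sorted(selected, key=lambda c: c.start)

The greedy selection by score with a length budget is one simple way to respect the user's preferred skim length; other selection strategies are possible.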
Keywords/Search Tags: Content-Based Video Analysis and Retrieval, Video Shot Boundary Detection, Video Abstraction, Key Frame Extraction, Video Skim