
Research On The Key Technology For Domestic Acoustic Scene Recognition

Posted on: 2021-08-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: A W Chen | Full Text: PDF
GTID: 1488306464981279 | Subject: Communication and Information System
Abstract/Summary:
As population aging becomes an increasingly serious issue worldwide, all-day intelligent monitoring systems based on audio and video analysis have received growing attention. Video monitoring is constrained by lighting and occlusion, and audio monitoring can make up for these shortcomings. With the development of smart city systems, home acoustic scene recognition (analysis) has attracted increasing interest in recent years, but research on acoustic scene recognition is still in its infancy. This thesis therefore takes home acoustic scene recognition as its research goal and contributes in the following aspects.

(1) A home acoustic scene database is established. The database was recorded in two urban and two rural families that include adult males and females, infants, young children, and elderly family members. Microphones were placed in the bathroom, living room, kitchen, bedroom, and other rooms. The main scene types are bathroom, watching TV, cooking, eating, cleaning, and living room. Each scene type contains 180 audio clips, each about 50 seconds long. In addition, a database of abnormal acoustic events is established for home monitoring applications; it includes gasping, falling, crying, screaming, and other events. Each event type contains 50 examples, each with at least one complete acoustic event.

(2) Detection of key/abnormal acoustic events in the home environment is studied. A cycle supervised learning (CSL) method is proposed for the detection system. Its novelty is that one feature is selected in each cycle based on a statistical histogram of the discriminative power of different features, such as zero-crossing rate (ZCR), MFCCs, and spectral roll-off. In each cycle only one abnormal event is detected, while all other acoustic events are treated as background sounds. Experimental results show that the average detection rate over thirteen kinds of abnormal acoustic events is 92.59% (with an average false alarm rate of 18.8%), which demonstrates the effectiveness of the algorithm in a real environment.

(3) A blur-invariant coding method based on key acoustic shapes is proposed to address overlapping acoustic events and limited prior information when detecting an acoustic scene in a real home environment. The method extracts key-shape information from the acoustic spectrum image based on prior information. The steps are as follows. First, a codebook of k visual words is learned by k-means clustering from the labeled key acoustic events. The codebook length k is chosen according to the classes, so the codebook has k components that represent k types of prior information. Each local descriptor is then associated with its nearest visual word. Finally, the descriptors are re-aggregated over the visual words following the VLAD (vector of locally aggregated descriptors) method. Experiments show that the blur-invariant coding method based on key acoustic shapes achieves classification rates of 98.2% and 92% on the DCASE 2016 and LITIS Rouen acoustic scene databases, a significant improvement over ConvNet methods.

(4) If a key acoustic event overlaps with other key events in an acoustic scene, extracting distinguishing features from the spectrum image with method (3) has some disadvantages; for example, it may lose some key acoustic shape information. For this reason, a novel multi-scale coding model based on the key acoustic space is proposed. The method codes a spectrum image by extracting multi-scale dominant texture patterns from it. First, the key acoustic words W are clustered by the k-means algorithm. Then, the distances between the key words W and each component x of the spectrogram image are compared and the shortest distance is selected. Finally, the multi-scale dominant texture patterns are extracted from the new descriptor using the LBP coding method. Experiments on several datasets verify the effectiveness of the classification. The results show that MSBF-KAE LBP achieves 98% and 94%, and MSBF-KAE HOG achieves 97% and 93%, on the DCASE 2016 and LITIS Rouen acoustic scene databases, respectively.

(5) The representation of the correlation between an acoustic scene and abnormal acoustic events is studied. For example, a crash event is normal in a cooking scene, but the same crash event is abnormal if it takes place in a bathroom scene. Motivated by audio monitoring applications, a sparse representation classification method based on an exemplar dictionary is proposed. In this method, an exemplar dictionary is first constructed to represent the acoustic scenes and the abnormal acoustic events, respectively. Then, the sparse representation of the test audio signal y is exploited together with discriminative dictionary learning based on discriminative OMP and K-SVD. Experimental results show that the average recognition precision is 86.5% for acoustic scenes and acoustic events on the home scene database.
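
The codebook and VLAD steps described in contribution (3) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the descriptor dimension, the random data standing in for local spectrogram descriptors, the plain Lloyd k-means, the final L2 normalization, and the helper names `kmeans` and `vlad_encode` are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distances from every descriptor to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

def vlad_encode(descriptors, codebook):
    """Sum residuals to the nearest visual word, then L2-normalize."""
    k, d = codebook.shape
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # nearest visual word per descriptor
    vlad = np.zeros((k, d))
    for j in range(k):
        assigned = descriptors[nearest == j]
        if len(assigned):
            vlad[j] = (assigned - codebook[j]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Hypothetical data: 8-dimensional local descriptors, k = 4 visual words.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))   # descriptors from labeled key events
codebook = kmeans(train, k=4)
clip = rng.normal(size=(50, 8))     # descriptors from one test clip
code = vlad_encode(clip, codebook)
print(code.shape)  # (32,)
```

The resulting vector has length k times the descriptor dimension, so a clip of any duration maps to a fixed-size code that a standard classifier could consume.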
Keywords/Search Tags:acoustic scene recognition/analysis, audio monitoring, key/abnormal acoustic event recognition, home environment, computer vision, discriminative dictionary learning