
Audio Scene Recognition Based On Probabilistic Latent Semantic Analysis

Posted on: 2014-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhou
Full Text: PDF
GTID: 2268330422950623
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of audio and video networks and the continuous improvement of living standards, a wide variety of audio and video files recording daily life are appearing on the major audio and video sites. Because human-assigned topic labels are subjective and arbitrary, and because recording tools produce files of differing quality, managing and identifying these files is an enormous challenge. An effective intelligent system for classifying them is therefore urgently needed. A sound-based intelligent system can not only manage and identify audio files, but also analyze the audio information in video files, providing technical support and a supplement to vision-based intelligent systems.

Audio scene recognition is one effective means of solving these problems. An audio scene can be seen as a semantic label that characterizes and distinguishes audio content; it consists of a series of semantically related, temporally adjacent acoustic events. Audio scene recognition thus identifies and interprets audio content at the semantic level. Traditional audio scene recognition methods fall into three main categories. The first is based on heuristic rules and usually works by comparing audio features with a threshold. The second is based on minimum distance: a template is created for each class of audio scene, and scenes are identified by computing similarity or spatial distance. The third is based on statistical theory, for example audio scene recognition based on the Gaussian mixture model (GMM) or the hidden Markov model (HMM). In short, these methods do not identify the audio scene directly, but recognize it by detecting the key acoustic events of the corresponding scene.
Methods based on key acoustic events place higher demands on the experimental environment and on the quality of the experimental corpus, and they are unable to distinguish similar audio scenes, because extracting and defining the key acoustic events of a scene is very difficult. Nevertheless, the idea of recognizing audio scenes through key acoustic events remains useful: co-occurring acoustic events can be treated as key acoustic events. Borrowing from the semantic analysis methods used in text classification, co-occurring acoustic events are regarded as synonyms, and acoustic events that appear in multiple scenes are regarded as polysemous. This thesis aims to solve the problems caused by synonymy and polysemy in audio scene recognition, with the probabilistic latent semantic analysis (PLSA) model at the core of the method.

The first step of the PLSA-based audio scene recognition method is to build a dictionary of acoustic events. This is done with a Gaussian mixture model, in which the Gaussian components determine the membership of each MFCC feature vector. The second step is to remove the influence of synonymy and polysemy among acoustic events using the PLSA model. Finally, the audio scene files are classified by a support vector machine (SVM). To evaluate the PLSA-based method, the thesis designs a baseline audio scene recognition system based on long-term statistical characteristics of MFCCs and an SVM model; the relevance of long-term statistics to audio scenes and the stability of the SVM make this baseline a meaningful reference. The thesis then improves the PLSA-based method in two ways. First, the affinity propagation (AP) clustering algorithm is used to achieve more flexible clustering. Second, audio files are rebuilt using the idea of audio scene segmentation, so that each audio scene file consists of acoustic events.
Audio scene segmentation is guided by the acoustic dictionary constructed with the Gaussian mixture model, which makes fully content-based audio scene recognition possible. The results show that the PLSA-based audio scene recognition method can effectively handle the influence of synonymous and ambiguous acoustic events in audio scene files, and that the improved system based on AP and audio scene segmentation performs better.
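
The abstract does not give implementation details, so the following is only a minimal sketch of the PLSA step under stated assumptions. It assumes each audio file has already been turned into a vector of acoustic-event counts (for example, by assigning every MFCC frame to its most likely Gaussian component), and it fits the standard PLSA model by EM. The function name `plsa`, the toy count matrix, and the choice of two latent topics are hypothetical and are not taken from the thesis.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (files x events) count matrix.

    Returns P(z|d) with shape (files, topics) and
    P(w|z) with shape (topics, events).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalized initial distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from count-weighted posteriors.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Hypothetical toy data: 4 audio files, 4 dictionary entries (acoustic events).
toy_counts = np.array([[5, 4, 0, 0],
                       [4, 6, 1, 0],
                       [0, 0, 5, 5],
                       [0, 1, 4, 6]], dtype=float)
topic_given_file, event_given_topic = plsa(toy_counts, n_topics=2)
```

One plausible use, again an assumption rather than something the abstract states, is to pass each row of `topic_given_file` to the SVM as the feature vector for that file, so that events which tend to co-occur are merged into the same latent topic before classification.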
Keywords/Search Tags: Audio Scene Recognition, Probabilistic Latent Semantic Analysis, Acoustic Event, Gaussian Mixture Model, Support Vector Machine