
Research On Speaker Information Analysis And Its Application To Multimedia Retrieval

Posted on: 2011-03-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J C Yang
Full Text: PDF
GTID: 1118360308464614
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet and streaming media technology, the volume of multimedia information is growing exponentially. Most of it is simply stored and is difficult to reuse, because manual tagging of multimedia is expensive and efficient indexing technology is lacking. Search engines that process only text keywords are poorly suited to multimedia retrieval, and content-based multimedia retrieval has become the mainstream approach. People are the main actors in society, and content generally takes its meaning from the people involved in it. Retrieval by speaker is therefore an efficient retrieval method, for example for finding the speech and actions of a particular person. This thesis focuses on the application of speaker information analysis to speaker change detection, Xin-wen-lian-bo story segmentation, and key speaker discovery in multimedia retrieval. The main contributions of this thesis are as follows:

(1) A modified BIC (Bayesian Information Criterion) algorithm for speaker change detection is proposed to address the low detection precision and high computational cost of the traditional BIC algorithm. Detection precision is improved by improving detectability; computational cost is decreased by limiting the maximum length of the first analysis window. Speaker change points are detected only in the new range. Experimental results show that, compared with BIC, the proposed algorithm reduces the bias error range from 0.1~0.5 seconds to 0.03~0.2 seconds and saves more time as the analysis window grows longer (75% of the computation time for a 40 s analysis window).

(2) To improve the precision of the modified BIC algorithm for speaker change detection (SCD), a two-step SCD algorithm that makes use of silence and gender information is proposed. A two-step criterion is used to decide the speaker change point (SCP) within the detected speech segments.
In the first step, the pitch difference between speakers and a gender model are used to locate the SCP within neighboring speech segments; in the second step, a gender-based modified T2 criterion formula is used to locate the SCP among speakers of the same gender, and potential speaker change points are detected chunk by chunk. Experimental results show that the proposed algorithm improves F1 by 8.74% over the modified BIC, reaching 85.14%. For SCD with durations of less than 2 s, the algorithm reduces the missed detection rate by about 16% compared with the modified BIC.

(3) To detect the story boundaries (SB) of Xin-wen-lian-bo precisely, a three-step criterion story segmentation (SS) method based on audiovisual features is proposed. It builds on an analysis of the structure of Xin-wen-lian-bo, statistics of the different story types, and the function of news titles. For stories with an anchorperson, penny distance is used to verify whether the anchorperson detected from video features is genuine, in order to locate the SB; for stories without an anchorperson, the SB is found by checking whether a shot change point occurs within a silence region. Experimental results show that the method improves precision by 6.92% over using video features alone for stories with an anchorperson (through anchorperson detection), and that silence detection reduces the SB error range for news without an anchorperson from 1.5~2.5 seconds to 0~0.5 seconds. This solves the problems of imprecise boundaries and of lost video frames corresponding to silence that arise when only silence is used. The final precision of story segmentation reaches 93.12%.

(4) To address the problem of key speaker discovery in multimedia, a speaker key is defined from speaker frequency, speaker duration, average speaking length per turn, and a speaker position factor. It is used to judge each person's importance, and the speaker with the largest speaker key is regarded as the key speaker.
Penny distance and GMM supervectors are first used to index speakers, and the key speaker is then found on the basis of the speaker indexing. Experimental results show that indexing speakers with penny distance and GMM supervectors reaches 88.24% speaker indexing accuracy and 88.68% speaker number accuracy; 95% of key speakers can be found using the method based on the speaker key.
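
For readers unfamiliar with the baseline that contribution (1) modifies, the following is a minimal sketch of conventional BIC-based speaker change detection. It is not the modified algorithm of the thesis. It assumes each analysis window is a NumPy array of acoustic feature frames (one row per frame) modelled by full-covariance Gaussians. The function names, the `margin` and `penalty_weight` parameters, and the small ridge term are illustrative assumptions and do not come from the abstract.

```python
import numpy as np


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of `frames` (n x d)."""
    d = frames.shape[1]
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    # A small ridge keeps the determinant finite for short segments (assumption).
    _, logdet = np.linalg.slogdet(cov + 1e-6 * np.eye(d))
    return logdet


def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for modelling `window` as two Gaussians split at frame `split`
    rather than as a single Gaussian. Positive values favour a speaker change."""
    n, d = window.shape
    left, right = window[:split], window[split:]
    # Extra free parameters of the two-Gaussian model:
    # d mean values plus d(d+1)/2 covariance values.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * log_det_cov(window)
                  - len(left) * log_det_cov(left)
                  - len(right) * log_det_cov(right))
    return gain - penalty


def find_change_point(window, margin=20, penalty_weight=1.0):
    """Return the split index with the largest positive delta-BIC, or None."""
    n = window.shape[0]
    # Keep `margin` frames on each side so both covariances are well estimated.
    candidates = list(range(margin, n - margin))
    if not candidates:
        return None
    scores = [delta_bic(window, s, penalty_weight) for s in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

Every candidate split re-estimates covariances over the whole window, so the cost of this baseline grows with the window length. This is presumably the cost that the thesis reduces by limiting the maximum length of the first analysis window; that limit is not implemented in this sketch.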
Keywords/Search Tags: Speaker information analysis, Speaker change detection, Xinwenlianbo story segmentation, Multimedia retrieval, Key speaker