
Studies On Affinity Propagation Based Pseudo-Relevance Feedback And Document Expansion For Spoken Document Retrieval

Posted on: 2020-05-03; Degree: Master; Type: Thesis
Country: China; Candidate: F Chen; Full Text: PDF
GTID: 2428330590964134; Subject: Transportation engineering
Abstract/Summary:
The rapid development of information and Internet technologies has made it possible to transmit massive amounts of multimedia information quickly over the Internet, generating a large number of audio and video files. Quickly finding the information users need in this huge volume of files is a main task of spoken document retrieval. As speech recognition and information retrieval technologies have matured, retrieving excerpts from speech recordings by combining automatic speech recognition with information retrieval techniques has provided a new solution to this task. Although spoken document retrieval has made considerable progress, many problems still need further study. Query word mismatch is an important factor affecting retrieval performance. Keyword misrecognition caused by out-of-vocabulary (OOV) words in speech recognition, which also introduces noisy data into the speech transcripts, makes the mismatch problem more serious. This thesis introduces a language modeling based spoken document retrieval system, analyzes the problems above, and proposes several approaches to address them. The main contributions of this thesis are as follows:
1. The language model method provides a new perspective on text information retrieval. The thesis introduces a language modeling based spoken document retrieval system that combines speech recognition and information retrieval technology.
2. Information retrieval performance is strongly affected by query misformulation and word mismatch. The thesis proposes a pseudo-relevance feedback method based on affinity propagation (AP) and word2vec. The experiments show that this method performs much better than the other word2vec based query expansion method and the classical language model method LM-HEQ.
3. Keyword misrecognition and the noisy data introduced by the speech recognition engine because of the OOV problem make the query mismatch problem more serious and degrade retrieval performance. This problem cannot be solved easily with traditional speech recognition methods. The thesis proposes a new document expansion method based on deep semantic computing. Using text information from the Internet, the method compensates for missing information by adding absent words, enriching the error-prone document contents. The experimental results show that retrieval performance improves greatly when the recognition error rate is high.
4. Query expansion addresses retrieval performance from the perspective of query word mismatch, while document expansion effectively addresses missing keywords and noise in speech transcripts. Combining the two techniques can greatly improve the performance of spoken information retrieval. The experiments show that retrieval performance is improved by 8.02% when the error rate is 39.27%.
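The language modeling retrieval step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which smoothing method or implementation the thesis uses, so the following assumes a standard unigram query-likelihood model with Dirichlet smoothing. The function names `query_log_likelihood` and `rank` and the parameter `mu` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log P(query | doc) under a unigram model with Dirichlet smoothing.

    Assumption: this is a generic query-likelihood scorer, not the
    thesis's actual system. `mu` is the Dirichlet prior weight.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for word in query:
        p_collection = collection_counts.get(word, 0) / collection_len
        if p_collection == 0.0:
            # The word never occurs in the collection; skip it so the
            # score stays finite (it affects every document equally).
            continue
        p_word = (doc_counts[word] + mu * p_collection) / (doc_len + mu)
        score += math.log(p_word)
    return score


def rank(query, docs, mu=2000.0):
    """Return document indices sorted from most to least likely."""
    collection_counts = Counter(word for doc in docs for word in doc)
    collection_len = sum(collection_counts.values())
    scored = [
        (query_log_likelihood(query, doc, collection_counts, collection_len, mu), i)
        for i, doc in enumerate(docs)
    ]
    # Sort by descending score, breaking ties by document index.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Toy transcripts, tokenized into words (illustrative data only).
docs = [
    ["speech", "recognition", "error"],
    ["traffic", "road", "signal"],
    ["speech", "retrieval", "speech"],
]
ranking = rank(["speech", "retrieval"], docs)
```

In a scorer of this kind, query expansion would add related terms to `query` before scoring, and document expansion would add terms to each entry of `docs`. How the thesis selects those terms (AP clustering, word2vec similarity, web text) is not specified in the abstract and is not modeled here.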
Keywords/Search Tags: Spoken Document Retrieval, Language Model, Word2Vec, Query Expansion, Document Expansion, Pseudo-Relevance Feedback