Design And Implementation Of Text-Video Cross Modal Search System Based On Knowledge Graph

Posted on: 2022-12-03; Degree: Master; Type: Thesis
Country: China; Candidate: Q K Ni; Full Text: PDF
GTID: 2518306773997559; Subject: Theory of Industrial Economy
Abstract/Summary:
With the growing popularity of the mobile Internet across all age groups, people have become used to publishing videos online to record daily life, share knowledge, and express themselves. In this era of knowledge sharing, content creators need rich and interesting video material to make attractive videos that catch the audience's attention, and retrieving suitable material from a massive volume of videos has become a new pain point. However, the search systems of current mainstream video platforms match queries against title and tag indexes and cannot understand video content, so they struggle to meet these users' need for accurate search. It is therefore of great significance to study and design a video search system that serves such users.

At present, the mainstream deep-learning models for cross-modal video search adopt an end-to-end modeling scheme, which works well only on short videos (about 10 s) with a single content topic. This end-to-end approach is still in its infancy and cannot yet meet the requirements of commercial companies. The TikTok video search scheme uses optical character recognition (OCR) and automatic speech recognition (ASR) to break video content down and index it; this handles long videos but still cannot achieve the user's search purpose.

Knowledge videos have distinctive characteristics: the spoken explanation carries far more information than the picture, the videos are long, and they span many subject dimensions. Aiming at these characteristics, this paper proposes a separate modeling method: the speech content of a video is transcribed into text, knowledge is extracted from that text, and the problem of cross-modal video search is thereby transformed into a problem of entity retrieval in a knowledge graph.

This paper designs and implements a text-video cross-modal search system based on a knowledge graph. The implementation is divided into three parts: a video management module, a speech recognition module, and a knowledge graph module. The video management module mainly provides video upload and video search. The upload function is implemented with the Django framework, and the search function is implemented by template question parsing based on info. In the speech recognition module, this paper first extracts spectrogram features from the video's speech, then fine-tunes the VGG network structure and combines it with CTC to eliminate repeated symbols so that the acoustic model adapts to the spectrogram, then trains the acoustic model to output Pinyin, and finally converts the Pinyin into Chinese text with a statistical language model. In the knowledge graph module, knowledge extraction follows a two-step scheme: entity recognition is first implemented with a BiLSTM-CRF model, and relation extraction is then performed with a BiLSTM-Attention model. The Neo4j graph database is used for knowledge storage to construct the knowledge graph related to the video content. In this way, the system can parse the user's input into an entity query against the graph database to retrieve results.

Finally, this paper gives the specific test scheme and detailed test cases, and carries out functional and system testing. Analysis of the test results shows that the system achieves the expected effect.
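
The way CTC eliminates repeated symbols in the speech recognition module can be illustrated with a minimal greedy-decoding sketch. This is not the thesis's implementation: the function name `ctc_greedy_decode`, the toy Pinyin vocabulary, the frame scores, and the choice of index 0 as the blank symbol are all assumptions made for illustration. The sketch takes the best label per time frame, merges consecutive duplicates, and then drops blanks.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index for every time frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Merge consecutive duplicates first, then remove the blank symbol.
    return [label for label, _ in groupby(best_path) if label != blank]


# Hypothetical toy vocabulary of Pinyin syllables; index 0 is the blank.
VOCAB = ["<blank>", "ni3", "hao3"]

# Hypothetical per-frame scores over VOCAB (one row per time frame).
scores = [
    [0.10, 0.80, 0.10],  # ni3
    [0.10, 0.70, 0.20],  # ni3 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # hao3
    [0.10, 0.20, 0.70],  # hao3 again (merged with the previous frame)
]

decoded = [VOCAB[i] for i in ctc_greedy_decode(scores)]
print(decoded)  # ['ni3', 'hao3']
```

Merging repeats before removing blanks matters: a blank between two identical labels keeps them as two separate symbols, which is how CTC represents a genuinely repeated syllable.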
Keywords/Search Tags: Video Search, Speech Recognition, Knowledge Graph