
Research On Content Sifting And Storage Mechanism Of Cross-modal Image And Text Data Based On Semantic Similarity

Posted on: 2021-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Guo
Full Text: PDF
GTID: 2518306107460704
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of cloud storage technology and the explosive growth of multimedia data, massive amounts of cross-modal data are uploaded to the cloud. This data shows cross-media characteristics: data of multiple modalities is mixed and coexists, and data of different modalities expresses similar semantics. Intelligent management and cross-modal analysis of large-scale, multi-source heterogeneous data has become a new challenge for traditional cloud storage systems. The growth in data volume and the differences between modalities have sharply increased the difficulty of retrieving useful data from unorganized data. Existing storage systems cannot establish semantic connections between data items, so the system must read all data from disk before analyzing it, which inevitably causes a large delay. Intelligent retrieval of matching data based on user needs and content relevance has therefore become a research hotspot. This thesis considers only two modalities, image and text, and proposes a cross-media image and text content sifting storage mechanism (CITCSS) that provides large-scale online similar-content sifting services. In the offline stage, the self-supervised adversarial hashing algorithm SSAH generates hashcodes in a unified semantic space after cross-modal fusion. Because similar files have similar hashcodes, the Hamming distances between hashcodes are computed and Neo4j is used to construct a semantic hashcode graph, in which each hashcode is mapped to the storage paths of its files. In the online sifting stage, the image or text supplied by the user is converted to a hashcode, similar nodes within the sifting radius are found through the semantic hashcode graph, and the storage paths of the similar files are returned as the sifting result. This storage-level sifting design alleviates the high latency caused by reading all data before analyzing large-scale data. Experimental results on three public cross-modal image and text datasets show that the CITCSS system achieves online similar-content sifting in a cloud storage environment at the cost of a loss in recall that stays within an acceptable range. Its image and text retrieval accuracy is better than that of the benchmark algorithms. At the same time, sifting performance improves significantly, while constructing the semantic hashcode graph adds only small time and storage overheads. This will provide support for online semantic query and big data analysis on cloud storage systems in the future.
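The offline graph construction and online sifting described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: an in-memory dictionary stands in for Neo4j, the hashcodes are hand-written 8-bit integers rather than SSAH outputs, and the names `SemanticHashGraph`, `edge_threshold`, `add_file`, and `sift` are hypothetical. The rule for linking nodes (Hamming distance at most `edge_threshold`) and the breadth-first traversal are also assumptions, since the abstract does not specify them.

```python
from collections import deque


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashcodes."""
    return bin(a ^ b).count("1")


class SemanticHashGraph:
    """In-memory stand-in for the semantic hashcode graph (assumed design)."""

    def __init__(self, edge_threshold: int):
        self.edge_threshold = edge_threshold  # assumed linking rule
        self.paths = {}      # hashcode -> list of storage paths
        self.neighbors = {}  # hashcode -> set of linked hashcodes

    def add_file(self, code: int, path: str) -> None:
        """Offline stage: register a file and link its node to close nodes."""
        if code not in self.paths:
            self.paths[code] = []
            self.neighbors[code] = set()
            for other in self.paths:
                if other != code and hamming(code, other) <= self.edge_threshold:
                    self.neighbors[code].add(other)
                    self.neighbors[other].add(code)
        self.paths[code].append(path)

    def sift(self, query_code: int, radius: int) -> list:
        """Online stage: return storage paths of nodes within the radius."""
        if not self.paths:
            return []
        # Entry point: the exact node if stored, otherwise the nearest node.
        # The linear scan here is a simplification for the sketch.
        if query_code in self.paths:
            entry = query_code
        else:
            entry = min(self.paths, key=lambda c: hamming(c, query_code))
        if hamming(entry, query_code) > radius:
            return []
        seen = {entry}
        queue = deque([entry])
        hits = []
        while queue:
            node = queue.popleft()
            hits.append((hamming(node, query_code), node))
            # Follow only edges that stay inside the sifting radius.
            for nxt in self.neighbors[node]:
                if nxt not in seen and hamming(nxt, query_code) <= radius:
                    seen.add(nxt)
                    queue.append(nxt)
        hits.sort()  # closest hashcodes first
        return [p for _, code in hits for p in self.paths[code]]


# Toy data: made-up hashcodes and paths, for illustration only.
graph = SemanticHashGraph(edge_threshold=2)
graph.add_file(0b10110010, "/img/cat1.jpg")
graph.add_file(0b10110011, "/txt/cat.txt")
graph.add_file(0b10110111, "/img/cat2.jpg")
graph.add_file(0b01001100, "/img/car.jpg")

print(graph.sift(0b10110010, radius=1))
print(graph.sift(0b10110000, radius=2))
```

Because the traversal follows only edges that stay inside the radius, it can miss a node that is within the radius but not reachable through such edges. That trade of some recall for not scanning every stored item is in the spirit of the trade-off the abstract reports, although the actual cause of the recall loss in CITCSS is not stated there.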
Keywords/Search Tags:Cross-modal Fusion, Image and Text Retrieval, Hash Learning, SSAH, Semantic Graph