Research on Image Dark Data Value Assessment Based on Similarity Hashing

Posted on: 2020-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Yang
Full Text: PDF
GTID: 2428330590958323
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of cloud computing and the Internet, the exponential growth of data poses great challenges for storage and management. Dark data, that is, data without labels or associations, occupies large amounts of storage space yet rarely produces value. Applying data mining to such data blindly is likely to waste substantial mining cost, so it is important to analyze and assess dark data before mining it. Building on research that applies image hashing and graph-based ranking algorithms to the value assessment of dark data, this paper designs and implements a framework for image dark data value assessment based on similarity hashing.

The framework has two stages: offline analysis and online assessment. In the offline analysis stage, the framework first transforms image dark data into hash codes using the DSTH (Deep Self-taught Hashing) algorithm, which extracts the semantic features and similarity relationships of images. It then takes the hash codes as nodes and constructs a semantic hash graph using a restricted Hamming distance. Finally, it uses the SHR (Semantic Hash Ranking) algorithm, which takes both the number of connected links and the weights on edges into account, to calculate an overall importance score and rank for each node (image). In the online assessment stage, the framework first translates the user's query images into hash codes using the same DSTH model, then matches suitable data within a predefined Hamming distance query range, and finally indicates the importance of the query images by the weighted average importance score of the matched data, which helps the user judge whether the dark data set is worth mining for such query images.

Experimental results show that the proposed framework can be applied to large-scale image dark data, extracts semantic image features that generalize, and correctly calculates the importance scores of images from the hash-based graph. On this basis, the framework can handle query requests from different mining tasks by setting an objective evaluation criterion, helping users recognize the hidden value of dark data and assisting their subsequent data mining work.
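
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives no formulas for SHR, so a weighted PageRank-style iteration stands in for it, and the edge weight and query weight (closer codes count more) are assumptions. Hash codes are taken as plain integers rather than DSTH outputs, and all function and parameter names (`build_hash_graph`, `rank_nodes`, `assess_query`, `max_dist`, `query_range`) are hypothetical.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def build_hash_graph(codes, max_dist):
    """Offline step: link codes whose Hamming distance is <= max_dist.

    Edge weight is an assumption: closer codes get heavier edges.
    """
    graph = {i: {} for i in range(len(codes))}
    for i, j in combinations(range(len(codes)), 2):
        d = hamming(codes[i], codes[j])
        if d <= max_dist:
            w = max_dist - d + 1
            graph[i][j] = w
            graph[j][i] = w
    return graph


def rank_nodes(graph, damping=0.85, iters=50):
    """Weighted PageRank-style power iteration (a stand-in for SHR)."""
    n = len(graph)
    scores = {v: 1.0 / n for v in graph}
    strength = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    for _ in range(iters):
        # Score held by isolated nodes is redistributed uniformly.
        dangling = sum(scores[v] for v in graph if strength[v] == 0)
        new = {}
        for v in graph:
            inflow = sum(scores[u] * w / strength[u]
                         for u, w in graph[v].items())
            new[v] = (1 - damping) / n + damping * (inflow + dangling / n)
        scores = new
    return scores


def assess_query(query_code, codes, scores, query_range):
    """Online step: weighted average importance of the matched codes.

    Returns 0.0 when no stored code falls within query_range.
    """
    total = weight_sum = 0.0
    for i, code in enumerate(codes):
        d = hamming(query_code, code)
        if d <= query_range:
            w = query_range - d + 1  # assumed weighting: closer counts more
            total += w * scores[i]
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    codes = [0b0000, 0b0001, 0b0011, 0b1111]  # toy 4-bit hash codes
    graph = build_hash_graph(codes, max_dist=1)
    scores = rank_nodes(graph)
    print(assess_query(0b0001, codes, scores, query_range=1))
```

In this toy setup a node's score rises with both how many neighbours it has and how heavy the connecting edges are, which mirrors the two factors the abstract attributes to SHR. A high query score then suggests that the query's semantic neighbourhood is well represented in the data set.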
Keywords/Search Tags: Image Dark Data, Value Assessment, Similarity Hashing, DSTH, Hash-based Graph, SHR