Design And Implementation Of Similar Image Retrieval System Based On Spark Streaming

Posted on: 2018-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: F K Fang
Full Text: PDF
GTID: 2428330569485442
Subject: Computer technology
Abstract/Summary:
With the rapid development of the mobile Internet, social media, and video surveillance technology, the volume of multimedia data uploaded to the network has grown explosively, with large numbers of images uploaded through social platforms such as WeChat, Weibo, Snapchat, Instagram, Facebook, and Twitter. However, this multimedia data is not being fully utilized: most image processing systems are designed for small-scale, local computing and cannot meet the storage and computing requirements of massive image data. How to store massive image data effectively, and how to retrieve useful images from it quickly and accurately, has therefore become a pressing concern. The emergence of distributed computing frameworks has made it possible to process data at this scale. The Spark distributed computing framework addresses Hadoop's insufficient processing speed, and Spark Streaming uses the micro-batch concept to meet real-time processing requirements. Spark also integrates a machine learning library seamlessly, which simplifies parallel development. In this thesis, we study an image retrieval system based on Spark Streaming. Combining Harris and SIFT features, we compute SIFT descriptor vectors at Harris feature points to describe an image. Spark's distributed computing is used to extract features in bulk from the massive image data. The image vectors are clustered with the KMeans algorithm in the Spark machine learning library to generate a visual dictionary. The HBase distributed database is used to store and retrieve the massive image data. The experimental results show that the Spark distributed computing framework effectively improves image processing speed for massive image data, that HBase stores and retrieves the image data efficiently, and that Kafka combined with Spark Streaming meets real-time retrieval requests for video and images. The system offers high throughput, high fault tolerance, and scalability, among other advantages.
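The visual-dictionary step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it uses plain Python instead of Spark's machine learning library, toy 2-D vectors in place of 128-dimensional SIFT descriptors, and cosine similarity as an assumed ranking measure (the abstract does not name one). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, vector):
    """Index of the cluster center closest to the vector."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], vector))


def kmeans(vectors, k, iterations=20, seed=0):
    """Lloyd's algorithm; the k returned centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centers, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the previous center if a cluster is empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bow_histogram(descriptors, centers):
    """L2-normalised histogram of visual-word counts for one image."""
    counts = Counter(nearest(centers, d) for d in descriptors)
    hist = [float(counts.get(i, 0)) for i in range(len(centers))]
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_hist, index, top_n=3):
    """Names of the indexed images most similar to the query histogram."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_hist, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Toy descriptors (assumption: stand-ins for SIFT vectors at Harris points).
images = {
    "a": [(0, 0), (1, 0), (10, 10), (11, 10)],
    "b": [(0, 1), (1, 1), (0, 0.5)],
    "c": [(10, 11), (11, 11), (10.5, 10)],
}
all_descriptors = [d for descs in images.values() for d in descs]
dictionary = kmeans(all_descriptors, k=2)
index = {name: bow_histogram(descs, dictionary)
         for name, descs in images.items()}
print(retrieve(bow_histogram(images["a"], dictionary), index))
```

In a distributed setting the clustering and histogram steps would run over partitions of the descriptor set, and the `index` dictionary would correspond to rows in a store such as HBase; the sketch keeps everything in memory to show only the data flow.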
Keywords/Search Tags:Image Retrieval, SIFT, Harris, KMeans, Spark Streaming, HBase