Research On Key Algorithms For Video Based Smart Retail Cabinet

Posted on: 2020-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: Pubudu Ekanayake
GTID: 2428330578466905
Subject: Computer technology
Abstract/Summary:
Smart retail stores are a hot topic that has attracted interest from many tech companies. Companies such as Amazon, Orange, Deep Blue, and IBM have already started working on enhancing customers' buying experience with smart retail stores. Most available smart retail store systems are complicated, and small-scale retailers have difficulty affording them. For a small-scale smart retail store or cabinet, three key problems must be solved before deployment: (1) how to obtain annotated image datasets, containing pictures of products taken from multiple viewpoints, for training a supervised deep learning model for product detection; (2) how to design a lightweight deep learning model for product detection and recognition that meets the primary requirements on accuracy, speed, and capacity; and (3) how to design a lightweight machine learning model that estimates a person's age and gender simultaneously from face images. After a detailed survey of the technologies used in smart retail stores and cabinets, this thesis provides a practical overall framework for a prototype smart retail cabinet system and gives practical solutions to the three key problems above.

For dataset annotation, this thesis provides a simple and efficient pipeline that involves several steps of human-machine collaboration. The first step is to prepare the initial image datasets by photographing the products on sale from different viewpoints. Typically each image contains only one product and is labeled with that product's category. Some images may contain multiple products of the same or different categories and carry multiple labels, one for each product category present. The second step is to select a small number of images from the initial image datasets, for example 50 single-product images for each product category. Applying a pre-trained Mask-RCNN to each of these images generates cropped images, each with the coordinates of its bounding box and its category label. Human experts then sort these cropped images into two subsets: one of correctly cropped images with the correct label, and another of cropped images that are hard negatives or background. After that, VGG16 is applied to each image of the two subsets, and the results are used to train a classification model, which is reused in the next step. The third step is to generate the annotated image datasets. By applying the pre-trained Mask-RCNN, VGG16, and the classification model just trained, each image of the initial image datasets yields one or more correctly cropped images with the correct labels. Finally, interactive software with a friendly human-machine interface is used to check each cropped image and its label, producing a high-quality image dataset. This pipeline efficiently cuts the cost of labor.

Based on single-stage object detection, this thesis introduces a custom module that reduces the number of parameters of a CNN model while preserving the model's original accuracy. The main idea behind the module is to reduce the computational cost and the storage needed for the model weights while maintaining the accuracy of the original model. The module can be attached to almost any deep learning model with minor changes to the original model. It reduced the number of parameters of the YOLO model, the primary focus of this work, by 41.77%.

Further, a single lightweight model architecture was designed to estimate both age and gender simultaneously from face images. A face detection model detects the faces before age and gender are estimated. This lightweight model removes the need for two separate models, which reduces both model size and inference time. The estimated age and gender will be used to provide product recommendations in a future version.

The innovation points of this thesis are: (1) a naive yet effective pipeline and algorithms for obtaining the first set of bounding-box annotations for a custom image dataset; (2) a custom module that reduces the number of parameters of a model while maintaining the original accuracy; (3) a lightweight CNN model that estimates both age and gender simultaneously from face images.
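
The second and third steps of the annotation pipeline can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the crops and their feature vectors are assumed to come from a detector and a feature extractor (Mask-RCNN and VGG16 in the thesis), and a nearest-centroid rule stands in for the classification model, whose actual form the abstract does not specify. All names (`Crop`, `train_centroids`, `filter_crops`) and the toy feature values are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence, Tuple


@dataclass
class Crop:
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) proposed by the detector
    label: str                      # category label proposed for the crop
    features: Sequence[float]       # feature vector from a CNN backbone (assumed)


def train_centroids(accepted: List[Crop], rejected: List[Crop]) -> Dict[str, List[float]]:
    """Step 2: fit a stand-in classifier from the two human-sorted subsets."""
    def mean(crops: List[Crop]) -> List[float]:
        dims = len(crops[0].features)
        return [sum(c.features[i] for c in crops) / len(crops) for i in range(dims)]

    return {"accept": mean(accepted), "reject": mean(rejected)}


def filter_crops(candidates: List[Crop], centroids: Dict[str, List[float]]) -> List[Crop]:
    """Step 3: keep crops that lie closer to the 'accept' centroid than to 'reject'."""
    return [
        c for c in candidates
        if dist(c.features, centroids["accept"]) < dist(c.features, centroids["reject"])
    ]


# Toy data: human experts sorted four crops into the two subsets.
accepted = [Crop((0, 0, 10, 10), "cola", [1.0, 0.9]),
            Crop((1, 1, 9, 9), "cola", [0.9, 1.0])]
rejected = [Crop((0, 0, 3, 3), "background", [0.1, 0.0]),
            Crop((4, 4, 5, 5), "background", [0.0, 0.2])]
centroids = train_centroids(accepted, rejected)

# New detector output for the remaining images is filtered automatically.
candidates = [Crop((2, 2, 8, 8), "cola", [0.8, 0.95]),
              Crop((5, 5, 6, 6), "cola", [0.05, 0.1])]
kept = filter_crops(candidates, centroids)
```

In this sketch the surviving crops in `kept` would then go to the interactive checking tool, so that human effort is spent on verification rather than on drawing every bounding box by hand.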
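
The abstract does not describe the internals of the custom module. As a hedged illustration of how swapping one block for another can shrink a CNN, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution. The latter is a well-known parameter-saving substitute and is an assumption here, not necessarily the technique used in the thesis; the layer sizes are made up for the example.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias terms omitted)."""
    return k * k * c_in + c_in * c_out


def reduction_percent(before: int, after: int) -> float:
    """Relative saving in parameters, as a percentage of the original count."""
    return 100.0 * (before - after) / before


# Hypothetical layer: 256 input channels, 512 output channels, 3 x 3 kernel.
standard = conv_params(256, 512, 3)         # 9 * 256 * 512
separable = separable_params(256, 512, 3)   # 9 * 256 + 256 * 512
saving = reduction_percent(standard, separable)
print(f"{standard} -> {separable} parameters ({saving:.2f}% fewer)")
```

A whole-model figure such as the one reported for YOLO depends on how many layers are replaced and on the layers left unchanged, so a per-layer saving like this one does not translate directly into the overall reduction.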
Keywords/Search Tags:smart retail cabinet, product detection, product identification, bounding box annotation, age estimation, gender estimation