Research And Implementation Of Product Images Classification Algorithm Based On Distributed Deep Learning

Posted on: 2019-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gu
GTID: 2428330545470704
Subject: Computer technology
Abstract/Summary:
In the Internet Plus environment, e-commerce is developing rapidly, and online shopping has gradually become one of the main ways people shop. How to quickly and accurately organize and classify the petabyte-scale product images of e-commerce platforms has become an urgent problem. For product image classification, this paper chooses CaffeOnSpark, a distributed deep learning tool built on Spark clusters, as the platform for studying distributed deep learning.

To ensure classification accuracy, CaffeOnSpark sacrifices efficiency and adopts a synchronous stochastic gradient descent (SGD) optimization algorithm. Synchronous SGD has to wait for the slowest node in every iteration, which produces a bucket effect (the slowest node limits the whole cluster) and reduces training efficiency. Communication conflicts also arise when nodes synchronize parameters with the parameter server. The asynchronous stochastic gradient descent (ASGD) algorithm removes the bucket effect and is widely used in other distributed deep learning platforms. However, ASGD still suffers from communication conflicts, which waste computing time. In addition, because of the gradient obsolescence (stale gradient) problem, the convergence rate of the model is reduced, the expected speedup cannot be reached, and the final model accuracy of ASGD is lower than that of synchronous SGD.

First, to address the shortcomings of the CaffeOnSpark synchronization algorithm, this paper designs a CaffeOnSpark architecture based on ASGD. Implementing asynchronous training solves the bucket effect problem of CaffeOnSpark and improves training efficiency. Second, to address the communication conflict problem of ASGD, a random data slice strategy is proposed to alleviate the conflicts and further accelerate training. To address gradient obsolescence, a weak synchronization strategy is adopted to balance training efficiency and accuracy. Finally, the PI100 product image classification model is designed and trained with the improved distributed deep learning algorithm. Compared with the original CaffeOnSpark platform, the improved algorithm achieves higher training efficiency at the same accuracy. To address the low accuracy and over-fitting caused by the small training data set, this paper adopts transfer learning to obtain higher accuracy and better generalization on the PI100 data set and to complete the product image classification task.
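
The contrast between synchronous SGD and ASGD described in the abstract can be sketched with a toy timing model. The Python sketch below is illustrative only and is not the thesis's implementation: the function names, the per-worker step times, and the training window are all hypothetical. It counts how many gradients the parameter server applies under each scheme, and it measures staleness as the number of server updates applied between a worker's parameter pull and its gradient push.

```python
import heapq


def sync_updates(step_times, duration):
    """Synchronous SGD: each round lasts as long as the slowest worker."""
    round_time = max(step_times)          # the "bucket effect"
    rounds = duration // round_time
    return rounds * len(step_times)       # gradients applied in total


def simulate_asgd(step_times, duration):
    """Asynchronous SGD: each worker pushes as soon as it finishes a step.

    Returns one staleness value per applied gradient, where staleness is
    the number of server updates made since that worker pulled parameters.
    """
    version = 0                           # server parameter version
    pulled = [0] * len(step_times)        # version each worker last pulled
    events = [(t, w) for w, t in enumerate(step_times)]  # (finish time, worker)
    heapq.heapify(events)
    staleness = []
    while events:
        time, worker = heapq.heappop(events)
        if time > duration:
            break                         # all remaining events are later
        staleness.append(version - pulled[worker])
        version += 1                      # server applies the gradient
        pulled[worker] = version          # worker pulls fresh parameters
        heapq.heappush(events, (time + step_times[worker], worker))
    return staleness


# Hypothetical cluster: three fast workers and one straggler.
STEP_TIMES = [1, 1, 1, 4]
DURATION = 8

sync_count = sync_updates(STEP_TIMES, DURATION)
async_staleness = simulate_asgd(STEP_TIMES, DURATION)
print("sync gradients applied: ", sync_count)
print("async gradients applied:", len(async_staleness))
print("max async staleness:    ", max(async_staleness))
```

Under these assumed timings the asynchronous scheme applies far more gradients in the same window, but the straggler's gradients are computed against parameters that are many versions old. This is the efficiency versus accuracy trade-off that the abstract's weak synchronization strategy is meant to balance.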
Keywords/Search Tags:Product images classification, Deep learning, Asynchronous algorithms, Random data slice, Weak synchronization