Research On Distributed Stratified Sampling And Its Application On Object Detection

Posted on: 2018-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: M N Li
GTID: 2428330569998615
Subject: Computer Science and Technology
Abstract/Summary:
Data sampling is a statistical inference method that extracts a representative subset of a population and estimates from it. By observing this subset, we can estimate and evaluate features of the overall population, and thus gain knowledge about it. Stratified sampling, specifically, divides the population into distinct groups known as strata, each containing items that are similar to one another. It achieves higher statistical precision because the subsets it generates follow a distribution closer to that of the population. As data sizes grow and data collection and storage technology advances rapidly, sampling techniques make it possible to obtain statistical results, estimates and approximations in a short time at the cost of slightly reduced accuracy, and they play a critical role in many domains.

Data sampling in a distributed environment calls for low data transmission cost, high efficiency and scalability, and high sample representativeness. To achieve this goal, we propose DSS, a scalable and efficient stratified sampling algorithm for large-scale datasets in distributed environments. DSS adopts the message transmission scheme of Spark, a distributed computing platform, to determine the final sampling frequency of the intermediate sampling results from each node, and thus guarantees sample representativeness in a distributed environment. Meanwhile, DSS computes the required sampling frequency of each node by taking into account the proportion of individuals inside each data partition that satisfy the stratum constraint, and thus conducts the sampling procedure in a distributed manner. DSS also reduces data transfer cost by transmitting metadata instead of data records. The experimental results show that DSS significantly reduces the amount of data transferred over the cluster network (to roughly 0.05% of that of Spark-SQE, a state-of-the-art stratified sampling algorithm) while keeping sample representativeness high. Regarding computational cost, DSS achieves a 65% speedup over Spark-SQE. In addition, the runtime of DSS scales linearly as the workload grows, indicating the high scalability of the proposed method.

In the field of image object detection, generated region proposals show a large performance gap between different loss types, so the training of object detectors suffers from a data imbalance problem. To overcome this problem, we propose S-OHEM, the Stratified Online Hard Example Mining algorithm, for training region-based object detectors with high efficiency and accuracy. During hard example mining, S-OHEM adopts the stratified sampling technique and focuses on the influence of the different loss types throughout the training process in order to enhance localization accuracy. S-OHEM pays more attention to the localization loss during hard example mining and adds region proposals with high localization loss to the active training set. Through systematic experiments and analysis, we find that S-OHEM converges to a lower training loss than standard Online Hard Example Mining (OHEM) in the training stage. In the test stage, S-OHEM generates detection bounding boxes with higher localization accuracy, which satisfy higher IoU (Intersection over Union) thresholds. S-OHEM yields an average precision (AP) improvement of 0.5% on the rigid categories of PASCAL VOC 2007 at both the 0.6 and 0.7 IoU thresholds. On KITTI 2012, the improvement on the same metric is 1.6% at both thresholds. Regarding mean average precision (mAP), relative increases of 0.3% and 0.5% are observed on VOC07, and of 1% and 0.5% on KITTI12, at the same two IoU thresholds. Moreover, S-OHEM enhances localization accuracy from the data perspective only, and is therefore easy to integrate with existing region-based detectors. Combined with post-recognition level regressors, S-OHEM is capable of obtaining higher object detection accuracy.
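The idea of exchanging metadata instead of records, described for DSS in the abstract, can be illustrated with a small sketch. It is not the DSS algorithm itself: it assumes a plain proportional allocation in which each partition reports only its per-stratum record counts, a coordinator turns those counts into per-partition quotas, and each partition then samples locally. All function names (`stratum_counts`, `allocate_quotas`, `local_sample`) are hypothetical, and no Spark API is used.

```python
import random
from collections import Counter


def stratum_counts(partition, stratum_of):
    """Metadata a worker would report: number of local records per stratum."""
    return Counter(stratum_of(record) for record in partition)


def allocate_quotas(counts_per_partition, sample_sizes):
    """Split each stratum's target sample size across partitions in proportion
    to the number of matching records each partition holds (assumed scheme).
    Largest-remainder rounding makes the quotas sum exactly to the target."""
    quotas = [dict() for _ in counts_per_partition]
    for stratum, target in sample_sizes.items():
        counts = [c.get(stratum, 0) for c in counts_per_partition]
        total = sum(counts)
        if total == 0:
            continue  # no partition holds this stratum
        target = min(target, total)  # cannot sample more than exists
        exact = [target * c / total for c in counts]
        base = [int(x) for x in exact]
        leftover = target - sum(base)
        # Hand the remaining units to the largest fractional parts.
        order = sorted(range(len(counts)),
                       key=lambda i: exact[i] - base[i], reverse=True)
        for i in order[:leftover]:
            base[i] += 1
        for i, quota in enumerate(base):
            if quota:
                quotas[i][stratum] = quota
    return quotas


def local_sample(partition, stratum_of, quota, rng):
    """Sample without replacement inside one partition, stratum by stratum."""
    by_stratum = {}
    for record in partition:
        by_stratum.setdefault(stratum_of(record), []).append(record)
    sample = []
    for stratum, k in quota.items():
        sample.extend(rng.sample(by_stratum[stratum], k))
    return sample


if __name__ == "__main__":
    # Two hypothetical partitions of (stratum, id) records.
    parts = [
        [("a", i) for i in range(60)] + [("b", i) for i in range(10)],
        [("a", i) for i in range(40)] + [("b", i) for i in range(30)],
    ]
    key = lambda record: record[0]
    counts = [stratum_counts(p, key) for p in parts]      # only counts travel
    quotas = allocate_quotas(counts, {"a": 10, "b": 4})
    rng = random.Random(0)
    samples = [local_sample(p, key, q, rng) for p, q in zip(parts, quotas)]
    print(quotas, sum(len(s) for s in samples))
```

In this sketch only the small count dictionaries and the quotas cross partition boundaries; the records themselves are touched only by the partition that holds them.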
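The S-OHEM results in the abstract are reported at IoU thresholds of 0.6 and 0.7. As a reminder of what such a threshold means, the sketch below computes Intersection over Union for two axis-aligned boxes and checks it against a threshold. It assumes boxes given as `(x1, y1, x2, y2)` in continuous coordinates; the helper names `iou` and `meets_threshold` are illustrative, and benchmark toolkits may use slightly different pixel conventions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def meets_threshold(predicted, ground_truth, threshold):
    """True if the predicted box overlaps the ground truth at or above threshold."""
    return iou(predicted, ground_truth) >= threshold


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    predicted = (2, 0, 12, 10)  # shifted 2 units right: IoU = 80 / 120
    print(iou(predicted, ground_truth))
    print(meets_threshold(predicted, ground_truth, 0.6))
    print(meets_threshold(predicted, ground_truth, 0.7))
```

The example box overlaps the ground truth with an IoU of about 0.67, so it counts as correctly localized at the 0.6 threshold but not at 0.7, which is why gains at higher thresholds indicate tighter bounding boxes.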
Keywords/Search Tags: Stratified Sampling, Distributed Computing, Deep Learning, Image Object Detection, Hard Example Mining