
Research On Large-scale Handwriting Data Analysis Platform Based On Cloud Computing Architecture And Its Application

Posted on: 2018-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: D G Li
GTID: 2348330533966318
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, cloud computing has transformed information processing in the technology industry and has taken root in many areas of life and work. By abstracting computation, network, and storage resources, cloud computing integrates scattered resources into a large-scale cluster, which enables unified scheduling across clusters and on-demand services. It also exposes unified public interfaces, which greatly reduce manual operation and intervention and improve resource utilization. In addition, cloud services are typically designed to be highly available.

With the growing popularity of smartphones and improvements in human-machine interaction, handwriting input has become one of the mainstream input methods for Chinese characters. As a result, a vast number of users produce large-scale handwritten Chinese character data in daily use. These data may contain many redundant samples, because the writing styles of different writers can look alike to a Chinese character recognition engine. Finding the distinct writing styles within large-scale data is therefore an active research topic. In addition, because the recognition engine occasionally misclassifies samples, quickly finding the mislabeled samples is also a challenge in this area.

This thesis analyzes large-scale handwriting data on a cloud computing platform. Based on the HDFS cloud storage platform and the Spark distributed computing platform, we propose a method to quickly spot distinct writing styles and mislabeled samples in large-scale Chinese handwriting data. The main work can be divided into four parts:

1. We built a platform for fast processing and analysis of large-scale handwriting data based on the HDFS and Spark frameworks. The platform not only addresses the challenges of storing and analyzing large-scale handwriting data, but also provides a necessary technical basis for the development of handwritten Chinese character recognition.
2. We proposed a method to investigate different handwriting styles among large-scale handwritten Chinese character samples. We evaluated the clustering performance of different features by visualizing the similarity among handwriting samples of the same Chinese character, and we compared how well the different features discriminate between writing styles.
3. We proposed a method for fast data cleaning of large-scale handwritten Chinese character samples. We evaluated the clustering performance of different features by comparing their ability to spot mislabeled samples.
4. We evaluated the performance of the Spark platform in our use case and recommended parameter settings for it.
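The abstract does not state which features or clustering algorithm the thesis uses, so the following is only a minimal single-machine sketch of the general idea behind part 3, under stated assumptions: each sample is assumed to carry a label and a numeric feature vector, and a sample is treated as a possible mislabel when it lies unusually far from the robust center of its own labeled class. The function names, the sample ids, the coordinate-wise median center, and the median-plus-k-times-MAD threshold are all illustrative choices, not the thesis's method. On a cluster, the per-label grouping step is the part that would be distributed.

```python
import math
from statistics import median


def robust_center(vectors):
    """Coordinate-wise median of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [median([v[i] for v in vectors]) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_suspects(samples, k=3.0, eps=1e-6):
    """Return ids of samples that sit far from the center of their own class.

    samples: list of (sample_id, label, feature_vector) tuples (assumed layout).
    A sample is flagged when its distance to the class center exceeds
    median_distance + k * MAD, where MAD is the median absolute deviation
    of the distances in that class. eps keeps the threshold from collapsing
    onto the median when a class has (almost) zero spread.
    """
    by_label = {}
    for sample_id, label, vec in samples:
        by_label.setdefault(label, []).append((sample_id, vec))

    suspects = []
    for items in by_label.values():
        center = robust_center([vec for _, vec in items])
        dists = [(sample_id, euclidean(vec, center)) for sample_id, vec in items]
        med = median([d for _, d in dists])
        mad = median([abs(d - med) for _, d in dists])
        threshold = med + k * max(mad, eps)
        suspects.extend(sample_id for sample_id, d in dists if d > threshold)
    return suspects


# Hypothetical toy data: two character classes with 2-D features.
# Sample "a6" carries the first label but lies far from that class.
demo = [
    ("a1", "char_yong", [0.0, 0.0]),
    ("a2", "char_yong", [0.1, 0.0]),
    ("a3", "char_yong", [0.0, 0.1]),
    ("a4", "char_yong", [0.1, 0.1]),
    ("a5", "char_yong", [0.05, 0.05]),
    ("a6", "char_yong", [5.0, 5.0]),
    ("b1", "char_ren", [1.0, 1.0]),
    ("b2", "char_ren", [1.2, 1.0]),
    ("b3", "char_ren", [1.0, 1.2]),
    ("b4", "char_ren", [1.2, 1.2]),
]

print(flag_suspects(demo))
```

A median-based center and spread are used here because a plain mean is pulled toward the very outliers the check is trying to find. Flagged samples are candidates for review rather than confirmed mislabels, since an unusual but valid writing style would also land far from the class center.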
Keywords/Search Tags: Cloud Computing, Spark, Chinese handwriting, Big Data, HDFS