The Cache Optimization Strategy Based On Spark And The Application Of CNN In The Recognition Of Train Fault Image

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Chen
GTID: 2392330602493693
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of big data technology, research interest in it has surged worldwide. The TFDS is used to detect the status of freight train components. The massive data generated by TFDS is difficult to process with traditional technologies, so machine learning techniques are needed to extract valuable information from it. Iterative computing tasks consume a large amount of memory. When the memory required by such tasks is insufficient, the big data platform must replace intermediate results already held in memory. Spark is among the most widely used big data platforms today. By default, Spark uses the Least Recently Used (LRU) cache replacement policy. When this policy is applied to the train fault image recognition task, memory utilization is low and the task runs inefficiently. Against this background, this thesis studies the cache replacement strategy of the big data platform and proposes a cache replacement strategy for Spark that accounts for RDD computation cost. Optimizing the platform's cache strategy shortens the training time of the train fault recognition model. The main work of this thesis is as follows.

First, this thesis proposes an optimized cache weight replacement (CWS) algorithm based on the weight value of each Resilient Distributed Dataset (RDD). Train image data is huge, and Spark's default LRU cache replacement strategy frequently evicts RDD partitions. The CWS algorithm optimizes the selection phase and fully considers the number of historical accesses and the computation cost during the replacement phase. The algorithm was tested on a public data set provided by Stanford University. The experimental results show that, compared with other algorithms, the CWS algorithm consumes less memory when processing smaller data under sufficient memory conditions and takes less time to process data under limited memory conditions.

Second, this thesis uses the Convolutional Neural Network (CNN) method to implement train fault image recognition. A multi-class freight train (MFT) fault image recognition model is designed with the TensorFlow machine learning library. In addition, TensorFlowOnSpark is used to optimize TensorFlow's resource management and improve its task scheduling strategy. The thesis takes freight train fault image recognition at the Zhuzhou depot as an example for experimental testing. The experimental results show that the CWS algorithm can shorten the training time of the MFT model and improve the resource management and task scheduling performance of the Spark platform. The MFT model can effectively identify freight train faults and facilitate equipment maintenance.
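The difference between LRU eviction and a weight-based policy can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the abstract does not give the CWS weight formula, so the `weight` function below (access count times recomputation cost, divided by partition size) is a hypothetical stand-in, and the `Partition` and `Cache` classes are illustrative names, not Spark APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Partition:
    """Hypothetical record for one cached RDD partition."""
    name: str
    size: int             # memory units occupied
    compute_cost: float   # assumed cost of recomputing the partition
    hits: int = 0         # historical access count
    last_used: int = 0    # logical clock value of the last access


def weight(p: Partition) -> float:
    # ASSUMPTION: illustrative formula only, not the thesis's definition.
    return p.hits * p.compute_cost / p.size


class Cache:
    def __init__(self, capacity: int, policy: str):
        self.capacity = capacity
        self.policy = policy          # "lru" or "weight"
        self.items: Dict[str, Partition] = {}
        self.clock = 0
        self.evicted: List[str] = []  # eviction order, for inspection

    def used(self) -> int:
        return sum(p.size for p in self.items.values())

    def access(self, name: str) -> bool:
        """Record an access; return True on a cache hit."""
        self.clock += 1
        p = self.items.get(name)
        if p is None:
            return False
        p.hits += 1
        p.last_used = self.clock
        return True

    def _pick_victim(self) -> Partition:
        if self.policy == "lru":
            # LRU looks only at recency.
            return min(self.items.values(), key=lambda q: q.last_used)
        # Weight-based: evict the partition that is cheapest to lose.
        return min(self.items.values(), key=weight)

    def put(self, p: Partition) -> None:
        if p.size > self.capacity:
            return  # too large to cache at all
        self.clock += 1
        p.last_used = self.clock
        while self.items and self.used() + p.size > self.capacity:
            victim = self._pick_victim()
            self.evicted.append(victim.name)
            del self.items[victim.name]
        self.items[p.name] = p


def demo(policy: str) -> List[str]:
    cache = Cache(capacity=10, policy=policy)
    cache.put(Partition("expensive", size=4, compute_cost=50.0))
    cache.put(Partition("cheap", size=4, compute_cost=1.0))
    cache.access("expensive")
    cache.access("cheap")
    cache.access("cheap")  # "cheap" is now the most recently used
    cache.put(Partition("new", size=4, compute_cost=5.0))
    return cache.evicted


print("LRU evicts:", demo("lru"))
print("Weight-based evicts:", demo("weight"))
```

In this toy scenario LRU discards the partition that is costly to recompute simply because it was touched less recently, while the weight-based policy keeps it and discards the cheap one. A real policy inside Spark would need to obtain access counts and cost estimates from the scheduler, which this sketch does not attempt.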
Keywords/Search Tags:Big data, Spark, Fault identification, Image Identification, CNN