The Cache Optimization Strategy Based On Spark And The Application Of CNN In The Recognition Of Train Fault Image

Posted on:2021-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:T Y Chen

GTID:2392330602493693

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of big data technology, research on big data has surged worldwide. The TFDS is used to detect the status of freight train components. The massive data it generates is difficult to process with traditional technologies, so machine learning techniques are needed to extract valuable information. Iterative computing tasks consume a large amount of memory. When the memory they require is insufficient, the big data platform must replace intermediate results already held in memory. Spark is one of the most widely used big data platforms today. By default, Spark uses the Least Recently Used (LRU) replacement policy. For train fault image recognition tasks this leads to low memory utilization and, in turn, low task efficiency. Against this background, this thesis studies the cache replacement strategy of big data platforms and proposes a cache replacement strategy for the Spark platform based on RDD computation cost. Optimizing the platform's caching strategy shortens the training time of the train fault recognition model. The main work of this thesis is as follows.

First, this thesis proposes an optimized cache weight replacement (CWS) algorithm based on Resilient Distributed Dataset (RDD) weight values. Because train image data is huge, Spark's default LRU cache replacement strategy frequently evicts RDD partitions. The CWS algorithm optimizes the selection phase. In the replacement phase, it fully considers the number of historical accesses and the computation cost. The algorithm was tested on a public dataset provided by Stanford University. The experimental results show two advantages over the other algorithms. When memory is sufficient, the CWS algorithm uses less memory to process smaller data. When memory is limited, it takes less time to process data.

Second, this thesis uses a Convolutional Neural Network (CNN) to implement train fault image recognition. It proposes a multi-class freight train (MFT) fault image recognition model, designed with the TensorFlow machine learning library. In addition, TensorFlowOnSpark is used to optimize TensorFlow's resource management and improve its task scheduling strategy. The thesis uses freight train fault image recognition at the Zhuzhou depot as a case study for experimental testing. The experimental results show that the CWS algorithm shortens the training time of the MFT model and improves the resource management and task scheduling performance of the Spark platform. The MFT model can effectively identify freight train faults and facilitate equipment maintenance.
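The abstract does not give the CWS weight formula, so the following is only a rough illustration. The sketch contrasts LRU eviction with a weight-based eviction policy that, like CWS, considers historical access counts and recomputation cost. Several parts are hypothetical assumptions and are not the thesis's algorithm: the class names, the `Partition` fields and the weight formula `(hits + 1) * compute_cost / size`. It is also a standalone simulation, not Spark's internal block-manager code.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Partition:
    """A cached RDD partition (hypothetical fields for illustration)."""
    pid: str
    size: int             # memory footprint, e.g. in MB
    compute_cost: float   # estimated cost to recompute it from lineage
    hits: int = 0         # historical access count


class LRUCache:
    """Least Recently Used: evict the partition untouched for longest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # least recently used first

    def get(self, pid):
        p = self.entries.get(pid)
        if p is not None:
            self.entries.move_to_end(pid)  # mark as most recently used
        return p

    def put(self, p):
        if p.size > self.capacity:
            return []  # cannot be cached at all
        if p.pid in self.entries:
            self.used -= self.entries.pop(p.pid).size
        evicted = []
        while self.used + p.size > self.capacity:
            _, victim = self.entries.popitem(last=False)  # oldest entry
            self.used -= victim.size
            evicted.append(victim.pid)
        self.entries[p.pid] = p
        self.used += p.size
        return evicted


class WeightedCache:
    """Weight-based eviction: drop the partition with the lowest weight.

    Hypothetical weight: (hits + 1) * compute_cost / size, so partitions
    that are reused often and expensive to recompute stay in memory.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = {}

    @staticmethod
    def weight(p):
        return (p.hits + 1) * p.compute_cost / p.size

    def get(self, pid):
        p = self.entries.get(pid)
        if p is not None:
            p.hits += 1  # record the historical access
        return p

    def put(self, p):
        if p.size > self.capacity:
            return []  # cannot be cached at all
        if p.pid in self.entries:
            self.used -= self.entries.pop(p.pid).size
        evicted = []
        while self.used + p.size > self.capacity:
            victim = min(self.entries.values(), key=self.weight)
            del self.entries[victim.pid]
            self.used -= victim.size
            evicted.append(victim.pid)
        self.entries[p.pid] = p
        self.used += p.size
        return evicted


def run(cache):
    """Replay the same access pattern against a cache, return evictions."""
    cache.put(Partition("img_raw", size=40, compute_cost=1.0))
    cache.put(Partition("features", size=40, compute_cost=20.0))
    cache.get("features")
    cache.get("img_raw")
    return cache.put(Partition("batch", size=40, compute_cost=5.0))


if __name__ == "__main__":
    print("LRU evicts:", run(LRUCache(100)))            # ['features']
    print("Weighted evicts:", run(WeightedCache(100)))  # ['img_raw']
```

The two policies diverge on the same access pattern. LRU evicts the expensive `features` partition only because it was touched slightly earlier. The weighted policy keeps `features` and drops `img_raw`, which is cheap to recompute. This is the kind of behavior the abstract attributes to considering access history and computation cost.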

Keywords/Search Tags:

Big data, Spark, Fault identification, Image Identification, CNN

Related items

1	TSP Image Identification And Numerical Simulation Of Hidden Faults
2	Research On The Identification Of Lightning Faults Based On Measured Data
3	Identification Of Typical Traffic State At Signalized Intersection Approach Based On Vehicle Identification Data
4	Research On Power Equipment Identification And Fault Diagnosis Based On Machine Vision
5	Research On Power Network Topology Identification Method Based On Big Data
6	Research On Fine-grained Vehicle Identification And Re-identification Methods Based On UAV Remote Sensing Images
7	Research On Fault Diagnosis Model Based On Spark And Data Sharing Among Cloud Platforms
8	Research On The Multivariate System Closed-loop Identification Based On Data-driven
9	Short-circuit Fault Identification And Distance Measurement Of AT Traction Network Based On Deep Learning
10	Study On Condition Monitoring Data Quality Improvement And Fault Identification For Power Transformer