
Design And Implementation Of Frontier Dynamic Big Data Analysis System For Novel High-speed Railway

Posted on: 2023-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q Lv
Full Text: PDF
GTID: 2532306845996109
Subject: Software engineering

Abstract/Summary:
At present, although high-speed railway technology ranks at the forefront of the world, some core technologies, such as basic materials and parts and components, still await breakthroughs. To accelerate scientific and technological innovation in the field of high-speed railway, it is necessary to strengthen research on frontier science and technology and to further explore the research hotspots and difficult problems in the field. Against this background, this thesis designs a big data analysis system that integrates dynamic data collection, multi-dimensional data analysis, and dynamic mining of research hotspots and development trends. The system can provide up-to-date, real-time decision support for the national railway management department, railway operation department, railway core department, intelligent manufacturing department and relevant researchers, and is of great significance for promoting frontier scientific research in the field of high-speed railway. The main work done in implementing the system is as follows:

(1) To address the problem that scientific and technological resources cannot be collected efficiently by hand, a dynamic data collection scheme combining the Scrapy crawler framework with a timing module is proposed. In this scheme, the scrapy-redis configuration is enabled to filter the collected pages, and a Redis cache is used to filter out duplicate data.

(2) To address the problem that Chinese word segmentation tools cannot effectively identify domain-specific terms composed of more than three words, a term recognition algorithm based on improved adjacency entropy is proposed. Compared with the MBN-Gram new word recognition algorithm, this algorithm identifies such terms more quickly.

(3) To address the problem that the traditional LDA topic model relies on word-frequency statistics, so that most domain-specific terms are overwhelmed by high-frequency words and topic classification accuracy is low, a topic modeling method based on LDA, called Key-LDA, is proposed. In this method, keywords that are strongly related to the semantics of the text are extracted with the KeyBERT algorithm and used as the text corpus of the LDA topic model, which effectively improves the accuracy and efficiency of classification.

(4) To address the problem that topic models describe topics in ways that are hard to interpret or uninformative, a fine-grained text clustering framework based on the Key-LDA topic model is proposed for mining research hotspots in the field of high-speed railway. The framework first applies text preprocessing to convert text into a vector representation. It then constructs topic tags based on the Key-LDA topic model and uses topic strength indicators to quantify hot research topics. Finally, the K-means algorithm is used to cluster the literature under each topic to achieve a fine-grained description of the topic.

(5) For the application scenario of real-time updates to a large-screen data visualization, the WebSocket protocol is used to establish full-duplex communication between the client and the server and to push data in real time. The server monitors the database table with multiple threads and pushes the latest data to the client promptly whenever the table changes.

The system realizes dynamic monitoring of cutting-edge science and technology in the field of new high-speed rail, and has strong reference value for cutting-edge scientific research on high-speed rail in China.
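As an illustration of the adjacency-entropy idea mentioned in item (2), the following is a minimal Python sketch of the standard left/right adjacency (branch) entropy for candidate multi-word terms. It is not the improved algorithm proposed in the thesis, whose details are not given in this abstract. The function name `branch_entropy`, the `max_len` parameter and the toy token list are assumptions made for this example only.

```python
import math
from collections import Counter, defaultdict


def branch_entropy(tokens, max_len=4):
    """Map each n-gram (2..max_len tokens) to (left_entropy, right_entropy).

    Standard adjacency entropy, shown for illustration only; the thesis's
    improved variant is not reproduced here.
    """
    left = defaultdict(Counter)   # n-gram -> counts of the token just before it
    right = defaultdict(Counter)  # n-gram -> counts of the token just after it
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if i > 0:
                left[gram][tokens[i - 1]] += 1
            if i + n < len(tokens):
                right[gram][tokens[i + n]] += 1

    def entropy(counter):
        # Shannon entropy (natural log) of the neighbour distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum(c / total * math.log(total / c) for c in counter.values())

    grams = set(left) | set(right)
    return {g: (entropy(left[g]), entropy(right[g])) for g in grams}


# Hypothetical toy input: an already segmented token sequence.
tokens = "x high speed railway y z high speed railway w".split()
scores = branch_entropy(tokens)
```

In this toy input, the fragment `("high", "speed")` is always followed by `railway`, so its right entropy is zero, while `("high", "speed", "railway")` has varied neighbours on both sides. A candidate is usually kept as a term only when both entropies are high, which is what lets the measure prefer the complete three-word term over its fragment.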
Keywords/Search Tags: Frontier Scientific Research, Web Crawler, Text Mining, LDA Topic Model