| With the development of the Industrial Internet, efficient use of equipment log data is of great significance to the digital transformation of a household appliance manufacturing enterprise. At present, there are many big data logging systems both domestically and internationally. They are rich in functions but not universal: the vast majority target data from internet applications, and because log characteristics and upload patterns differ, they cannot be fully adapted to a home appliance manufacturing enterprise. Investigation found that the equipment logs of a certain home appliance manufacturing enterprise come from the company’s home appliance products, including but not limited to smart televisions, smart refrigerators, smart air conditioners, and smart washing machines. The log content differs, but the format is relatively unified, and the logs are managed by the company’s IoT platform. Each department collects target equipment logs according to business needs and performs subsequent storage, calculation, and display. In processing equipment logs, the enterprise currently faces the following problems and difficulties: there is a lack of flexible and targeted data collection methods; there is no data storage solution that balances stable storage and real-time queries; under the computing principle of the current computing framework, Spark, too many logs sharing the same field value lead to low computational efficiency (for example, data skew is difficult to avoid when calculating the logs of a hot-selling device model); and there is no data display function, so the system currently relies entirely on developers working in the backend, who view and interpret data trends and device conditions directly in the database, which is inefficient and not intuitive. To solve these problems and difficulties, this article does the following work: (1) Kafka and Flume are used to achieve
targeted collection of log data. A data filter that conforms to the characteristics of the enterprise’s logs is designed and implemented; it filters out non-target device logs and error logs through field matching. (2) A diversified storage solution that combines stable storage with fast queries is proposed. The distributed file system HDFS, the data warehouse tool Hive, and the relational database MySQL are used to provide stable storage of historical data and fast queries of commonly used data; the key database tables are designed and connected to the data display module, PowerBI, to display the data visually. (3) An RDD random suffix addition scheme is designed and validated. To address data skew in big data log processing, the scheme is designed based on the data partitioning principle of Spark RDDs. Testing shows that it effectively alleviates data skew when there are a large number of duplicate keys. (4) A device log collection and calculation system is designed and implemented. According to the system requirements, the architecture design and module division were completed. The system is divided into five modules, and the implementation of each module is described in detail, covering the entire process of collecting, storing, calculating, and displaying enterprise device logs. The system was tested in an online testing environment, and the results show that the "big data based device log collection and calculation system" can meet the daily data processing needs of enterprises and achieves the expected design goals. The system has been launched and put into use in well-known domestic home appliance manufacturing enterprises. It effectively collects, stores, and calculates enterprise equipment logs in a distributed environment, meeting the daily data development and calculation needs of data developers. |
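
The random suffix idea in item (3) can be illustrated with a small sketch. This is not the system's Spark implementation. It is a plain-Python simulation, under stated assumptions, of the general two-stage technique: append a random suffix to each key so a hot key spreads over several partitions, aggregate partially, then strip the suffix and merge. The function names, the CRC32-based partitioner, the sample model names, and the parameters `num_salts` and `num_partitions` are all hypothetical and chosen only for illustration.

```python
import random
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Stand-in for a hash partitioner (assumption: CRC32 modulo partition count)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def add_random_suffix(key: str, num_salts: int, rng: random.Random) -> str:
    """Append a random suffix so records of one hot key become several distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def strip_suffix(salted_key: str) -> str:
    """Remove the suffix added by add_random_suffix."""
    return salted_key.rsplit("_", 1)[0]


def salted_count(keys, num_salts: int, seed: int = 0):
    """Two-stage count: partial aggregation on salted keys, then merge by original key."""
    rng = random.Random(seed)
    # Stage 1: count records per salted key.
    partial = Counter(add_random_suffix(k, num_salts, rng) for k in keys)
    # Stage 2: strip the suffix and sum the partial counts.
    total = Counter()
    for salted_key, count in partial.items():
        total[strip_suffix(salted_key)] += count
    return partial, total


def partition_load(counts_by_key, num_partitions: int):
    """Number of records each partition would receive for the given keyed counts."""
    load = [0] * num_partitions
    for key, count in counts_by_key.items():
        load[partition_of(key, num_partitions)] += count
    return load


# Hypothetical skewed input: one hot-selling model dominates the logs.
logs = ["model_A"] * 9000 + ["model_B"] * 500 + ["model_C"] * 500
partial, total = salted_count(logs, num_salts=16)

plain_load = partition_load(Counter(logs), num_partitions=8)
salted_load = partition_load(partial, num_partitions=8)
print("max partition load without suffix:", max(plain_load))
print("max partition load with suffix:   ", max(salted_load))
```

The final counts are the same as a direct count, but the first stage no longer sends every record of the hot key to a single partition. The cost is a second aggregation pass, so the approach is worthwhile mainly when a few keys dominate the data.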