Research On Several Technologies Of Auditing Big Data Platform

Posted on: 2021-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: D P Lu
Full Text: PDF
GTID: 2518306464983009
Subject: Electronics and Communications Engineering
Abstract/Summary:
As big data technology matures, governments and enterprises broadly agree that it can raise the level of modern management. Against the background of building an audit big data platform, this thesis addresses three problems in audit work: large file upload, structured data storage and analysis, and similarity calculation for large texts. The main work is as follows.

For the collection of electronic audit files, an HTTP large-file upload solution that supports transfers of up to terabytes is designed. It is based on browser transmission and file fragmentation, supports breakpoint resume and integrity checking of large files, and in particular allows multiple users to upload files concurrently.

For the storage and analysis of large audit data sets, a variety of distributed storage and computing solutions are compared. HAWQ is found to outperform distributed SQL computing engines such as Hive and Impala on multiple performance indicators, and is therefore proposed as the storage and computing solution for relational audit data. HAWQ supports storage and analysis of large data sets in both internal and external tables. Its standardized SQL and rich set of functions ease the migration of audit models and reduce the cost for audit staff of learning big data technology. Its comprehensive rights management helps maintain data security. HAWQ also supports the use of machine learning to explore audit data.

For the calculation of complex text similarity, a multi-dimension fast check and comparison technology is designed. Word segmentation and part-of-speech tagging are performed by the StandardTokenizer, and keywords are extracted by the TextRank algorithm. The technology consists of search word extraction, search logic design, and multi-dimensional index calculation. It effectively addresses the difficulty of judging similarity between two long texts and between a short text and a long text, and provides new ideas for text similarity calculation.

Finally, three experiments are designed to verify the feasibility of the technical solutions. The experiments show that the designed file transfer technology can upload large files and ensure file consistency; that HAWQ provides fast and stable structured data analysis and can complete calculations over large audit data sets; and that the designed multi-dimension check and comparison technology achieves high accuracy when checking and comparing complex text. These technologies have reference value for the construction of audit big data platforms.
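
The file fragmentation, breakpoint resume, and integrity check described for the upload solution can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation: the names `split_chunks` and `UploadSession` are hypothetical, SHA-256 is an assumed choice of digest (the abstract does not name one), and the HTTP transport is left out entirely.

```python
import hashlib


def split_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size fragments; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Hypothetical server-side state for one file being uploaded."""

    def __init__(self, total_chunks: int, expected_sha256: str) -> None:
        self.total_chunks = total_chunks
        self.expected_sha256 = expected_sha256  # digest sent by the client
        self.received: dict[int, bytes] = {}

    def put(self, index: int, chunk: bytes) -> None:
        """Store one fragment; re-sending the same index simply overwrites it."""
        if not 0 <= index < self.total_chunks:
            raise IndexError("chunk index out of range")
        self.received[index] = chunk

    def missing(self) -> list[int]:
        """Indices still to be sent; a resuming client asks for this list."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        """Join the fragments in order and verify the whole-file digest."""
        if self.missing():
            raise ValueError("upload incomplete")
        data = b"".join(self.received[i] for i in range(self.total_chunks))
        if hashlib.sha256(data).hexdigest() != self.expected_sha256:
            raise ValueError("integrity check failed")
        return data


# Simulate an interrupted upload that is later resumed.
payload = b"audit" * 1000
chunks = split_chunks(payload, 1024)
session = UploadSession(len(chunks), hashlib.sha256(payload).hexdigest())
session.put(0, chunks[0])            # the connection drops after the first chunk
for index in session.missing():      # on resume, send only what is still missing
    session.put(index, chunks[index])
restored = session.assemble()        # raises ValueError if the digest differs
```

A terabyte-scale system would presumably write fragments to disk and hash them incrementally rather than hold them in memory; the sketch only shows the bookkeeping that makes resume and verification possible.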
Keywords/Search Tags:Audit, Big data platform, HAWQ, Text similarity