Design And Implementation Of VAT Invoice Application Analysis System Based On Big Data Platform

Posted on: 2020-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: W He
Full Text: PDF
GTID: 2428330623458360
Subject: Engineering
Abstract/Summary:
Value-added tax (VAT) is one of the country's most important tax categories, and analysis of VAT invoices can identify abnormal enterprise behavior during invoice issuance and tax declaration. Follow-up correlation analysis of VAT invoices has therefore become one of the most important tasks in tax work. However, when faced with massive invoice data, the traditional data warehouse model suffers from long data extraction times, difficult calculation, and long tax risk reporting cycles, so it cannot predict tax risks in a timely and accurate way. To address these problems, this paper takes the follow-up analysis of VAT invoices as its research object. Building on the Hadoop big data architecture and its capacity for processing massive unstructured data, it studies the storage, cleaning, modeling and calculation of massive data. The main contents and results are as follows:

The data mining algorithms available on a Hadoop cluster are studied. The applicable scenarios and usage of classification, clustering, association rule mining, parameter estimation, graph classification and user behavior profiling are examined in depth and related to tax analysis. The system's analysis indicators, early warning indicators and user behaviors are each matched to algorithms, and multi-dimensional data mining algorithms and calculation approaches for the big data environment are established. By exploiting high concurrency over large data sets and in-memory computing, the problems of long data extraction time and difficult calculation are effectively solved.

The task scheduling, memory allocation and resource management of Hadoop on YARN are studied, including resource allocation, data sharing, cluster collaboration and task monitoring. Hotspot and data skew problems are analyzed and avoided. Key ecosystem components are studied: Sqoop, the tool for migrating relational data into Hadoop; Flume, the data stream transmission system; Hive, the structured data warehouse; and Spark, the high-concurrency in-memory computing engine. These components are integrated with the requirements and functions of the tax-related systems, forming a big data mining method at the data level. This greatly alleviates the problems of slow data processing and inaccurate data, and greatly shortens the time needed to produce a tax risk report, which becomes available 14 days earlier than under the traditional data warehouse model.

The data structure characteristics, user behavior characteristics and corresponding risk factors of each tax-related business system are studied. The correlations, trends and differences among the data are sorted out, and a data dependency relationship with the number of people registered in the narrative is established. This relationship yields an upstream and downstream mining approach with the enterprise flow as the horizontal dimension and the invoice flow as the vertical dimension, which solves the problems of weak data association and low data utilization between heterogeneous systems.
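
The abstract states that hotspot and data skew problems are avoided but does not say how. One widely used technique for skewed aggregations is key salting: a hot key is split across several sub-keys, partial results are computed per sub-key, and the partials are merged in a second stage. The sketch below illustrates the idea in plain Python rather than on a cluster. It is not taken from the thesis, and the function name `salted_sum`, the parameter `num_salts` and the sample taxpayer keys are all hypothetical.

```python
import random
from collections import defaultdict


def salted_sum(records, num_salts=4, seed=0):
    """Sum amounts per key in two stages (illustrative, not the thesis's method).

    Stage 1 groups by (key, salt) so a hot key is spread over up to
    num_salts groups. Stage 2 drops the salt and merges the partial sums.
    """
    rng = random.Random(seed)

    # Stage 1: partial sums per salted key. On a cluster, each
    # (key, salt) group could be handled by a different worker.
    partial = defaultdict(int)
    for key, amount in records:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += amount

    # Stage 2: remove the salt and combine the partial sums per key.
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal

    return dict(totals), dict(partial)


if __name__ == "__main__":
    # Hypothetical invoice amounts: one taxpayer dominates the data set.
    invoices = [("taxpayer_hot", 1)] * 1000 + [("taxpayer_cold", 5)]
    totals, partial = salted_sum(invoices)
    print(totals)
    print(max(partial.values()), "rows' worth in the largest stage-1 group")
```

The final totals are the same as a direct group-by, but no single stage-1 group has to absorb all rows of the hot key, which is the point of the technique.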
Keywords/Search Tags: Massive Data, Algorithm, VAT, Big Data Mining, Concurrent Calculation