Research On Customs Commodity Risk Tax Detection Based On Spark Platform

Posted on: 2022-10-27  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhang  Full Text: PDF
GTID: 2518306509994349  Subject: Computer technology
Abstract/Summary:
Taxation is an important duty of the customs. At present, tax is mainly collected as follows: the applicant fills in the declaration information on its own initiative, and customs staff then examine and verify it. As a result, identifying risky tax rates often relies on the business experience of individual staff members, which is inefficient and leads to inconsistent standards. As the scale of imports and exports continues to expand, identifying risky tax rates has become an increasingly arduous task for the customs. Text clustering can process large-scale text data efficiently and uncover the internal relations between texts, so automated identification of risky tax rates based on text clustering is of great significance for safeguarding national tax security. This thesis makes the following three contributions:

(1) Build a customs risk tax identification system for import and export commodities. To handle the format of customs declaration text, we use TF-IDF and word embeddings to extract text features and represent the text. On top of this representation, declarations of the same commodity are grouped together and cross-analyzed with price, place of origin, and other factors, giving a risk tax rate identification system for import and export commodities that analyzes and judges risky tax rates automatically. The system supports uploading text information, text representation, text clustering, risk tax rate analysis, and the display and querying of risk tax rate information.

(2) Design and implement a parallel DBSCAN clustering algorithm. Traditional text clustering algorithms are mostly implemented on a single machine, which creates a performance bottleneck and prevents them from processing large-scale text data efficiently. We implement the DBSCAN text clustering algorithm in parallel on the Spark platform. In a cluster environment, this reduces the algorithm's execution time and speeds up text clustering.

(3) Build a distributed customs risk tax identification system for import and export commodities. By parallelizing the algorithm used in the stand-alone system and building on the parallel DBSCAN algorithm, we construct a distributed risk tax rate identification system for customs import and export commodities on the Spark platform.

Deployment in actual business scenarios shows that the system built in this thesis can effectively identify risky tax rates. The parallel DBSCAN algorithm improves text clustering speed in the cluster environment, and the performance and scalability of the distributed system are greatly improved compared with the stand-alone system.
Keywords/Search Tags: Risk tax, Text clustering, Parallel DBSCAN, Spark platform, Distributed computation
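
To make the pipeline in the abstract more concrete, here is a minimal single-machine sketch in Python: declaration texts are turned into TF-IDF vectors, grouped with DBSCAN under cosine distance, and then each cluster is cross-checked against the declared price. This is not the thesis's implementation. The sample declarations, the `eps` and `min_pts` values, and the rule "flag a price below half the cluster median" are all assumptions made for illustration, and the Spark parallelization described in contribution (2) is not shown.

```python
import math
from collections import Counter
from statistics import median


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that no term gets a zero or negative weight.
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two unit-length sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, with -1 meaning noise."""
    n = len(vectors)
    neighbours = [[j for j in range(n)
                   if cosine_distance(vectors[i], vectors[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


def flag_price_outliers(labels, prices, ratio=0.5):
    """Hypothetical rule: flag declarations priced below ratio * cluster median."""
    flagged = []
    for c in set(labels) - {-1}:
        members = [i for i, label in enumerate(labels) if label == c]
        med = median(prices[i] for i in members)
        flagged += [i for i in members if prices[i] < ratio * med]
    return sorted(flagged)


# Made-up declaration texts and declared unit prices, for illustration only.
docs = [
    "stainless steel kitchen knife".split(),
    "stainless steel kitchen knife set".split(),
    "kitchen knife stainless steel".split(),
    "cotton t-shirt men".split(),
    "men cotton t-shirt".split(),
    "cotton t-shirt men short sleeve".split(),
]
prices = [10.0, 11.0, 3.0, 5.0, 5.5, 5.2]

labels = dbscan(tfidf_vectors(docs), eps=0.5, min_pts=2)
flagged = flag_price_outliers(labels, prices)
print(labels, flagged)
```

The neighbour search here compares every pair of declarations, so its cost grows quadratically with the number of texts. That cost is the kind of single-machine bottleneck that motivates distributing the clustering across a cluster.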