Explore The Construction Of A Natural Language Programming Framework In Auditing

Posted on: 2022-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhu
GTID: 2518306515485634
Subject: Computer technology
Abstract/Summary:
With the concept of "full audit coverage" proposed in the era of big data, both the volume and the variety of data that audit work must process are increasing. Auditors therefore need a range of technical means to collect and analyze different types of data, and programming languages are among the most difficult of these to master. Natural language programming is a research direction within natural language processing. It uses natural language processing techniques to extract the programming ideas that may be present in natural language sequences, thereby reducing the difficulty of programming. Its ultimate goal is to generate complete computer programs that can be directly compiled or interpreted. Integrating natural language programming into audit work can lower the programming barrier for auditors and improve their efficiency. In reviewing the literature on natural language programming, I did not find any papers that transform natural language sequences into a Python programming framework. This thesis therefore examines how to convert a Chinese natural language sequence entered by an auditor into a Python programming framework, and demonstrates the conversion process through an "audit scenario."
The conversion proceeds as follows. First, the dependency relations among the phrases of the natural language sequence are obtained through sentence segmentation, word segmentation, part-of-speech tagging, and dependency parsing. Verbs that take an object are selected using these dependency relations: having an object is the criterion for treating a verb as a program step, and the verb together with its object represents that step. Next, the dependencies between phrases are used to transform the natural language sequence into a tree structure, which is then pruned to a tree containing only verbs. The relationships between verbs in this "verb tree" determine the final program steps and the order in which they execute. Finally, the program steps are converted into a Python programming framework: leaf nodes of the "verb tree" become program comments, and non-leaf nodes become functions (methods).
Natural language programming depends on a programming idea being present in the natural language sequence. If a sequence that contains no programming idea is converted, the resulting program framework will be wrong. Determining whether a natural language sequence contains a programming idea is a classification problem. To make the conversion more reliable, this thesis builds a data set from the "audit data analysis scheme" and manually labels it with two classes: sequences that contain a programming idea and sequences that do not. Verb phrases and noun phrases are the core of a natural language sequence, and natural language programming relies on nouns and verbs, so the nouns and verbs in the data set are used to generate the input vectors. A fully connected deep neural network trained with the backpropagation algorithm was built for the classification task. In the experiments, the classification accuracy was 63% when ReLU and Sigmoid were used as activation functions and 65% when Tanh was used. To improve accuracy, an autoencoder was used to pre-train the neural network, which raised the classification accuracy of the model to 79%.
Keywords: Natural Language Processing, Natural Language Programming, The Audit Plan, Short Text Classification
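
The last conversion step described in the abstract (leaf nodes of the "verb tree" become comments, non-leaf nodes become functions) can be illustrated with a minimal sketch. This is not the thesis's implementation: the VerbNode class, the render_framework function, the naming scheme verb_object, and the assumption that children are already stored in execution order are all hypothetical choices made for illustration. The sketch also starts from an already pruned verb tree and does not perform segmentation or dependency parsing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VerbNode:
    """Hypothetical node of a pruned 'verb tree': a verb plus its object."""
    verb: str
    obj: str
    # Assumption: children are already listed in execution order.
    children: List["VerbNode"] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Assumed naming scheme: verb and object joined by underscores.
        return "_".join(f"{self.verb} {self.obj}".split())


def _collect(node: VerbNode, lines: List[str]) -> None:
    """Append the skeleton for one non-leaf node, then recurse into its children."""
    if not node.children:
        return  # a leaf only ever appears as a comment inside its parent
    lines.append(f"def {node.label}():")
    for child in node.children:
        if child.children:
            # Non-leaf child: call the function generated for it.
            lines.append(f"    {child.label}()")
        else:
            # Leaf child: emit a comment describing the step.
            lines.append(f"    # {child.verb} {child.obj}")
    lines.append("    pass")  # keeps the body valid when it holds only comments
    lines.append("")
    for child in node.children:
        _collect(child, lines)


def render_framework(root: VerbNode) -> str:
    """Return Python source for a program skeleton built from the verb tree."""
    lines: List[str] = []
    _collect(root, lines)
    return "\n".join(lines)


# Made-up audit scenario, used only to exercise the sketch.
tree = VerbNode("analyze", "invoices", [
    VerbNode("load", "ledger"),
    VerbNode("check", "amounts", [VerbNode("flag", "duplicates")]),
])
skeleton = render_framework(tree)
print(skeleton)
```

In this sketch each non-leaf verb yields one function whose body lists its child steps, so the printed skeleton is itself valid Python that an auditor could fill in step by step.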