
Design And Development Of A Text Classification System For Electric Power Audit Issues Based On Semantic Matching

Posted on: 2024-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Feng
GTID: 2542307106989999
Subject: Computer technology
Abstract/Summary:
Auditing is an independent evaluation of a company’s financial condition and operational activities. It identifies problems and risks and produces improvement suggestions, which helps companies manage risk and improve business efficiency and effectiveness, and thereby strengthens their competitiveness and sustainability. The internal audit targets of power grid enterprises include financial statements, internal control, business operations, and risk management, among others. Auditors examine these areas manually to identify and record problems and anomalies. They then summarize and classify the problems by nature, scope, cause, and remedy, identify the common causes behind each class of problem, and provide targeted recommendations and improvement measures, which improves the quality and efficiency of audit work.

The traditional method of classifying power audit problems relies mainly on the personal experience and ability of auditors. Because judgments differ between individuals, classification is often irregular and results are inconsistent, which reduces the efficiency and accuracy of audit work. Solving this requires a standardized, unified tag library of audit problem classes that can serve as a standard reference. With such a tag library, text classification technology can be used to label and classify the problems found in power audits consistently.

Both traditional machine learning text classification algorithms and current deep learning or pre-trained language model text classification algorithms require a certain number of training samples. However, power audit problem texts are specialized and complex, so they must be annotated by professional auditors, and labeling a large amount of data is costly. Moreover, because audit problem texts involve sensitive company information, many companies and organizations are unwilling to share this data, which makes it difficult to obtain a sufficiently large and comprehensive set of audit problem texts.

To address the shortage of training samples, this thesis designs a power audit problem text classification model based on semantic matching and on the strong semantic representation ability of pre-trained language models, and builds a classification system on top of that model. On the one hand, the classification system helps auditors standardize and organize historical power audit problem texts and build a comprehensive, rich dataset of such texts, laying the foundation for training high-precision classification models. On the other hand, it standardizes the classification of newly added power audit problem texts, reducing the subjectivity and inconsistency of power audit problem classification.

First, to cope with the shortage of training samples and to improve the accuracy of semantic matching, this thesis designs a weighted cross-matching model and selects the ROM Chinese semantic relevance model. The two models are used to organize historical power audit problem texts and to classify newly added ones. The weighted cross-matching model cross-matches the hierarchical levels of the subjective qualitative tags that auditors assigned to historical data against the hierarchical levels of the standard classification tags in the library, and assigns higher weights to matches at deeper levels, which reduces the impact of semantic inconsistency between the two kinds of tags. The ROM model is used to semantically match short classification tags against long audit problem texts; its training on the Baidu search collection and its word-weight-aware masking strategy are used to reduce the impact of length and semantic differences on semantic matching.

Second, based on these models, this thesis uses the Vue and Spring Boot frameworks to design and implement a text classification system for power audit problems. Auditors can upload and manage power audit problem texts in the system and query the standard classification labels that correspond to a problem text. When confirming a classification label, auditors can view the power audit problem texts already filed under that standard label to assist their judgment. When the system classifies a text incorrectly, auditors can submit feedback on the problem text, and an administrator then confirms the classification label proposed in the feedback. The system also defines a set of statistical indicators and visualizes them, so that users can better understand the size and category distribution of the power audit problem text dataset and the system’s classification performance.

Finally, the classification models designed in this thesis were tested on the audit problem summary table of a power company under the State Grid. The experiments verified that both models are effective and achieve good accuracy. Functional testing of the complete power audit problem text classification system showed that it meets the requirements for practical use.
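The weighted cross-matching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives neither the similarity model nor the weighting scheme, so `similarity` below is a plain string-similarity stand-in (Python's `difflib`) for a real semantic matching model, the weight `i + j` is a hypothetical choice that simply makes deeper levels count more, and the function names and example tags are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in for a semantic matching model: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_match_score(auditor_tag, standard_tag):
    """Weighted cross-match of two hierarchical tags.

    Each tag is a list of level names ordered from the top level to the
    deepest level. Every auditor level is compared with every standard
    level, and deeper level pairs receive a larger (hypothetical) weight.
    """
    total, weight_sum = 0.0, 0.0
    for i, a in enumerate(auditor_tag, start=1):
        for j, s in enumerate(standard_tag, start=1):
            w = i + j  # assumption: weight grows with the depth of both levels
            total += w * similarity(a, s)
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


def best_standard_tag(auditor_tag, library):
    """Return the standard tag in the library with the highest cross-match score."""
    return max(library, key=lambda tag: cross_match_score(auditor_tag, tag))


# Illustrative data only; these tags do not come from the thesis.
library = [
    ["finance", "expense reimbursement"],
    ["zzz", "qqq"],
]
print(best_standard_tag(["finance", "expense reimbursement"], library))
```

Under these assumptions, an identical match at the deepest level contributes more to the score than the same match at the top level, which mirrors the abstract's statement that deep-level matching results receive higher weights. A real system would replace `similarity` with the output of a semantic matching model.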
Keywords/Search Tags:power audit, audit problem text, text classification, semantic matching