In recent years, the government has continually introduced new policies to support small and medium-sized enterprises. These policies have injected new vitality into many industries and promoted the development of the national economy. Paying taxes is the responsibility and obligation of natural persons, legal persons and individual entrepreneurs, and taxpayers should pay taxes lawfully and in accordance with tax law. However, as society develops, local governments may introduce new policies, which makes the types and details of tax payment numerous and complicated, and taxpayers cannot always be informed in time. Providing taxpayers with effective and convenient consulting services has therefore become an urgent practical problem for tax administrations at all levels. Traditional telephone or on-site consultation requires a great deal of manual work: it consumes human and financial resources, increases the cost of communication between the two parties, and reduces service efficiency. Some existing tax systems allow users to leave messages, but taxpayers do not receive timely responses to their questions.

With the development of natural language processing technologies, intelligent question answering systems have been adopted in many industries with good results. Current FAQ-based single-round question answering systems usually use a question matching model: the model first searches the corpus to obtain a candidate result set, then ranks the candidates and returns the answers to the user. This traditional retrieval approach has two disadvantages. On the one hand, when the knowledge base contains a large number and variety of question pairs, both the computational speed and the matching accuracy of existing models suffer. On the other hand, current retrieval models do not capture the semantic information in sentences well, so they may fail to match the best candidate answer.

This thesis proposes a two-stage model to address these two problems.

(1) To address the performance degradation of retrieval models after the knowledge base is expanded, this thesis adds a first-stage classification task before answer retrieval. A hierarchical attention model splits the tax corpus into business scenarios, which reduces the sample space of the retrieval stage. A discriminator is added to the original model to improve its classification accuracy.

(2) To address the problem that the semantic relationship between sentences cannot be fully recognized during retrieval, this thesis fine-tunes the ERNIE 3.0 pre-trained model on the tax dataset for the second-stage text matching task and proposes a scenario-level multitask training framework. Different encoder layers of the model are trained for different scenarios, which greatly saves storage space and improves the inference speed of the deployed model. In the sentence-vector matching stage, the WRD (Word Rotator's Distance) algorithm replaces cosine similarity, yielding a significant improvement. Finally, two-stage knowledge distillation is applied to the fine-tuned model, which reduces the number of parameters with little loss of accuracy and makes the model easier to deploy.

(3) Each stage of the constructed two-stage model was validated experimentally on tax datasets and public datasets. Compared with the traditional FAQ-based single-round question answering model, the two-stage model designed in this thesis improves accuracy while reducing inference time, which demonstrates its effectiveness. Finally, this thesis analyzes the system architecture and business process and implements a tax question answering system.
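The classify-then-retrieve idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy knowledge base `KB`, the scenario names, and the bag-of-words cosine scoring are hypothetical stand-ins. Stage 1 uses a nearest-centroid rule in place of the hierarchical attention classifier, and stage 2 uses cosine similarity in place of the fine-tuned ERNIE 3.0 matcher. The point is only that stage 2 ranks the FAQ pairs of one scenario instead of the whole knowledge base.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: scenario -> list of (question, answer) pairs.
# Scenario names and entries are illustrative only, not the thesis data.
KB = {
    "invoice": [
        ("how do I issue an electronic invoice", "Use the e-invoice service."),
        ("how do I void an invoice", "Submit a void request."),
    ],
    "personal_income_tax": [
        ("how do I file annual income tax", "File via the annual return."),
        ("what deductions can I claim", "See the special deduction list."),
    ],
}

def bow(text):
    """Bag-of-words term counts (a stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_centroids(kb):
    """One summed term vector per scenario, so stage 1 costs one comparison per scenario."""
    centroids = {}
    for scenario, pairs in kb.items():
        c = Counter()
        for question, _ in pairs:
            c.update(bow(question))
        centroids[scenario] = c
    return centroids

def answer(query, kb, centroids):
    q = bow(query)
    # Stage 1: route the query to a single business scenario.
    scenario = max(centroids, key=lambda s: cosine(q, centroids[s]))
    # Stage 2: rank only that scenario's FAQ pairs (the reduced sample space).
    ranked = sorted(kb[scenario], key=lambda qa: cosine(q, bow(qa[0])), reverse=True)
    return scenario, ranked[0][1]
```

For example, `answer("how to void an invoice", KB, build_centroids(KB))` routes to the `invoice` scenario and returns that scenario's best-matching answer. With many scenarios, the ranking step touches only a fraction of the corpus, which is the efficiency argument behind the first stage.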
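A hierarchical attention model builds a document representation in two levels: attention over words produces sentence vectors, and attention over sentences produces a document vector that a classifier can map to a business scenario. The sketch below shows only the attention-pooling forward pass in the style of the original hierarchical attention network, with u_i = tanh(W h_i + b), weights alpha = softmax(u_i · context), and output sum_i alpha_i h_i. It assumes the parameters are given. The recurrent encoders that normally precede each attention layer, the scenario classifier, training, and the discriminator mentioned in the thesis are all omitted, and the function names are hypothetical.

```python
import math

def attention_pool(vectors, W, b, context):
    """Attention pooling: u_i = tanh(W h_i + b), alpha = softmax(u_i . context),
    output = sum_i alpha_i * h_i. W is (attn_dim x dim); b and context have length attn_dim."""
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi) for row, bi in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    # Numerically stable softmax over the attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(a * h[d] for a, h in zip(alphas, vectors)) for d in range(dim)]

def han_encode(document, word_params, sent_params):
    """document: list of sentences, each a list of word vectors.
    Word-level attention yields sentence vectors; sentence-level attention yields
    the document vector. (The recurrent encoders of the original HAN are omitted.)"""
    sentence_vecs = [attention_pool(words, *word_params) for words in document]
    return attention_pool(sentence_vecs, *sent_params)
```

With a zero context vector every weight is equal, so the pooling reduces to a plain average. A learned context vector lets informative words and sentences dominate the representation used for scenario classification.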
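The storage saving of training different encoder layers for different scenarios can be estimated with simple arithmetic. This sketch assumes one particular layout that the abstract does not spell out: a shared lower encoder whose top `scenario_layers` layers are duplicated per scenario, compared with keeping one fully fine-tuned encoder per scenario. The function name and all numbers are hypothetical, and embeddings and output heads are ignored.

```python
def param_counts(num_scenarios, num_layers, params_per_layer, scenario_layers):
    """Return (separate, shared) encoder parameter counts.

    separate: one fully fine-tuned encoder per scenario.
    shared:   lower (num_layers - scenario_layers) layers shared by all scenarios,
              plus scenario_layers layers stored once per scenario.
    Embeddings and task heads are ignored to keep the arithmetic simple."""
    separate = num_scenarios * num_layers * params_per_layer
    shared = ((num_layers - scenario_layers) * params_per_layer
              + num_scenarios * scenario_layers * params_per_layer)
    return separate, shared
```

As an illustration, assume 5 scenarios and a 12-layer encoder whose layers have hidden size 768 and feed-forward size 3072 (7,087,872 parameters per layer), with the top 2 layers made scenario-specific. Then `param_counts(5, 12, 7_087_872, 2)` gives 425,272,320 parameters for separate encoders versus 141,757,440 for the shared layout. Because a query runs through one shared stack plus one scenario's top layers, only one model has to be loaded at inference time.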
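Word Rotator's Distance compares two sentences word by word instead of collapsing each sentence into a single vector. Each word's vector norm, normalized within its sentence, serves as its mass, and moving mass between two words costs one minus the cosine similarity of their vectors. WRD is the optimal transport (earth mover's) cost under these masses and costs. The sketch below is an approximation, not the exact algorithm: it computes the transport plan with Sinkhorn iterations (entropic regularization `eps`) so that it needs only the standard library. The function names, the default `eps` and the iteration count are assumptions, and word vectors are assumed to be non-zero.

```python
import math

def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def _cos(p, q):
    return sum(x * y for x, y in zip(p, q)) / (_norm(p) * _norm(q))

def wrd_sinkhorn(sent1, sent2, eps=0.05, iters=500):
    """Approximate Word Rotator's Distance between two sentences.

    sent1, sent2: lists of non-zero word vectors (lists of floats).
    Word mass = vector norm / sum of norms in its sentence.
    Move cost = 1 - cosine similarity of the two word vectors.
    Exact WRD is the earth mover's distance under these masses and costs;
    here the transport plan is approximated with Sinkhorn iterations."""
    n1 = [_norm(w) for w in sent1]
    n2 = [_norm(w) for w in sent2]
    s1, s2 = sum(n1), sum(n2)
    a = [x / s1 for x in n1]
    b = [x / s2 for x in n2]
    cost = [[1.0 - _cos(p, q) for q in sent2] for p in sent1]
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan's marginals match a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    # Transport cost of the plan P_ij = u_i * K_ij * v_j.
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(len(a)) for j in range(len(b)))
```

Identical sentences give a distance close to zero regardless of word order, and sentences whose words point in orthogonal directions give a distance close to one. Cosine similarity of averaged sentence vectors discards which word aligns with which, whereas the transport plan keeps that alignment explicit. For exact WRD, the Sinkhorn loop would be replaced by an exact earth mover's distance solver.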
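Knowledge distillation trains a smaller student model to reproduce the softened output distribution of the fine-tuned teacher. The sketch below shows the widely used soft-target objective for a single example: a temperature-scaled KL divergence between teacher and student distributions, multiplied by T squared, blended with ordinary cross-entropy on the gold label. The abstract does not describe the loss used in either stage of its two-stage distillation, so this generic form, the function names, and the defaults `T=2.0` and `alpha=0.5` are assumptions.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradients on a scale comparable to the hard-label term.
    return alpha * T * T * kl + (1 - alpha) * ce
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains. A student that disagrees with the teacher is penalized even when both predict the same label.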