Research on Chinese Contract Text Classification Based on BERT and Attention Mechanism

Posted on: 2024-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Han
Full Text: PDF
GTID: 2568307055477614
Subject: Electronic Information (Field: Communication Engineering (including broadband network, mobile communication, etc.)) (Professional Degree)
Abstract/Summary:
This thesis systematically surveys the state of research on Chinese text classification algorithms in China and abroad, and explains the principles of the algorithms commonly used in text classification tasks. By analyzing contract texts, it constructs a Chinese contract text classification dataset and, according to the characteristics of contract text, proposes separate classification models for contract titles and contract content. The main research contents are as follows.

First, to address the lack of datasets for Chinese contract text classification, this thesis constructs Contract-Data, a dataset that divides each contract into a contract title and contract content, with titles and contents in one-to-one correspondence. The thesis analyzes and summarizes the characteristics of Chinese contract texts and preprocesses contract data drawn from real contracts and crawled blank contracts. Eight types of contract are selected according to the contract categories stipulated in the Civil Code of the People's Republic of China and the number of real contracts available, and Contract-Data is built from them.

Second, for the task of Chinese contract text classification, a model combining BERT, BiLSTM, an attention mechanism and max pooling is proposed to classify contract titles. The model uses BERT word embeddings to generate word vectors of the contract title as input; uses BiLSTM to capture forward and backward features of the title text; uses the attention mechanism to assign different weights to the features captured by the BiLSTM, strengthening the model's memory of key features; uses max pooling to reduce model parameters and shorten running time; and uses Softmax to output the title classification result. The effectiveness of the model is verified through comparison experiments with other models, ablation experiments, and comparison experiments with different word vectors.

Finally, in response to problems that arise in the contract title classification task, the contract content is analyzed and the BMH-BNLSTM classification model for contract content is proposed. The model uses BERT-ms word embeddings to generate word vector representations of the contract content text. To cope with the large word count of contract content, batch normalization is introduced into the BiLSTM to improve the model's ability to extract features from the content. A hierarchical attention mechanism increases the weight of key text features, and Softmax outputs the content classification result. The effectiveness of the model is verified through comparison experiments with the contract title model and other models, with different pre-trained models, and with different data.

This thesis focuses on the classification of Chinese contract text: it analyzes the characteristics of contract text, separates it into contract title text and contract content text, and constructs a classification model for each. Its purpose is to improve contract text classification, give contract classification higher application value, lay the foundation for follow-up tasks such as contract text review, and save time.
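The classification head of the title model described above (attention weighting over BiLSTM features, then max pooling, then Softmax) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the abstract does not specify the attention form, so a dot-product score against a query vector is assumed here, and the random `features` stand in for real BERT + BiLSTM outputs. The names `classify`, `query`, `W` and `b` are hypothetical, and the only detail taken from the source is the eight contract categories.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, query):
    # Assumed scoring: dot product of each time step's features with a query vector.
    scores = [sum(f * q for f, q in zip(step, query)) for step in features]
    return softmax(scores)


def reweight(features, weights):
    # Scale every time step's feature vector by its attention weight.
    return [[w * f for f in step] for step, w in zip(features, weights)]


def max_pool(features):
    # Max over time, taken separately for each feature dimension.
    return [max(column) for column in zip(*features)]


def classify(features, query, W, b):
    # Attention -> max pooling -> linear layer -> Softmax.
    weights = attention_weights(features, query)
    pooled = max_pool(reweight(features, weights))
    logits = [sum(w * x for w, x in zip(row, pooled)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits), weights


# Toy stand-ins: 5 time steps, 4 feature dimensions, 8 contract categories.
random.seed(0)
STEPS, DIM, CLASSES = 5, 4, 8
features = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(STEPS)]
query = [random.uniform(-1, 1) for _ in range(DIM)]
W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(CLASSES)]
b = [0.0] * CLASSES

probs, weights = classify(features, query, W, b)
print("attention weights:", [round(w, 3) for w in weights])
print("predicted class index:", probs.index(max(probs)))
```

In a trained model the query vector and the linear layer would be learned along with the BiLSTM; here they are random, so the predicted class carries no meaning and only the data flow is illustrated.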
Keywords/Search Tags: contract text, text classification, BERT, attention mechanism, LSTM