| With the rapid development of information technology, the ways in which enterprises sign contracts have become increasingly diverse, and the problem of managing massive numbers of contracts has grown more serious. Traditionally, contract management has relied mainly on manual reading and comprehension, which makes contract classification time-consuming and labor-intensive and leaves enterprises without a unified classification standard. To improve office efficiency, automating contract classification is an urgent need. To address these problems, this paper applies Chinese text classification technology to unstructured text in the contract domain and proposes four Chinese text classification methods for that domain. In view of the particular characteristics of contract text, it focuses on a contract text classification method based on BERT and hierarchical word embedding. Finally, it presents a Chinese text classification system for the contract domain built on a browser/server (B/S) architecture. The research is divided into three parts: 1. In view of the particular requirements of contract classification, three contract-oriented Chinese text classification methods are proposed. A contract's category can be judged from either its title or its body text. This paper designs two title-based methods: one applies keyword matching rules to the contract title, and the other trains a model on contract titles to identify the category. It also designs one text-based method, which trains a model on contract body text and improves the model's generalization ability by training on a large corpus, so that contract categories are identified accurately. To compare the accuracy of these approaches, the three methods proposed are: Chinese text classification based on rule matching and the contract title, Chinese text classification based on BiLSTM and the contract title, and Chinese text classification based on BiLSTM and the contract text. 2. To address the problem that contract texts are too long for the semantic information between words to be captured fully, a Chinese text classification method based on BERT and hierarchical word embedding is proposed. Contract word vectors are pre-trained on a large-scale corpus and used as a dictionary; the words in an input sentence that match the dictionary are given semantic representations; these representations are weighted and then combined with multi-level attention mechanisms, which assign different weights to keywords in different contexts and strengthen sequence learning at both the word level and the sentence level. Comparative experiments show that the proposed model classifies contracts effectively. 3. A Chinese text classification system for the contract domain based on the B/S architecture is constructed. With the proposed contract classification models at its core, the system implements core functions such as contract title classification and contract text classification using the Django framework. The overall architecture is divided into a data layer, a model layer, a business layer, and a display layer: the data layer provides contract data to the model layer, the model layer trains the models, the business layer implements the specific functions, and the display layer presents a visual interface to users. |
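
The first title-based method, keyword matching rules applied to the contract title, can be illustrated with a minimal sketch. The category names, the keyword lists, and the function name `classify_title` below are hypothetical and chosen only for illustration; the abstract does not state the actual rules or categories used.

```python
from typing import Optional

# Hypothetical rule table (not taken from the paper): each category maps to
# title keywords that are assumed to signal it.
TITLE_RULES = {
    "purchase": ["采购", "购销", "买卖"],  # procurement / purchase and sale
    "lease": ["租赁", "出租"],             # lease / rental
    "labor": ["劳动", "聘用"],             # labor / employment
}


def classify_title(title: str) -> Optional[str]:
    """Return the first category whose keyword occurs in the title, else None."""
    for category, keywords in TITLE_RULES.items():
        if any(keyword in title for keyword in keywords):
            return category
    return None


if __name__ == "__main__":
    for title in ["办公设备采购合同", "房屋租赁合同", "技术服务协议"]:
        print(title, "->", classify_title(title))
```

In this sketch a title that matches no rule returns `None`. One plausible design, assumed here rather than stated in the abstract, is to pass such titles to a learned classifier instead of forcing a rule-based label.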