Research On Name Recognition Technology Of Bidding Project Based On Deep Learning

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
GTID: 2428330614961609
Subject: Computer software and theory
Abstract/Summary:
The Internet provides a vast amount of data, most of it in the form of text, and making full use of this text remains challenging. Tender announcements are one such source: they are widely available on government procurement websites at all levels in China. A tender announcement usually consists of a title and a body. Although the title names the project being tendered, it also carries auxiliary information such as the project unit and the project location. With tens of thousands of new records arriving every day, identifying and extracting a more concise project name helps improve data querying and data analysis.

Deep learning is an effective approach to processing text data. To handle the diversity of bidding announcement titles, we chose a Transformer-based model for feature extraction and propose a Transformer-att-label model with joint labeling. During feature extraction, a traditional attention mechanism is used to combine the multiple attention heads, which lets the model pay more attention to important information and improves its performance. The Transformer computes the probability that a word belongs to each label. These probabilities are then combined with label embeddings: the top N label probabilities for a word serve as weights on the corresponding label vectors, and the weighted sum is used as the semantic vector of the word's predicted label. The distance between this semantic vector and each label vector is computed, and the closest label is output.

Further, to address polysemy and the lack of training data in bid announcement titles, this paper proposes the Bert-BiLstm-label-CRF model. BERT builds on a bidirectional Transformer and uses masked-token pre-training to determine a word's meaning from its context. Because the model is pre-trained on a very large corpus, it can achieve good results even when little training data is available. We use the BERT model to produce word vectors, add a BiLSTM layer to extract features suited to the NER task, and finally perform sequence labeling with the joint-labeling method under CRF constraints to improve project name recognition. We evaluated the proposed models on a data set of announcement titles from Chinese bidding websites and compared their recognition performance with that of other mainstream models to verify the effectiveness of the method.
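
The joint-labeling step described above (top N label probabilities used as weights on label vectors, then a nearest-label lookup) can be illustrated with a minimal Python sketch. This is an interpretation of the abstract, not the thesis implementation: the function names, the toy two-dimensional label embeddings, the renormalisation of the top N probabilities, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math

def top_n_label_vector(probs, label_embeddings, n):
    """Build a semantic vector as the probability-weighted sum of the
    embeddings of the n most probable labels."""
    # Indices of the n labels with the highest probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n]
    # Assumption: renormalise the kept probabilities so the weights sum to 1.
    total = sum(probs[i] for i in top)
    dim = len(label_embeddings[0])
    vec = [0.0] * dim
    for i in top:
        weight = probs[i] / total
        for d in range(dim):
            vec[d] += weight * label_embeddings[i][d]
    return vec

def nearest_label(vec, label_embeddings):
    """Return the index of the label embedding closest to vec.
    Assumption: Euclidean distance; the abstract does not name the metric."""
    dists = [math.dist(vec, emb) for emb in label_embeddings]
    return min(range(len(dists)), key=dists.__getitem__)

def predict_label(probs, label_embeddings, n=2):
    """Combine both steps for a single token."""
    vec = top_n_label_vector(probs, label_embeddings, n)
    return nearest_label(vec, label_embeddings)

# Hypothetical tag set and toy label embeddings (not taken from the thesis).
LABELS = ["B", "I", "O"]
EMBEDDINGS = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0]]

# Per-label probabilities for one token, as an encoder might output them.
token_probs = [0.40, 0.35, 0.25]
predicted = LABELS[predict_label(token_probs, EMBEDDINGS, n=2)]
```

In a trained model the label embeddings would be learned jointly with the encoder, so labels that tend to be confused can end up close together in the embedding space; the toy vectors here only make the arithmetic easy to follow.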
Keywords/Search Tags: Transformer, label, Bert, BiLstm, CRF