Research On Named Entity Recognition Method For Network Security Domain

Posted on:2024-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:D L Li

GTID:2558307097471554

Subject:Computer technology

Abstract/Summary:

Named entity recognition (NER) is an important part of knowledge extraction and the first step in building a knowledge graph. How to identify and extract useful information from large volumes of text quickly and accurately has been a hot topic in academic research in recent years. With the arrival of the big data era, network intrusions, virus infections, and other cyberattacks have become increasingly frequent and seriously threaten the security of computer use. Without cybersecurity there is no national security. To safeguard cyberspace, the state monitors the network in real time using a variety of technologies, which generates a large amount of cybersecurity data. Building on deep learning, this thesis studies the application of neural network models to entity recognition in cybersecurity vulnerability events: the text is first embedded, then encoded, and finally decoded with a conditional random field (CRF) to produce the entity labels. For cybersecurity NER, we propose a neural network model that incorporates multi-source information about Chinese characters, and, because the domain lacks an NER corpus, we construct a cybersecurity entity recognition corpus. The main work is as follows:

(1) Constructing a cybersecurity entity recognition corpus. To address the lack of a public cybersecurity NER corpus, information from the national security vulnerability database was collected as the text source, ensuring that the data are authentic. The collected data cover vulnerability information from the past five years for operating system, application, database, web application, network device, and other modules, ensuring that the corpus is comprehensive and up to date. Annotation proceeded in two stages: pre-labeling and final labeling. Cybersecurity experts formulated the annotation rules and specifications, on which basis annotation tools were developed and annotators were trained. The final corpus contains 400,000 words, annotated with the BIO scheme and divided into training, validation, and test sets in a 6:2:2 ratio.

(2) Proposing a cybersecurity NER neural network model that fuses multi-source information about Chinese characters. To improve accuracy, the model uses the output of the last layer of the pre-trained BERT model as the initial embedding and concatenates it with further information derived from the corpus, such as character radicals and word frequencies, to supply sufficient prior knowledge. Lexical information is further fused during feature extraction in the encoding layer, and final decoding is performed by a CRF. To verify the model's generalizability, comparison experiments against common neural network models were conducted on public datasets, where the model performed well. To demonstrate its effectiveness in the cybersecurity domain, comparison experiments against common models were conducted on the constructed cybersecurity dataset; the model achieved a precision of 0.8649, a recall of 0.8402, and an F1 score of 0.8523.

(3) Designing and implementing a cybersecurity entity recognition system. Based on the proposed model, we built a cybersecurity entity recognition system to improve the accuracy and efficiency of NER in the cybersecurity domain. The system is simple and practical, separates the front end from the back end, and is implemented in Python, HTML, and other languages.
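Item (2) describes concatenating ("splicing") the BERT last-layer output with radical and word-frequency information. The abstract does not say how those extra features are encoded, so the sketch below assumes a hypothetical radical embedding table and a log-scaled frequency scalar, keyed by character for simplicity; the 2-dimensional toy vectors stand in for BERT hidden states (768-dimensional for BERT-base). The function name `fuse_features` and all its inputs other than the BERT vectors are illustrative assumptions.

```python
import math
from typing import Dict, List

def fuse_features(
    chars: str,
    bert_vectors: List[List[float]],
    radical_of: Dict[str, str],               # hypothetical: character -> radical
    radical_table: Dict[str, List[float]],    # hypothetical: radical -> embedding
    freq: Dict[str, int],                     # hypothetical: corpus counts per character
    radical_dim: int,
) -> List[List[float]]:
    """Per character, concatenate [BERT vector | radical embedding | log(1 + frequency)]."""
    unknown = [0.0] * radical_dim  # fallback when a character's radical is unknown
    fused = []
    for ch, vec in zip(chars, bert_vectors):
        radical_vec = radical_table.get(radical_of.get(ch, ""), unknown)
        log_freq = math.log1p(freq.get(ch, 0))  # compress large counts
        fused.append(list(vec) + list(radical_vec) + [log_freq])
    return fused

if __name__ == "__main__":
    out = fuse_features(
        "漏洞",
        [[0.1, 0.2], [0.3, 0.4]],     # toy stand-ins for BERT outputs
        {"漏": "氵", "洞": "氵"},      # both characters carry the water radical
        {"氵": [1.0, 0.0]},            # toy radical embedding
        {"漏": 3},                     # toy corpus counts
        radical_dim=2,
    )
    print(out)
```

Concatenation keeps each information source in its own dimensions and leaves it to the downstream encoder to weigh them; the thesis's additional fusion of lexical information inside the encoding layer is not shown here.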
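The model's final step is CRF decoding. At inference time a linear-chain CRF is decoded with the Viterbi algorithm, which finds the tag sequence maximizing the sum of emission and transition scores. The pure-Python sketch below shows decoding only (not CRF training); the emission scores would come from the encoder and the transition scores from training, whereas here both are hand-set toy numbers, and the function name `viterbi_decode` is hypothetical.

```python
from typing import List, Sequence, Tuple

def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   score of tag j at position t (from the encoder)
    transitions[i][j] score of moving from tag i to tag j (learned in training)
    start[j]          score of beginning the sequence with tag j
    """
    n_tags = len(start)
    # score[j]: best total score of any path ending in tag j at the current position
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # best previous tag i for reaching tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptrs.append(ptr)
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptrs):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

if __name__ == "__main__":
    TAGS = ["O", "B-VUL", "I-VUL"]
    NEG = -10.0  # large penalty for transitions that are invalid under BIO
    transitions = [[0.0, 0.0, NEG],  # O -> I is invalid
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    start = [0.0, 0.0, NEG]          # a sequence cannot start with I
    emissions = [[1.0, 0.8, 0.0],    # greedy argmax would pick O ...
                 [0.0, 0.0, 1.0]]    # ... then I, an invalid BIO sequence
    path, best = viterbi_decode(emissions, transitions, start)
    print([TAGS[k] for k in path], best)  # → ['B-VUL', 'I-VUL'] 1.8
```

The example shows why a CRF layer helps: per-position argmax would output the invalid pair O, I-VUL, while the transition scores steer the decoder to the valid B-VUL, I-VUL.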
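The results in item (2) are reported as precision, recall, and F1. The abstract does not say how matches are counted, so the sketch below assumes strict entity-level matching, a common NER convention: a predicted entity counts only if its boundaries and type both match the gold annotation. It also applies F1 = 2PR / (P + R) to the reported precision and recall, giving about 0.8524, which agrees with the reported 0.8523 to within rounding. The helper names and the entity types in the example are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)

def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag list. An I- tag that does not
    continue an entity of the same type starts a new entity (one common convention)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current entity extends
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged strict entity-level precision, recall, and F1."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # exact boundary-and-type matches
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)

if __name__ == "__main__":
    gold = [["B-PROD", "I-PROD", "O", "B-VUL", "I-VUL"]]
    pred = [["B-PROD", "I-PROD", "O", "B-VUL", "O"]]  # VUL boundary is wrong
    print(entity_prf(gold, pred))                # → (0.5, 0.5, 0.5)
    print(round(f1_score(0.8649, 0.8402), 4))    # → 0.8524
```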

Keywords/Search Tags:

Named entity recognition, cybersecurity, corpus construction, pre-trained model, word vector fusion

Related items

1	Research On Chinese Named Entity Recognition Based On XLNet And Word Segmentation Fusion Coding
2	Chinese Named Entity Recognition Based On Pre-trained Language Models
3	Automatic Extraction Of Chinese-English Named Entity Pairs Based On Bilingual Aligned Corpus
4	Research On Named Entity Recognition For Science And Technology Terms Based On Dependent Entity Word Vector
5	The Field Of Music, A Combination Of Rules And Statistical Named Entity Recognition
6	Research On Word-vector-representation-based New Word Discovery And Name Entity Recognition
7	Research On Extraction Of Named Entity Translation Equivalents From Comparable Corpus
8	Automatic Approaches To Develop Large-scale TCM Electronic Medical Record Corpus For Named Entity Recognition Tasks
9	Research On Method And Application Of Named And Terminology Entity Recognition
10	Research On Biomedical Named Entity Recognition Based On Hybrid Model