Developing a Cybersecurity Text Corpus and its Application for Augmenting Semantic Text Similarity

Posted on:2015-06-29

Degree:M.S.

Type:Thesis

University:University of Maryland, Baltimore County

Candidate:Chavan, Manish Padmakar

Full Text:PDF

GTID:2478390017492757

Subject:Computer Science

Abstract/Summary:

The growing use of cyber-services makes cybersecurity increasingly important. The Internet is a primary source of information about software flaws, vulnerabilities, cyber-attacks and exploits. This information is available through vulnerability databases, news articles, security bulletins and blogs. A variety of applications and security systems, such as Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), can take advantage of this information to consolidate their infrastructure. The lack of a ready-made text corpus of high-quality security information drawn from various sources makes it difficult for these applications to use this information. To overcome this problem, our work focuses on building a multi-genre corpus of security text using information retrieved from multiple Internet-based sources: the National Vulnerability Database, Wikipedia articles, security blogs, security bulletins and scholarly papers. The system builds a text classifier from the initial high-quality data, which is then used to classify new data from these sources and add it to the corpus.

This corpus can be used by a variety of applications, such as IDS or IPS, in a variety of ways, such as asserting facts into a knowledge base or extracting named entities. Our work explores one such application: generating a semantic text similarity model for cybersecurity text. We use the multi-genre cybersecurity text corpus to create a word co-occurrence model. This model can capture synonymy between different security terms. For example, the words 'virus' and 'malware', which appear in the same contexts, are scored for their degree of similarity. The word co-occurrence model is then extended to generate a semantic text similarity model. The text similarity model measures the semantic similarity between different security texts such as paper titles, vulnerability descriptions and blog paragraphs. The system also develops a combined text similarity model from the cybersecurity similarity model and a generic text similarity model. This model can be used in document mining for matching security text, clustering documents that describe similar vulnerabilities, and so on.
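
The abstract does not say how the co-occurrence model scores words or how the two similarity models are combined, so the following is only a minimal sketch under stated assumptions: a fixed context window, cosine similarity over raw co-occurrence counts, a best-match average for text similarity, and a simple weighted sum for the combined model. All function names, the toy corpus and the `weight` parameter are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def word_similarity(vectors, w1, w2):
    """Words that share contexts score high; identical words score 1.0."""
    if w1 == w2:
        return 1.0
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))


def text_similarity(vectors, text1, text2):
    """Assumed extension to texts: average best-match word similarity,
    computed in both directions and averaged."""
    a, b = text1.lower().split(), text2.lower().split()
    if not a or not b:
        return 0.0

    def directed(xs, ys):
        return sum(max(word_similarity(vectors, x, y) for y in ys) for x in xs) / len(xs)

    return 0.5 * (directed(a, b) + directed(b, a))


def combined_similarity(domain_score, generic_score, weight=0.5):
    """Assumed combination rule: weighted sum of the domain and generic scores."""
    return weight * domain_score + (1 - weight) * generic_score


# Toy corpus standing in for the multi-genre cybersecurity corpus.
corpus = [
    "the virus infected the host system",
    "the malware infected the host system",
    "the patch fixed the buffer overflow",
]
vectors = cooccurrence_vectors(corpus)
print(word_similarity(vectors, "virus", "malware"))
print(word_similarity(vectors, "virus", "patch"))
print(text_similarity(vectors, "virus infected host", "malware infected system"))
```

On this toy corpus 'virus' and 'malware' occur in identical contexts, so they score higher than 'virus' and 'patch'. Raw counts let frequent words such as "the" dominate the vectors; a real model would likely need stop-word handling or a reweighting scheme, which is left out here.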

Keywords/Search Tags:

Security, Text, Model, Information

Related items

1	Several New Ideas On Information Security And Its Model And Evaluation
2	Study On The Content-Based Network Security Monitoring Model And Its Key Technology
3	Researches On Models And Algorithms Of Text Information Extraction
4	Study Of Information Security Event Text Classification Method
5	Based Text Filtering Isolation Technology
6	Research On Novel Text Information Hiding Algorithm
7	Researches And Implements Of Text Filtering For Physical GAP
8	Information Filtering Systems Based On Web Text Content And Design,
9	Study On Similarity-based Text Clustering Algorithm And Its Application
10	The Design And Implementation Of Automatic Categorization System Of Public Security Information Based On SVM