Research On Vector-Space-Model Based Event Sequence Classification And Its Application

Posted on: 2017-11-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y J Fan  Full Text: PDF
GTID: 2348330512962249  Subject: Computer application technology
Abstract/Summary:
Classification is an important task in data mining. It learns a model from training samples and then uses that model to predict the category of unknown samples. Traditional classification methods focus mainly on numeric data and have been widely used in pattern recognition and machine vision. With the development of technology, many applications now produce large numbers of event sequences, such as instruction sequences in malware detection and document data in information retrieval, and classifying these sequences is of great significance. However, because the elements of a sequence are non-numeric and ordered, many classification algorithms that are well suited to numeric data cannot classify event sequences. Event sequence classification has therefore become a challenging task in data mining.

In this dissertation, we analyze the advantages and limitations of existing event sequence classification methods, then focus on the vector-space-based approach. This approach first represents each sequence as a vector in a feature space through a feature selection strategy, then applies traditional methods to classify the vectors. It usually suffers from two problems: (1) the extracted features are high-dimensional and fail to reflect the complex characteristics of sequences; (2) the traditional vector space model overlooks the ordering relationship among attributes. To overcome these drawbacks, we improve the approach in two respects: the feature extraction algorithm and the vector space model. The main work and contributions of this dissertation are as follows:

(1) We propose a new sequence feature extraction algorithm that mines discriminative sequential patterns in event sequences. The extracted patterns are strongly relevant to the sequence category and capture the potential differences among sequence categories.

(2) We apply event sequence mining to information security and construct a new malware classification system. The system uses the proposed feature extraction algorithm to mine malicious sequential patterns, then applies a new nearest neighbor classifier to detect and classify malware. The experimental results show that this application greatly improves the detection rate and the classification accuracy.

(3) We propose the sequential constrained vector space model, which uses a constraint matrix to describe the ordering relationship among the attributes of the traditional vector space model. A new sequence similarity measure is then proposed based on this model. The experimental results on a document dataset demonstrate the effectiveness of this modeling method and the new measure.
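The general vector-space pipeline described above (represent each sequence as a vector over sequential patterns, then classify with a nearest neighbor rule) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the patterns here are hand-picked rather than mined, cosine similarity and a plain 1-nearest-neighbor rule stand in for the proposed classifier, and all names and sample event sequences (`PATTERNS`, `TRAINING`, `to_vector`, `nearest_neighbor`) are hypothetical.

```python
from math import sqrt

def is_subsequence(pattern, sequence):
    """True if the pattern's events occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def to_vector(sequence, patterns):
    """Binary feature vector: one dimension per sequential pattern."""
    return [1.0 if is_subsequence(p, sequence) else 0.0 for p in patterns]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbor(query, training, patterns):
    """Label of the training sequence most similar to the query (1-NN)."""
    query_vec = to_vector(query, patterns)
    best_label, best_sim = None, -1.0
    for sequence, label in training:
        sim = cosine(query_vec, to_vector(sequence, patterns))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Hypothetical hand-picked patterns; the dissertation mines these from data.
PATTERNS = [
    ("open", "write", "exec"),
    ("open", "read", "close"),
]

# Hypothetical labeled event sequences (illustrative only).
TRAINING = [
    (["open", "write", "chmod", "exec"], "malware"),
    (["open", "read", "read", "close"], "benign"),
]

if __name__ == "__main__":
    query = ["open", "seek", "write", "exec"]
    print(to_vector(query, PATTERNS))
    print(nearest_neighbor(query, TRAINING, PATTERNS))
```

Because each dimension corresponds to an ordered pattern rather than a single event, some ordering information survives the mapping into the vector space. The constraint matrix in contribution (3) addresses ordering among the attributes themselves, which this sketch does not attempt to model.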
Keywords/Search Tags: event sequence, classification, vector space model, feature extraction, malware detection, sequence similarity measure