A Study On Markov-Model-based Sequence Classification

Posted on:2015-08-31

Degree:Master

Type:Thesis

Country:China

Candidate:H H Wu


GTID:2298330467461804

Subject:Computer application technology

Abstract/Summary:


Classification is a supervised machine learning task that has been widely used in many fields, such as risk assessment in banking, customer categorization, and automatic document classification. With the rapid development of technology, more and more classification applications involve event sequence data, which is a kind of non-numerical data. Examples include protein sequences, customers' purchase logs at a shopping mall, and users' click-streams on websites. Because event sequences are so widespread, fast and accurate model-based methods for classifying them are highly important.

The characteristics of event sequences differ substantially from those of traditional numerical data. An event sequence consists of discrete symbols, so the distance metrics commonly used for numerical data cannot be applied to it directly. Moreover, a sequence is an ordered list of events, so extracting features and then applying a conventional classification algorithm loses information, such as the order in which the events occur. Because of these characteristics, many classification algorithms that perform well on numerical data do not obtain good results on event sequences.

To address these problems, this thesis focuses on event sequence classification and proposes several new Markov models based on the statistical modeling of event sequences. We also implement a distributed Markov model algorithm on Apache Hadoop for big data. The research in this dissertation has both theoretical and practical significance.

Our main contributions can be summarized as follows:
1. A new weighted variable length Markov model is proposed that combines the probabilities of subsequences with the transition probabilities of sequence elements to optimize the classification model. A new similarity pruning strategy, executed while the model is built, is also proposed to improve the model's generalization.
2. For practical applications in which only a small amount of training data is available, an automatically weighted variable length Markov model based on a kernel smoothing method for nominal attributes is proposed. It seeks an optimal trade-off between the bias and the variance of the estimates, which improves classification accuracy on small datasets.
3. To deal with big data, the variable length Markov model is ported to the Apache Hadoop distributed platform for distributed parallel computing. This aims to overcome both the limited storage capacity of a single machine and the bottleneck in large-scale computing capacity.
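The general idea behind Markov-model-based sequence classification can be illustrated with a minimal sketch: train one Markov model per class, then assign a new sequence to the class with the highest log prior plus log-likelihood. The sketch assumes a simple fixed-weight interpolation over context lengths 0 to `max_order` with Laplace smoothing; the class name `InterpolatedMarkovClassifier` and its parameters are hypothetical, and the thesis's own weighting (combining subsequence and transition probabilities) and similarity pruning are not reproduced here.

```python
import math
from collections import defaultdict


class InterpolatedMarkovClassifier:
    """Per-class variable-order Markov models with fixed interpolation weights.

    Hypothetical sketch: the next-event probability mixes Laplace-smoothed
    estimates for context lengths 0..max_order. This is NOT the thesis's
    weighted model or pruning strategy, only the underlying idea.
    """

    def __init__(self, max_order=2, weights=None, alpha=1.0):
        self.max_order = max_order
        # weights[k] weights the length-k context estimate; they should sum to 1.
        self.weights = weights or [1.0 / (max_order + 1)] * (max_order + 1)
        self.alpha = alpha  # additive smoothing constant (must be > 0)
        self.pair_counts = {}  # label -> {(context, event): count}
        self.ctx_counts = {}   # label -> {context: count}
        self.class_counts = defaultdict(int)
        self.alphabet = set()

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            pairs = self.pair_counts.setdefault(label, defaultdict(int))
            ctxs = self.ctx_counts.setdefault(label, defaultdict(int))
            for t, event in enumerate(seq):
                self.alphabet.add(event)
                # Record every context of length 0..min(t, max_order).
                for k in range(min(t, self.max_order) + 1):
                    ctx = tuple(seq[t - k:t])
                    pairs[(ctx, event)] += 1
                    ctxs[ctx] += 1
        return self

    def _log_likelihood(self, label, seq):
        pairs, ctxs = self.pair_counts[label], self.ctx_counts[label]
        v = len(self.alphabet)
        total = 0.0
        for t, event in enumerate(seq):
            p = 0.0
            for k, w in enumerate(self.weights):
                # Near the start of a sequence, fall back to the longest available context.
                ctx = tuple(seq[max(0, t - k):t])
                num = pairs.get((ctx, event), 0) + self.alpha
                den = ctxs.get(ctx, 0) + self.alpha * v
                p += w * num / den
            total += math.log(p)
        return total

    def predict(self, seq):
        n = sum(self.class_counts.values())

        def score(label):
            # log P(class) + log P(sequence | class)
            return math.log(self.class_counts[label] / n) + self._log_likelihood(label, seq)

        return max(self.class_counts, key=score)
```

For example, after fitting on sequences that cycle x, y, z (labeled "up") and sequences that cycle z, y, x (labeled "down"), the classifier assigns `["x", "y", "z", "x"]` to "up", because the order-1 and order-2 transition estimates dominate while the order-0 estimates are identical for both classes.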
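Contribution 2 relies on kernel smoothing for nominal attributes to balance bias and variance when training data is scarce. As an illustration, the sketch below applies the standard Aitchison-Aitken kernel for categorical data to the distribution of the next event after some context, choosing the smoothing parameter `lam` by leave-one-out likelihood cross-validation. Whether the thesis uses this particular kernel and selection criterion is an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def aa_smoothed_probs(samples, categories, lam):
    """Aitchison-Aitken kernel estimate of a categorical distribution.

    Each observation keeps weight 1 - lam on its own category and spreads
    lam evenly over the other c - 1 categories (requires c >= 2).
    """
    n, c = len(samples), len(categories)
    counts = Counter(samples)
    return {
        cat: (1 - lam) * counts[cat] / n + lam / (c - 1) * (1 - counts[cat] / n)
        for cat in categories
    }


def select_lambda_loo(samples, categories, steps=100):
    """Pick lam in (0, (c-1)/c] that maximizes the leave-one-out log-likelihood."""
    n, c = len(samples), len(categories)
    if n < 2 or c < 2:
        raise ValueError("need at least two samples and two categories")
    counts = Counter(samples)
    lam_max = (c - 1) / c  # at lam_max the estimate becomes uniform
    best_lam, best_ll = None, -math.inf
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        ll = 0.0
        for cat, n_cat in counts.items():
            # Probability of one held-out observation of `cat`, estimated from the other n - 1.
            frac = (n_cat - 1) / (n - 1)
            p = (1 - lam) * frac + lam / (c - 1) * (1 - frac)
            ll += n_cat * math.log(p)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```

With only four observed next events `["a", "a", "a", "b"]` over the alphabet `["a", "b", "c"]`, the raw frequency estimate gives the unseen event "c" probability zero, whereas the smoothed estimate keeps the ordering a > b > c while giving "c" a positive probability. A larger `lam` lowers variance at the cost of more bias toward the uniform distribution.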
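For contribution 3, building the count tables behind a Markov model maps naturally onto MapReduce: a mapper emits one key per (class label, context, next event) occurrence, and a reducer sums the counts for each key. The sketch below simulates the map, shuffle/sort, and reduce phases in a single Python process so it can run without a cluster. The function names and record format are hypothetical, and the abstract does not describe the thesis's actual Hadoop job design; on a real cluster, similar mapper and reducer logic could be run through Hadoop Streaming.

```python
from itertools import groupby
from operator import itemgetter


def mapper(record, max_order=2):
    """Emit ((label, context, event), 1) for every position and context length."""
    label, seq = record
    for t, event in enumerate(seq):
        for k in range(min(t, max_order) + 1):
            yield (label, tuple(seq[t - k:t]), event), 1


def reducer(key, values):
    """Sum the partial counts that share one key."""
    yield key, sum(values)


def run_local_job(records, max_order=2):
    # Map phase: each record is processed independently (parallel on a cluster).
    intermediate = [kv for rec in records for kv in mapper(rec, max_order)]
    # Shuffle/sort phase: bring identical keys together, as the framework would.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    result = {}
    for key, group in groupby(intermediate, key=itemgetter(0)):
        for out_key, total in reducer(key, (v for _, v in group)):
            result[out_key] = total
    return result
```

The resulting dictionary holds exactly the (context, event) counts that a single-machine trainer would accumulate, so per-class transition probabilities can be derived from it afterward.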

Keywords/Search Tags:

event sequence, classification, Markov model, weighted, distributed computing


Related items

1	Research On Vector-Space-Model Based Event Sequence Classification And Its Application
2	Markov Models And Hidden Markov Model-based Three-dimensional Model Of Classification Research
3	Research On Novel Composite Event Detection Techniques For Markov Chains
4	Research On Event Extraction Algorithm Based On Sequence Labeling Model
5	Video Event Detection Method Research Based On The Hidden Markov Model
6	Dynamic Image Sequence Representation And Classification With Application To Human Motion Analysis
7	More Clues To The Fusion Of Football Video Semantic Analysis And Event Detection
8	Parameter Estimation Of Hidden Markov Model And It's Application In News Classification
9	Real Time Gathering Event Detection Based On Layered Hidden Markov Model
10	Research On Attention-based Model For Sequence Classification