Font Size: a A A

Pattern and event matching over noisy data sequences

Posted on:2016-03-17Degree:Ph.DType:Dissertation
University:University of Massachusetts LowellCandidate:Li, ZhengFull Text:PDF
GTID:1478390017984344Subject:Computer Science
Abstract/Summary:
Sequence data is prevalent in life. Examples include sensor readings, system logs, search engine histories, biological sequences, ECG signals, and smartphone location traces, etc. Such data is usually uncertain/noisy in nature. The noise can be errors in data value, out-of-ordered arrival, missing data and so on. Pattern matching on these streams has diverse applications such as complex event monitoring, medical condition detection, and business intelligence. We study different pattern matching problems under both independent and correlated error model. Under the independent error model, we assume each data value from the raw data have independent error. We first learn a probabilistic sequence from raw data using statistical methods. Then we study (1) substring matching on probabilistic sequences, (2) windowed subsequence matching on probabilistic sequences. Under the correlated error model, we study (3) extended regular expression matching on correlated probabilistic sequences. It can be shown that a former problem is always a special case of a latter problem in the above list. Hence, algorithms devised for a latter problem are more powerful/general, but there is an inherent tradeoff between power and efficiency. Further, we expand this important line of work to handle order error and missing data, we propose order-approximate and parallel-event matching which is needed in many modern applications. Our approaches are supported by novel theoretical analyses as well as a systematic experimental evaluation using both real-world and synthetic datasets.
Keywords/Search Tags:Data, Sequences, Matching, Pattern
Related items