Distributed Knowledge Discovery for Diverse Dat

Posted on: 2018-08-16
Degree: Ph.D
Type: Dissertation
University: The University of New Mexico
Candidate: Hamooni, Hossein
Full Text: PDF
GTID: 1478390020457593
Subject: Computer Science
Abstract/Summary:
In the era of new technologies, computer scientists deal with massive datasets, hundreds of terabytes in size. Smart cities, social networks, health care systems, and large sensor networks constantly generate new data. Extracting knowledge from such datasets is non-trivial because traditional data mining algorithms become impractical at this scale. Distributed systems help address this problem, but they introduce new challenges in designing scalable algorithms. The transition from traditional algorithms to ones that can run on a distributed platform should be made carefully, and researchers should design modern distributed algorithms around the problem domain. The main goal of this dissertation is to demonstrate the importance of domain-specific knowledge in developing scalable knowledge discovery algorithms on distributed systems. Data properties such as origin, type, context, and size play important roles in achieving speed, efficiency, and scalability. In this dissertation, I describe three domain-specific knowledge discovery systems in three diverse domains: a distributed algorithm to extract patterns from log messages generated by computers, a distributed algorithm to find abnormal behavior in social media, and a scalable algorithm for matching patterns in streaming time series data. I explain how to exploit the data properties in a distributed knowledge discovery system to achieve scalability and speed. The algorithms achieve horizontal scalability for any data size, and the systems are currently deployed at the University of New Mexico.
Keywords/Search Tags:Data, Knowledge discovery, Distributed, New, Algorithms, Size, Systems