
Mining of uncertain Web log sequences with access history probabilities

Posted on: 2011-05-10
Degree: M.Sc
Type: Thesis
University: University of Windsor (Canada)
Candidate: Kadri, Olalekan Habeeb
GTID: 2448390002454906
Subject: Computer Science
Abstract/Summary:
An uncertain data sequence is a sequence of data items that exist with some level of doubt or probability. Each item in an uncertain sequence is represented by a label and a probability value ranging from 0 to 1, referred to as its existential probability.

Existing algorithms are either unsuitable or inefficient for discovering frequent sequences in uncertain data. This thesis presents a method for mining uncertain Web sequences that combines access history probabilities from several Web log sessions with features of the PLWAP Web sequential miner. The method, the Uncertain Position Coded Pre-order Linked Web Access Pattern (U-PLWAP) algorithm, mines frequent sequential patterns in uncertain Web logs. While PLWAP considers only a single session of Web logs, U-PLWAP takes multiple sessions, from which existential probabilities are generated. Experiments show that U-PLWAP is at least 100% faster than U-apriori and 33% faster than UF-growth. In addition, UF-growth does not take the order of items into account, so the results of U-PLWAP carry richer information.

KEYWORDS: Uncertain data mining, frequent sequential patterns, Web log mining, existential probability generation, dirty data mining, tree-based mining, probabilistic data mining.
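A minimal Python sketch of how frequency can be measured over sequences of (label, existential probability) items. It assumes, as is common for uncertain data but not stated in the abstract, that items exist independently of one another, and it uses expected support (the sum over all sequences of the probability that the sequence contains the pattern) as the frequency measure. U-PLWAP's exact support definition may differ, and the function names `containment_probability` and `expected_support` are hypothetical. Order matters: a pattern ⟨a, b⟩ is counted only when a occurs before b.

```python
from typing import List, Sequence, Tuple

# An uncertain sequence: ordered (label, existential probability) pairs, 0 <= p <= 1.
UncertainSeq = List[Tuple[str, float]]


def containment_probability(seq: UncertainSeq, pattern: Sequence[str]) -> float:
    """Probability that `pattern` occurs as a (not necessarily contiguous)
    subsequence of `seq`.

    Assumption (hypothetical model): each item exists independently with its
    existential probability. state[j] is the probability that greedy
    left-to-right matching has matched exactly the first j pattern items.
    Greedy matching decides subsequence containment exactly, so
    state[len(pattern)] at the end is the containment probability.
    """
    m = len(pattern)
    state = [1.0] + [0.0] * m
    for label, p in seq:
        # Iterate j downwards so a single sequence item advances the
        # matching by at most one pattern position.
        for j in range(m - 1, -1, -1):
            if pattern[j] == label:
                moved = state[j] * p  # the item exists: advance to j + 1
                state[j] -= moved
                state[j + 1] += moved
    return state[m]


def expected_support(db: List[UncertainSeq], pattern: Sequence[str]) -> float:
    """Expected number of sequences in `db` that contain `pattern`
    (a hypothetical support measure, assuming independence as above)."""
    return sum(containment_probability(seq, pattern) for seq in db)
```

For a two-sequence database [(a, 0.9), (b, 0.8), (c, 0.5)] and [(b, 0.7), (a, 0.6)], this gives an expected support of 0.72 for ⟨a, b⟩ and 0.42 for ⟨b, a⟩. An order-insensitive itemset measure of the U-Apriori kind would instead credit {a, b} with 0.72 + 0.42 = 1.14, which illustrates the ordering information that the abstract says UF-growth discards.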
Keywords/Search Tags: Uncertain, Mining, Web, Sequence, U-PLWAP, Access, Probabilities, Probability