Contributions à la détection des anomalies et au développement des systèmes de recommandation

Posted on:2013-11-21

Degree:Ph.D.

Type:Thesis

University:Université de Sherbrooke (Canada)

Candidate:Shu, Wu

GTID:2458390008969226

Subject:Computer Science

Abstract/Summary:

Data mining, also called Knowledge Discovery in Databases, is a relatively young, interdisciplinary research field of computer science. It is the process of analyzing large-scale datasets, extracting knowledge, and transforming that knowledge into a human-understandable structure for further use. Outlier detection and recommendation systems are two important tasks in data mining. Outlier detection refers to detecting observations in a given dataset that do not conform to normal observations, while recommendation systems try to predict users' preferences for items from historical purchase data and other related socio-economic data about the users. This thesis studies two key issues in these areas: outlier detection in large-scale categorical datasets, and recommendation from highly skewed rating datasets.

Previous research on recommendation systems has neglected one significant rating scenario that is common in real Web applications such as e-commerce sites (e.g. Amazon, Taobao) and content provider websites (e.g. YouTube). The rating datasets collected from these websites differ from traditional movie and music rating datasets: their rating distributions are highly skewed. After examining the properties of this kind of rating dataset, we propose a new framework for estimating ratings and quantitative high-order preferences for skewed rating datasets. This framework makes it possible to derive novel and more effective matrix factorization and neighborhood models. Experimental results on typical highly skewed datasets show that the new models created under this framework outperform conventional methods on skewed rating datasets, not only for rating prediction but also for Top-N recommendation.

Detecting outliers in large-scale categorical datasets is an important open problem in outlier detection. Existing methods in this area suffer from low effectiveness and low efficiency because of the high dimensionality and large size of the datasets, the high complexity of statistical tests, or inefficient proximity-based measures. In this thesis, we provide a formal definition of an outlier in categorical datasets and design two effective and efficient algorithms with only one parameter for outlier detection in large-scale categorical datasets.
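
To make the categorical outlier detection task concrete, the sketch below scores each record by how frequent its attribute values are across the dataset, in the style of the well-known attribute-value-frequency heuristic. It is an illustrative baseline only and is not the formal definition or either of the two algorithms proposed in the thesis, which the abstract does not spell out. The function names `avf_scores` and `top_k_outliers`, the toy `records`, and the parameter `k` are all assumptions made for this example.

```python
from collections import Counter


def avf_scores(rows):
    """Return one score per row: the mean relative frequency of its values.

    Lower scores mean the row is built from rarer attribute values,
    so it looks more like an outlier. (Illustrative heuristic only.)
    """
    if not rows:
        return []
    n_rows = len(rows)
    n_attrs = len(rows[0])
    # Count how often each value occurs in each column.
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / (n_attrs * n_rows)
        for row in rows
    ]


def top_k_outliers(rows, k):
    """Indices of the k rows with the lowest frequency score (k is hypothetical)."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]


# Toy categorical dataset, invented for this example.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]

print(top_k_outliers(records, 1))
```

This scoring needs one pass to count values and one pass to score rows, which is why frequency-based scores are often contrasted with the proximity-based measures the abstract describes as inefficient on large datasets.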

Keywords/Search Tags:

Detection, Datasets, Recommendation systems

Related items

1	Research On Shared Accommodations Recommendation In Sparse Datasets
2	Research On Some Key Problems Of Web-based Recommendation Systems
3	Research And Development Of Personalized Itinerary Recommendation Based On Uncertain Datasets
4	Research On Algorithms Of Recommendation In Personalized Recommendation Systems
5	A Study On Dynamic Recommendation Algorithms In Recommendation Systems
6	Analysis of precision agriculture datasets for on-farm research
7	Exploring similarities in high-dimensional datasets
8	Discovering and ranking outliers in very large datasets
9	Fractal Analysis Of Datasets Using Distributed Computing
10	Research On Recommendation Attack Detection Algorithm For Collaborative Filtering Recommender Systems