Research Of Machine Learning Models And Algorithms For Information Filtering And Information Retrieval

Posted on:2008-02-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L Zhang

GTID:1118360245992497

Subject:Management Science and Engineering

Abstract/Summary:

With the rapid development of Internet technologies, information networks play an increasingly important role in people's routine work and daily life. Obtaining the information people really need from massive volumes of data, quickly and efficiently, has become a key problem in information research. There are two main approaches to this problem: information filtering (IF) and information retrieval (IR), both of which are of significant academic interest and practical value. The research in this thesis is based on statistical machine learning methods, with a focus on IF/IR models and algorithms. The main contents are as follows:

First, a brief introduction to IF/IR is given, covering their concepts, structure and features as well as their origin and history. As the theoretical basis of the thesis, several statistical machine learning methods and their roles in IF/IR are also introduced.

Second, after reviewing several popular collaborative filtering approaches, the thesis presents a new probabilistic model for collaborative filtering, named the real preference Gaussian mixture model. It has two latent variables, corresponding to classes of users and classes of items, and each user or item may be probabilistically assigned to more than one group. The model also accounts for each user's rating style and each item's public reputation. The new model is more realistic and practical than the other methods discussed.

Third, the thesis addresses the use of finite mixture models to cluster large-scale document data. A generalized method for unsupervised text clustering is presented that integrates model selection, feature selection and parameter estimation for the mixture model into a single framework. In addition, a modified version of "feature significance" is proposed: the relevance of each feature to the mixture components is introduced into the mixture model as a set of latent variables, and the component-relevant features are selected while the model's parameters are estimated. As an example of the generalized framework, a multinomial mixture model with feature selection is discussed in detail.

Fourth, the thesis uses graph-based methods to address semi-supervised learning problems. The main idea is to capture the similarities between data examples by defining a density-based distance over the graph. The inner structure of the dataset is thereby obtained and used to compute the classifier. For semi-supervised classification, a kNN density-based distance is presented to re-weight the graph, and the Laplacian kernel method is then introduced to build classifiers over the whole feature space. For semi-supervised clustering, a density-based constraint expansion method is proposed, in which the constraint set is expanded according to the similarity of the data samples. The expanded constraint set then contains the manifold information of the dataset and can be used with any semi-supervised clustering algorithm.

Finally, the main research contents are summarized at the end of the thesis, together with an outlook on future study and research.
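To make the multinomial mixture model mentioned in the third point more concrete, the following is a minimal sketch of plain EM for a multinomial mixture over a document-term count matrix. It is not the thesis's method: it omits the feature selection and model selection components entirely, and the function name `fit_multinomial_mixture`, the smoothing constant `alpha`, and the toy matrix `X` are all assumptions introduced here for illustration.

```python
import numpy as np

def fit_multinomial_mixture(X, k, n_iter=50, alpha=1e-2, seed=0):
    """Plain EM for a k-component multinomial mixture (illustrative only).

    X is a (documents x terms) matrix of non-negative counts.
    alpha is a hypothetical smoothing constant that keeps all
    probabilities strictly positive.
    """
    rng = np.random.default_rng(seed)
    n_docs, _ = X.shape
    # Start from random soft assignments of documents to components.
    resp = rng.dirichlet(np.ones(k), size=n_docs)
    for _ in range(n_iter):
        # M-step: smoothed mixing weights and per-component term probabilities.
        weights = (resp.sum(axis=0) + alpha) / (n_docs + k * alpha)
        counts = resp.T @ X + alpha                      # shape (k, n_terms)
        theta = counts / counts.sum(axis=1, keepdims=True)
        # E-step: component posteriors, computed in log space for stability.
        # The multinomial coefficient is the same for every component,
        # so it cancels and is left out.
        log_post = np.log(weights) + X @ np.log(theta).T  # shape (n_docs, k)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, theta, resp

# Toy count matrix (made up): rows are documents, columns are terms.
X = np.array([
    [5, 4, 0, 0],
    [6, 3, 1, 0],
    [0, 1, 5, 6],
    [0, 0, 4, 5],
])
weights, theta, resp = fit_multinomial_mixture(X, k=2)
labels = resp.argmax(axis=1)  # hard cluster label per document
```

Each row of `resp` is a soft assignment of one document to the mixture components, which mirrors the probabilistic clustering described in the abstract. EM is sensitive to initialization, so the resulting labels can vary with `seed`.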

Keywords/Search Tags:

Collaborative Filtering, Unsupervised Learning, Semi-supervised Classification, Semi-supervised Clustering, Finite Mixture Models

Related items

1	Research On Semi-supervised Clustering And Classification Algorithm
2	Research On Recommendation Algorithm Based On Semi-supervised Learning
3	Research Of Reliable Semi-supervised Classification
4	Target Classification Of Synthetic Aperture Radar Based On Semi-supervised And Unsupervised Learning
5	Research On Semi-supervised Learning Methods On Heterogeneous Information Networks
6	Research And Implementation Of Collaborative Filtering Recommendation Algorithm Based On Semi-Supervised Learning
7	A Study On Some Problems Of Semi-supervised Learning
8	The Research Of Collaborative Recommendation Algorithm Based On Semi-Supervised Learning
9	Research On Network Uncivilized Text Classification Methods Based On Semi-supervised Learning Models
10	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering