
Research On Theories And Methods Of Information Filtering Under Web 2.0

Posted on: 2010-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D F Li
GTID: 1118360302963029
Subject: Signal and Information Processing
Abstract/Summary:
The Internet has developed rapidly in recent years. As technologies such as Web 2.0 advance, more and more information activities and applications are carried out on the Internet, and people have become more dependent on it than ever.

In the Web 2.0 era, on the one hand, media formats on the Internet have diversified. Audio and visual information, combined with traditional text, has greatly enriched Internet content and improved the user experience. Filtering multimedia information has therefore become an important task in Web 2.0 information filtering. On the other hand, users have become the center of the Internet: the vast amount of information is both consumed and created by users. This user-created information has enriched the content of the Internet and provided people with many information sources. In addition, the huge number of users and user actions has brought vast amounts of data to the Internet. How to adapt traditional machine learning algorithms to large-scale computing environments is a difficult research topic.

We focus on information filtering in the Web 2.0 era. We analyse the challenges of information filtering in Web 2.0 and study the problems of filtering various media types, large-scale machine learning algorithms, and mining user feedback. We propose theoretical analysis and solutions to these problems. The main research contents and innovations of this paper are as follows:

1. We propose a unified information filtering algorithm based on multiple features of multiple media types in the Web 2.0 era. For the advertising image detection problem, we use features such as image content and the text surrounding the image, and integrate machine learning algorithms such as SVM and AdaBoost. The filtering results demonstrate the effectiveness of our algorithm. The feature set combines media content features, web page visual layout features, and text features. These features are verified to be useful in classifying advertising images.
Moreover, we propose a feature selection algorithm based on AdaBoost, which selects useful features from the original full feature set. We construct a large dataset to verify our algorithm. The experimental results demonstrate that our feature selection algorithm is feasible and reasonable. In addition, we compare the classification effectiveness of each feature.

2. We propose a fast spectral clustering algorithm (FSC) based on Normalized Cut, which can perform clustering on large-scale text corpora. We analyse the bottlenecks of applying spectral clustering to large-scale text corpora and propose solutions. First, FSC uses the GSASH method to build a graph from a large-scale text corpus. Second, FSC uses the AMG method to iteratively reduce a large-scale eigenvalue system to a smaller one and obtain an approximate solution. We verify FSC both theoretically and experimentally. The experimental results demonstrate that the complexity of FSC is reduced to O log while the good performance of spectral clustering is preserved.

3. We propose a hot topic evaluation and mining algorithm based on a heat diffusion model for the Web 2.0 environment. First, we model the Internet under Web 2.0 according to its dynamic and social properties. Second, we regard information activities on the Internet as heat activities and use the heat diffusion model to model them. We use the feedback of web users as heat input, evaluate the hotness of information on the Internet, and mine hot topics. This paper gives a detailed definition of the heat diffusion model and proves its stability and convergence. The experimental results demonstrate that our algorithm can simulate information activities on the Internet.
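The idea of treating user feedback as heat input can be illustrated with a minimal sketch of generic heat diffusion on an undirected graph. This is not the dissertation's model, whose exact definition is not given in the abstract. The graph format, the function names `diffuse` and `hottest`, and the parameters `alpha`, `steps`, and `feedback` are all assumptions introduced for illustration.

```python
def diffuse(adjacency, heat, alpha=0.1, steps=50, feedback=None):
    """Explicit-Euler heat diffusion on an undirected graph (illustrative only).

    adjacency: dict mapping each node to a list of its neighbours
               (hypothetical format; every edge must appear in both directions)
    heat:      dict mapping each node to its initial heat
    alpha:     fraction of each heat difference exchanged per step; keeping
               alpha * (maximum node degree) below 1 keeps the update stable
    feedback:  optional dict of heat injected into a node at every step,
               standing in for user feedback such as clicks or comments
    """
    feedback = feedback or {}
    current = dict(heat)
    for _ in range(steps):
        updated = {}
        for node, neighbours in adjacency.items():
            # Net heat flowing into this node from its neighbours.
            flow = sum(current[nb] - current[node] for nb in neighbours)
            updated[node] = current[node] + alpha * flow + feedback.get(node, 0.0)
        current = updated
    return current


def hottest(heat, k=1):
    """Return the k nodes with the highest heat, hottest first."""
    return sorted(heat, key=heat.get, reverse=True)[:k]


if __name__ == "__main__":
    # Three items linked in a chain: a - b - c.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    start = {"a": 0.0, "b": 0.0, "c": 0.0}
    # Item "c" keeps receiving user feedback, so it should rank hottest.
    result = diffuse(graph, start, alpha=0.1, steps=5, feedback={"c": 1.0})
    print(hottest(result, k=2))
```

Without any feedback, the total heat on a symmetric graph is conserved and the values on a connected graph settle toward a common level, which is the kind of stability and convergence behaviour the abstract refers to.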
Keywords/Search Tags: Web 2.0, Information Filtering, Advertising Detection, Large-Scale Clustering, Spectral Clustering, Heat Diffusion Model, Hot Topic Detection