Deriving emergent semantics from Web browsing paths

Posted on:2004-01-01

Degree:Ph.D.

Type:Dissertation

University:Wayne State University

Candidate:Sreenath, Dv

GTID:1468390011977115

Subject:Computer Science

Abstract/Summary:

The World Wide Web has become the primary repository for all information: scientific, business and entertainment. The abundance of information allows for the free exchange of ideas, but no central authority validates or authenticates its content. In addition, the authors of Web pages are not aware of the manner in which their content is used. Thus innocent information published on the Web has the potential for malicious use. We propose that the author of a Web page cannot completely define that document's semantics and that semantics emerge through use. Our goal is to derive the emergent semantics of Web pages from the browsing paths of users.

In a search engine, a user enters a query string and the search engine retrieves a list of URLs that match the query, in order of relevance. Our research can be considered the reciprocal of a search engine: the problem is to derive the semantics from the sequence of Web pages traversed by a user. Using an iterative process, we derive the semantic breakpoints of long browsing paths. This identifies short sub-paths with coherent, uniform semantics. Using a variation of Latent Semantic Analysis, we attempt to derive high-level semantics of the user's browsing pattern. The semantics of a Web page is defined in terms of these sub-paths. Different sub-paths represent different browsing patterns and hence different semantics. Recursively, the semantics of a sub-path is defined as the aggregation of the semantics of the Web pages of which it is composed.

In our research, we have attempted to use the multimedia information in Web pages along with textual information to derive semantics. In particular, we have focused on images embedded in news articles. Image semantics are context sensitive. User browsing paths over an image collection provide, we believe, the necessary context to derive image semantics. Such semantics emerge over a period of time through user interaction. We have attempted to use these emergent semantics for content-based image retrieval. In our experiments, a sequence of images serves as a query against a database of semantically coherent browsing paths through a diverse image collection, and the results are encouraging. We have a simple technique to extract visual features (visual keywords) in the compressed domain from JPEG images embedded in Web pages.

We have developed and tested several algorithms to detect breakpoints and cluster Web pages into groups that exhibit uniform semantics. We have used visual keywords to improve textual-keyword-based searches. This is our attempt to show that information from several modalities can be used to improve retrieval using a single modality.
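
The idea of semantic breakpoints and aggregated sub-path semantics can be illustrated with a minimal sketch. This is not the dissertation's algorithm, which the abstract describes only as iterative and based on a variation of Latent Semantic Analysis. The sketch assumes a much simpler single pass: each page is reduced to a bag of words, and a breakpoint is declared wherever the cosine similarity between consecutive pages falls below a threshold. The function names, the threshold of 0.2, and the sample pages are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one page (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_breakpoints(pages, threshold=0.2):
    """Return each index i at which a new sub-path starts at pages[i].

    Assumption: a drop in similarity between consecutive pages marks a
    change of topic. The threshold is an illustrative value only.
    """
    vectors = [term_vector(p) for p in pages]
    return [
        i
        for i in range(1, len(vectors))
        if cosine(vectors[i - 1], vectors[i]) < threshold
    ]


def split_path(pages, breakpoints):
    """Cut the browsing path into sub-paths at the given breakpoints."""
    bounds = [0] + list(breakpoints) + [len(pages)]
    return [pages[start:end] for start, end in zip(bounds, bounds[1:])]


def subpath_semantics(subpath):
    """Aggregate a sub-path's semantics as the sum of its pages' term counts."""
    total = Counter()
    for page in subpath:
        total.update(term_vector(page))
    return total


# Hypothetical browsing path: the text of four pages visited in order.
path = [
    "election vote senate campaign",
    "senate vote results election night",
    "football league match score",
    "football score goal match highlights",
]
breaks = find_breakpoints(path)
subpaths = split_path(path, breaks)
for sub in subpaths:
    print(subpath_semantics(sub).most_common(3))
```

Summing term counts is only one possible reading of "aggregation"; a weighted scheme or a projection into a latent space would fit the abstract's description equally well.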

Keywords/Search Tags:

Web, Semantics, Browsing paths, Information

Related items

1	Research Of User Preferred Browsing Paths Based On Web Log
2	Based On Web-log Frequent Browsing Paths Mining And Technology Analysis
3	A Study On The Wayfinding Theory-Based Network Users' Information Browsing Behavior
4	Research On Technology Of Mining User Browsing Paths Based On Hadoop
5	A Study On Collaborative Filtering Recommendation Technology Based On Users' Browsing Paths
6	A Domain Knowledge Based Model For Personalized Browsing Space Of Networked Information
7	A Semantic Association Based Literature Information Browsing System
8	Research On The Influence Factors Of Users' Online Browsing Behavior
9	Research And Application Of Information Retrieval Method Based On Semantics
10	Offline Browsing Information System Based On SYMBIAN OS