Deriving emergent semantics from Web browsing paths

Posted on: 2004-01-01
Degree: Ph.D
Type: Dissertation
University: Wayne State University
Candidate: Sreenath, Dv
Full Text: PDF
GTID: 1468390011977115
Subject: Computer Science
Abstract/Summary:
The World Wide Web has become the primary repository for all information: scientific, business and entertainment. The abundance of information allows for free exchange of ideas, but at the same time no central authority validates or authenticates its content. In addition, the authors of Web pages are not aware of the manner in which their content is used. Thus innocent information published on the Web has the potential for malicious use. We propose that the author of a Web page cannot completely define that document's semantics and that semantics emerge through use. Our goal is to derive the emergent semantics of Web pages from the browsing paths of users.

In a search engine, a user enters a query string and the search engine retrieves a list of URLs that match the query, in order of relevance. Our research can be considered the reciprocal of a search engine: the problem is to derive the semantics from the sequence of Web pages traversed by a user. Using an iterative process, we derive the semantic breakpoints of long browsing paths. This identifies short sub-paths with coherent, uniform semantics. Using a variation of Latent Semantic Analysis, we attempt to derive high-level semantics of the browsing pattern of the user. The semantics of a Web page is defined in terms of these sub-paths. Various sub-paths represent different browsing patterns and hence different semantics. Recursively, the semantics of a sub-path is defined as the aggregation of the semantics of the Web pages of which it is composed.

In our research, we have attempted to use the multimedia information in Web pages along with textual information to derive semantics. In particular, we have focused on embedded images in news articles. Image semantics are context sensitive. User browsing paths over an image collection provide, we believe, the necessary context to derive image semantics. Such semantics emerge over a period of time through user interaction. We have attempted to use these emergent semantics for content-based image retrieval. In experiments that retrieve images from a diverse image collection, using a sequence of images as a query against a database of semantically coherent browsing paths through the collection, our results are encouraging. We have a simple technique to extract visual features (visual keywords) in the compressed domain from JPEG images embedded in Web pages.

We have developed and tested several algorithms to detect breakpoints and cluster Web pages into groups that exhibit uniform semantics. We have used visual keywords to improve textual-keyword-based searches. This is our attempt to show that information from several modalities can be used to improve retrieval using a single modality.
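
The idea of a semantic breakpoint can be illustrated with a small sketch. This is not the dissertation's algorithm, which the abstract describes only as iterative and based on a variation of Latent Semantic Analysis. The sketch substitutes a much simpler single-pass rule: it represents each page as a bag of words and places a breakpoint wherever the cosine similarity between consecutive pages falls below a threshold. The function names, the sample path and the threshold value of 0.2 are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one page (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_breakpoints(pages, threshold=0.2):
    """Return each index i at which page i starts a new sub-path.

    Assumption: a breakpoint is declared when consecutive pages are
    less similar than `threshold` (an illustrative value, not one
    taken from the dissertation).
    """
    vectors = [term_vector(p) for p in pages]
    return [
        i
        for i in range(1, len(vectors))
        if cosine(vectors[i - 1], vectors[i]) < threshold
    ]


def split_path(pages, threshold=0.2):
    """Split a browsing path into sub-paths at the detected breakpoints."""
    bounds = [0] + find_breakpoints(pages, threshold) + [len(pages)]
    return [pages[start:end] for start, end in zip(bounds, bounds[1:])]


# Hypothetical browsing path: page text reduced to a few keywords.
path = [
    "election results senate vote",
    "senate vote recount election",
    "football league final score",
    "league final score highlights football",
]

sub_paths = split_path(path)
```

On this sample path the only breakpoint falls between the second and third pages, which yields one sub-path about the election and one about football. A closer analogue of the described approach would compare pages in a reduced latent space rather than on raw term counts.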
Keywords/Search Tags: Web, Semantics, Browsing paths, Information, Using