
Mining And Utilizing Users' Interests In Web Logs

Posted on: 2005-10-14  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Guo  Full Text: PDF
GTID: 1118360185995663  Subject: Computer software and theory
Abstract/Summary:
Web-log mining aims to mine Web user access patterns from Web logs. It rests on the hypothesis that Web logs contain characteristics of how users access the Web, that these characteristics are reflected in patterns, and that the patterns can be mined and utilized. Much research on Web-log mining is based on this hypothesis. Do Web logs really contain characteristics of user access? If so, can these characteristics be described clearly, and how can they be used? To try to answer these questions, the dissertation applies techniques such as statistics, clustering and modeling to research on mining Web user access characteristics, Web IR, Web site aided design and Web performance improvement. The contributions of the dissertation are as follows:
(1) To answer whether Web logs really contain characteristics of user access, and whether these characteristics can be described clearly, the dissertation analyzes real Web logs through empirical statistics. The main questions are: as the scale of the Web logs increases, how do the number of users, the number of Web pages, and the average number of Web pages accessed by one user change? And what motivates users to access the Web? The conclusions drawn from the experiments are very useful and provide a foundation for research on Web-log mining.
(2) The dissertation presents an IR model named WUBIRM (Web Usage Based IR Model) and a prototype search engine named SISI (Similar Interests, Similar access on Internet), both based on Web user access behavior. Web IR today is mainly based on content mining and structure mining. The dissertation holds that humans are the best judges of which pages are related. To simulate human judgment of related pages, it discusses Web IR from the Web user's point of view and tries to exploit the latent human judgments of relatedness contained in Web logs to mine related pages.
This research can help improve traditional IR and retrieve related pages from massive amounts of information quickly and precisely, which is of considerable significance.
(3) Based on a transformation of user-space (the users' access-frequency matrix), the dissertation presents the concept of user-interest space and two algorithms to construct it: one is based on gene-analysis, and the other on a duplex phenomenon between user clustering and page clustering in user-space. Compared with user-space, user-interest space gives prominence to users' common interests and is orthogonal. Web pages are then clustered in user-space and in the two user-interest spaces. The experimental results show that, compared with clustering Web pages in user-space, clustering in user-interest space gives better results, that the conversion from user-space to user-interest space compresses the data well, and that clustering in user-interest space constructed by...
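As an illustration of the user-space idea described in the abstract, here is a minimal sketch of how a users' access-frequency matrix might be built from log records and used to find related pages. It is not WUBIRM or the dissertation's algorithm: the abstract does not say how pages are compared, so cosine similarity is an assumption, and the function names (`build_user_space`, `cosine`, `related_pages`), the record format, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_user_space(records):
    """Build a sparse access-frequency matrix from (user, page) log records.

    Returns {page: Counter({user: access_count})}, i.e. one column per page.
    The (user, page) record format is an assumption for this sketch.
    """
    columns = defaultdict(Counter)
    for user, page in records:
        columns[page][user] += 1
    return columns


def cosine(a, b):
    """Cosine similarity between two sparse user-count vectors (assumed measure)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_pages(columns, page, top_n=3):
    """Rank other pages by how similarly users accessed them."""
    scores = [
        (other, cosine(columns[page], columns[other]))
        for other in columns
        if other != page
    ]
    # Highest similarity first; ties broken by page name for determinism.
    scores.sort(key=lambda s: (-s[1], s[0]))
    return scores[:top_n]


# Hypothetical sample log: u1 and u2 both visit /a and /b, u3 only visits /c.
records = [
    ("u1", "/a"), ("u1", "/b"), ("u1", "/a"),
    ("u2", "/a"), ("u2", "/b"),
    ("u3", "/c"),
]
columns = build_user_space(records)
print(related_pages(columns, "/a"))
```

In this sample, `/b` ranks above `/c` as related to `/a` because the same users accessed both, which is the kind of latent human judgment the abstract refers to.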
Keywords/Search Tags: Web-log mining, users' interests, IR, gene-analysis, Web cache replacement policy