Research On Web Accessing Pattern Discover And Application

Posted on:2007-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:A M Song

GTID:2178360185992446

Subject:Software engineering

Abstract/Summary:

Today, the World Wide Web is rapidly emerging as an important medium for disseminating, exchanging, and retrieving information. According to most predictions, the majority of human information will be available on the Web within ten years. This huge amount of data raises a grand challenge: how to turn the Web into a more useful information utility.

At present, search engines are still the main tools for finding information. Today's search engines, however, are plagued by low precision, low recall, a limited query interface based only on keyword-oriented search, and a lack of customization for individual users. These problems, in turn, can be attributed to the following characteristics of the Web. First and foremost, the Web is a huge, diverse, and dynamic collection of interlinked hypertext documents. Second, except for hyperlinks, the Web is largely unstructured. Finally, most information on the Web is in the form of HTML documents, for which analysis and extraction of content is very difficult. Therefore, it is not easy to overcome all the problems that search engines face.

In this thesis, we analyze web access behavior to discover user browsing patterns such as goals, interests, and preferences. These patterns are then used to improve the structure of web sites and the way web services are delivered. In this way, we can help users get what they need more easily through personalized information services and automated site administration.

The dissertation is composed of the following parts:

(1) We discuss the problems encountered during data preparation for web access behavior analysis and the corresponding solutions, and then give a simple method for identifying users and access transactions.

(2) We present a fast method, based on the suffix tree, for mining frequent paths and the reachable set and probability of the web pages browsed by users. Based on the discovered frequent paths, we develop an effective method for clustering user access transactions. It overcomes two shortcomings of current methods: they ignore the major features of users' access to the web (it is ordered, contiguous, and may contain duplicates), and their clustering dimensionality is very high. We also discuss fuzzy clustering of web pages.

(3) We create a logical design model for integrating weblog data, marketing data, and web metadata into a web data warehouse. Using this warehouse, site administrators can obtain information about how users access the web site, and managers can obtain information to support their commercial decisions.

(4) For the discovered patterns, such as web page clusters and frequent browsing paths, we discuss their application to personalization services and site administration, including automatic organization and restructuring of the site.
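To make the ideas of access transactions and frequent paths in parts (1) and (2) concrete, the following is a minimal sketch and not the method of the thesis. It assumes log hits are already reduced to (user, timestamp in seconds, page) tuples, splits each user's hits into transactions with a hypothetical 30-minute inactivity timeout, and counts ordered, contiguous page sequences by brute-force enumeration rather than with a suffix tree. The names `split_transactions`, `frequent_paths`, `timeout`, `min_support`, and `max_len` are illustrative assumptions.

```python
from collections import Counter, defaultdict


def split_transactions(hits, timeout=1800):
    """Group (user, timestamp_seconds, page) hits into per-user transactions.

    Assumption: a new transaction starts whenever the gap between two
    consecutive hits of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, page in sorted(hits, key=lambda h: (h[0], h[1])):
        by_user[user].append((ts, page))

    transactions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                transactions.append(current)
                current = []
            current.append(page)
        transactions.append(current)
    return transactions


def frequent_paths(transactions, min_support=2, max_len=4):
    """Return contiguous page paths (length 2..max_len) with their support.

    Order and repeated pages are preserved; each path is counted at most
    once per transaction. This enumerates every contiguous subsequence,
    which is simpler but slower than a suffix-tree approach.
    """
    support = Counter()
    for pages in transactions:
        seen = set()
        for start in range(len(pages)):
            last = min(start + max_len, len(pages))
            for end in range(start + 2, last + 1):
                seen.add(tuple(pages[start:end]))
        support.update(seen)
    return {path: n for path, n in support.items() if n >= min_support}
```

Paths found this way could then serve as features for clustering transactions, which keeps the ordering and contiguity information that a plain set-of-pages representation would lose.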

Keywords/Search Tags:

browsing pattern, web access data, personalization, site administration

Related items

1	Based On Web-log Frequent Browsing Paths Mining And Technology Analysis
2	User Browsing Interest Prediction And Personalization Recommendation Strategies Based On WEB Usage Mining
3	Design And Implementation Of Personalization Service System Based On Web And Data Mining
4	Research On Algorithm Of Browsing Pattern Mining In Web Log
5	Study Of Web Usage Mining Based On Rough Set Theory
6	Using indexes and data cubes to support browsing and summarization of information in databases and digital libraries
7	The Research And Application Of Search Engine Personalization Query Expansion Technology
8	The Research On Personalization Recommendation Technology And Its Application On Digital Library
9	Design And Implementation Of Personalization Marketing Algorithm Based On Data Mining
10	Personalized Research Combining With Analysis Of Web User's Behavior Based On The User's Browsing Content