Research On Web Accessing Pattern Discover And Application

Posted on: 2007-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: A M Song
GTID: 2178360185992446
Subject: Software engineering
Abstract/Summary:
Today, the World Wide Web is rapidly emerging as an important medium for disseminating, exchanging, and obtaining information. According to most predictions, the majority of human information will be available on the Web within ten years. This huge amount of data raises a grand challenge: how to turn the Web into a more useful information utility.

At present, search engines remain the main tools for obtaining information. Today's search engines, however, suffer from low precision, low recall, and a limited query interface based only on keyword search, and they offer no customization for individual users. These problems can in turn be attributed to the following characteristics of the Web. First and foremost, the Web is a huge, diverse, and dynamic collection of interlinked hypertext documents. Second, apart from its hyperlinks, the Web is largely unstructured. Finally, most information on the Web is in the form of HTML documents, whose content is very difficult to analyze and extract. Therefore, it is not easy to overcome all of the problems that search engines face.

In this thesis, we analyze web access behavior to discover users' browsing patterns, such as their aims, interests, and preferences. These patterns are then used to improve the structure of web sites and the way web services are delivered. Through personalized information services and automated site administration, we can thus help users find what they need more easily.

The dissertation is composed of the following parts:
(1) We discuss the various problems encountered during data preparation for web access behavior analysis, together with the corresponding solutions, and then give a simple method for identifying users and access transactions.
(2) We present a fast, suffix-tree-based method for mining users' frequent browsing paths and the reachable sets and reaching probabilities of the web pages they browse. Based on the discovered frequent paths, we develop an effective method for clustering user access transactions. It overcomes two shortcomings of current methods: they ignore the major features of how users access the web (accesses are ordered, contiguous, and may repeat pages), and their clustering dimensionality is very high. We also discuss fuzzy clustering of web pages.
(3) We create a logical design model for integrating web log data, marketing data, and web metadata into a web data warehouse. Using this warehouse, site administrators can obtain information about how users access the web site, and managers can obtain information to support commercial decisions.
(4) For the discovered patterns, such as web page clusters and frequent browsing paths, we also discuss their application to personalization services and to site administration, including automatic organization and reconstruction of the site.
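Part (1) mentions a simple method for identifying users and access transactions, but the abstract does not give its details. The following is a minimal sketch built on common heuristics, not the thesis's own method: a user is assumed to be one (IP address, user agent) pair, requests for images, stylesheets, and scripts are dropped during cleaning, and a transaction ends after an assumed 30-minute idle gap. `LogRecord`, `clean`, `identify_transactions`, and both constants are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple

class LogRecord(NamedTuple):
    ip: str
    agent: str
    timestamp: int  # seconds since the epoch
    url: str

# Assumed heuristics, not taken from the thesis: a 30-minute idle gap ends a
# transaction, and requests for embedded resources are dropped as noise.
SESSION_TIMEOUT = 30 * 60
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

def clean(records):
    """Data preparation: drop requests for images, stylesheets and scripts."""
    return [r for r in records if not r.url.lower().endswith(IGNORED_SUFFIXES)]

def identify_transactions(records, timeout=SESSION_TIMEOUT):
    """Treat each (IP, user agent) pair as one user, then split that user's
    time-ordered requests into a new transaction whenever the gap between
    consecutive requests exceeds `timeout` seconds."""
    by_user = defaultdict(list)
    for r in clean(records):
        by_user[(r.ip, r.agent)].append(r)

    transactions = []
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r.timestamp)
        current = [recs[0].url]
        for prev, cur in zip(recs, recs[1:]):
            if cur.timestamp - prev.timestamp > timeout:
                transactions.append((user, current))
                current = []
            current.append(cur.url)
        transactions.append((user, current))
    return transactions
```

Proxies and shared machines break the IP-plus-agent assumption, which is one reason data preparation needs the extra care that part (1) describes.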
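Part (2) bases frequent-path mining on a suffix tree. The sketch below is a simplified stand-in: an uncompressed generalized suffix trie, built in O(n²) per transaction of length n, whereas a true suffix tree compresses unary chains and can be built in linear time. Each node counts its *support* (the number of transactions that contain the path as a contiguous run) and its total occurrences. The abstract does not define "reachable set and probability", so this sketch assumes, as an illustration only, that they are the observed next pages and their one-step transition frequencies. `SuffixTrie` and its methods are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.support = 0       # number of transactions containing this path
        self.occurrences = 0   # total number of times this path occurs
        self.last_tid = -1     # last transaction that incremented `support`

class SuffixTrie:
    """Generalized suffix trie over page-access transactions (uncompressed)."""

    def __init__(self, transactions):
        self.root = TrieNode()
        for tid, pages in enumerate(transactions):
            # Insert every suffix, so every contiguous subpath gets a node.
            for start in range(len(pages)):
                node = self.root
                for page in pages[start:]:
                    node = node.children.setdefault(page, TrieNode())
                    node.occurrences += 1
                    if node.last_tid != tid:  # count each transaction once
                        node.support += 1
                        node.last_tid = tid

    def frequent_paths(self, min_support, min_length=1):
        """Return {path: support} for contiguous paths with support >= min_support."""
        result = {}
        stack = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            for page, child in node.children.items():
                # Extending a path can only lower its support, so prune here.
                if child.support >= min_support:
                    new_path = path + (page,)
                    if len(new_path) >= min_length:
                        result[new_path] = child.support
                    stack.append((child, new_path))
        return result

    def next_page_probabilities(self, page):
        """Estimate P(next page = q | current page = page) from occurrences."""
        node = self.root.children.get(page)
        if node is None:
            return {}
        total = sum(c.occurrences for c in node.children.values())
        if total == 0:
            return {}
        return {q: c.occurrences / total for q, c in node.children.items()}
```

For the transactions `A→B→C`, `A→B→D`, and `B→C` with a minimum support of 2, the frequent paths of length at least two are `A→B` and `B→C`, and from `B` the observed next pages are `C` (two of three times) and `D` (one of three).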
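Part (2) also clusters access transactions according to the discovered frequent paths. The thesis's exact algorithm is not given in the abstract. One plausible reading, sketched here under that assumption, describes each transaction by the set of frequent paths it contains as contiguous runs, which preserves order and contiguity, and keeps the feature space limited to those paths. Clusters then come from single-link grouping over Jaccard similarity. Jaccard similarity and union-find are swapped-in choices, and `cluster_transactions` and `threshold` are hypothetical names.

```python
from itertools import combinations

def contains_path(transaction, path):
    """True if `path` occurs as a contiguous, in-order run inside `transaction`."""
    n = len(path)
    return any(tuple(transaction[i:i + n]) == path
               for i in range(len(transaction) - n + 1))

def path_features(transaction, frequent_paths):
    """Describe a transaction by the frequent paths it contains."""
    return frozenset(p for p in frequent_paths if contains_path(transaction, p))

def jaccard(a, b):
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_transactions(transactions, frequent_paths, threshold=0.5):
    """Single-link clustering: transactions whose frequent-path sets have
    Jaccard similarity >= threshold end up in the same cluster."""
    feats = [path_features(t, frequent_paths) for t in transactions]
    parent = list(range(len(transactions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(transactions)), 2):
        if jaccard(feats[i], feats[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(transactions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Because `contains_path` only matches contiguous runs, the transaction `A→C→B` does not count as containing `A→B`. A plain bag-of-pages vector would miss that distinction.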
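Part (3) describes a logical model that integrates web log data, marketing data, and web metadata into a web data warehouse. The abstract does not give the schema. The following is a hypothetical star schema in SQLite: page views from the logs form the fact table, and page metadata, customer (marketing) attributes, and time form the dimensions. All table and column names are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical star schema; not the schema from the thesis.
SCHEMA = """
CREATE TABLE dim_page (page_id INTEGER PRIMARY KEY, url TEXT, category TEXT);          -- web metadata
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT, region TEXT); -- marketing data
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
CREATE TABLE fact_pageview (                                                            -- web log data
    page_id INTEGER REFERENCES dim_page(page_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    session_id TEXT,
    seconds_on_page INTEGER
);
"""

def build_warehouse():
    """Create an empty in-memory warehouse with the star schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def views_by_category_and_segment(conn):
    """Roll page views up along the page and customer dimensions."""
    return conn.execute(
        """
        SELECT p.category, c.segment, COUNT(*) AS views
        FROM fact_pageview f
        JOIN dim_page p ON f.page_id = p.page_id
        JOIN dim_customer c ON f.customer_id = c.customer_id
        GROUP BY p.category, c.segment
        ORDER BY p.category, c.segment
        """
    ).fetchall()
```

A query like this serves both audiences named in part (3). It tells an administrator which parts of the site each visitor segment uses, and it gives managers the same figures as input for commercial decisions.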
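Part (4) applies frequent browsing paths to personalization. One way to do this, sketched here as an illustration rather than as the thesis's method, is to match the tail of a user's current session against the prefixes of frequent paths and recommend the pages that complete them. The scoring rule, support multiplied by matched-prefix length so that longer and more specific matches rank higher, is an assumption, and `recommend` and `k` are hypothetical names.

```python
def recommend(session, frequent_paths, k=3):
    """Suggest up to k next pages for `session`.

    frequent_paths: dict mapping a path tuple to its support.
    """
    scores = {}
    for path, support in frequent_paths.items():
        if len(path) < 2:
            continue
        prefix, nxt = path[:-1], path[-1]
        # The path applies if the session currently ends with its prefix.
        if (len(prefix) <= len(session)
                and tuple(session[-len(prefix):]) == prefix
                and nxt not in session):
            # Assumed heuristic: weight longer (more specific) matches higher.
            scores[nxt] = max(scores.get(nxt, 0), support * len(prefix))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:k]]
```

For example, with the frequent paths `B→C` (support 4) and `A→B→D` (support 3), a session that ends in `A→B` is offered `D` before `C`, because the longer match outweighs the higher support.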
Keywords/Search Tags: browsing pattern, web access data, personalization, site administration