Study On Data Mining Based On Web Log

Posted on:2008-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Zhang

GTID:2178360242471645

Subject:Computer system architecture

Abstract/Summary:

With the explosive growth of information available on the Web, discovering and analyzing useful information from it has become an urgent necessity. Faced with such massive amounts of information, it is increasingly difficult to extract what is valuable. Web servers record a log entry for every access they receive, capturing important details such as the client IP address, date and time stamp, request method, requested URL, and file size. These records reflect users' reactions and motivations. Web log mining extracts the access patterns users are interested in from a Web server's access log files in order to uncover users' browsing behavior and provide personalized recommendation services.

Clustering techniques can identify groups of users with similar browsing behavior and group pages with similar characteristics into the same class. Traditional clustering techniques do not take the diversity of user preferences into account, so their results are not ideal. After an in-depth study of existing clustering algorithms, this thesis presents an improved fuzzy clustering algorithm, LFCM, for clustering user transactions.

Frequent access paths express users' access patterns. The Apriori association rule algorithm is a typical approach to finding frequent access paths, but it generates so many candidate itemsets that its efficiency is low. The basic idea of the approach in this thesis is that each frequent access path of length k is generated by self-joining two frequent access paths of length k-1. This algorithm can reduce the number of database scans and improve efficiency. Current page recommendation methods usually infer user interest from access frequency and the time spent on a page, but we do not think these measures fully reflect user interest.
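The frequent-path generation idea above, in which each length-k candidate is formed by joining two frequent paths of length k-1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's own algorithm: paths are taken to be contiguous page sequences, two paths join when the last k-2 pages of the first equal the first k-2 pages of the second, and support is the number of sessions that contain the path. The names `frequent_paths`, `min_support`, and `Path` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # hypothetical alias: an access path as a tuple of page IDs


def frequent_paths(sessions: List[List[str]], min_support: int) -> Dict[Path, int]:
    """Level-wise mining of contiguous frequent access paths (assumed definitions).

    A length-k candidate is built by joining two frequent length-(k-1) paths
    a and b whose overlap matches: a[1:] == b[:-1]. Support counts each path
    at most once per session.
    """
    # Level 1: count every page once per session.
    page_counts = Counter(page for s in sessions for page in set(s))
    level: Dict[Path, int] = {(p,): c for p, c in page_counts.items() if c >= min_support}
    result = dict(level)
    k = 1
    while level:
        k += 1
        prev = list(level)
        # Self-join of the (k-1)-length frequent paths.
        candidates = {a + b[-1:] for a in prev for b in prev if a[1:] == b[:-1]}
        counts: Counter = Counter()
        for s in sessions:
            # One pass over the sessions per level: collect the distinct
            # length-k windows of this session and count those that are candidates.
            windows = {tuple(s[i:i + k]) for i in range(len(s) - k + 1)}
            counts.update(windows & candidates)
        level = {p: c for p, c in counts.items() if c >= min_support}
        result.update(level)
    return result


sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
print(frequent_paths(sessions, min_support=2))
```

Because access paths are contiguous here, a candidate's only length-(k-1) subpaths are the two paths it was joined from, so no separate Apriori-style pruning step is needed, and each level requires a single scan of the sessions. With the sample sessions and `min_support=2`, the longest frequent path found is ("A", "B", "C") with support 2.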
This thesis proposes that frequent access paths, page access frequencies, and the end pages of user sessions together better reflect users' browsing patterns. It investigates issues related to efficiently mining user access patterns from large volumes of Web log files. The main contributions are as follows:

① The preprocessing techniques are described and analyzed, including data cleaning, user identification, session identification, path completion, and transaction identification. Preprocessing is a key step in Web mining, and its results directly affect the mining results.

② Fuzzy mathematics is introduced to handle imprecision and uncertainty. An improved fuzzy clustering algorithm, LFCM, based on the fuzzy c-means (FCM) algorithm, is proposed. LFCM has reduced complexity: its complexity is linearly proportional to the number n of user transactions and to the choice parameter p. Experiments show that LFCM clusters more effectively than FCM. A clustering validity function is also introduced to determine the optimal number of clusters.

③ Frequent access paths reflect users' access patterns. User transaction patterns are recognized using maximal forward paths (MFPs) and a method based on a directed tree, and frequent access paths are obtained from the maximal forward paths in user sessions. A new Web page recommendation algorithm is given to recommend pages of likely interest to users.

④ A prototype Web mining system for personalized recommendation is presented. The system monitors user access behavior in real time, predicts the next page the user is likely to access on the basis of the current access, and dynamically recommends the pages with the highest interest degree.
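The preprocessing steps in contribution ① and the maximal forward paths in ③ can be illustrated with a small pipeline. This is a sketch under assumptions that the abstract does not state: input lines are in Common Log Format, users are identified by IP address alone, sessions are split on a 30-minute inactivity gap, and maximal forward paths are extracted by cutting a session at each backward reference (the classic maximal forward reference idea). The suffix filter and all function names are hypothetical, and path completion and the thesis's directed-tree method are not reproduced.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")  # assumed cleaning rule
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed session-gap heuristic

Record = Tuple[str, datetime, str]  # (ip, timestamp, url)


def clean(lines: List[str]) -> List[Record]:
    """Data cleaning: keep successful GET requests for pages, drop embedded resources."""
    records = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        url = m["url"].split("?")[0]  # ignore query strings
        if url.lower().endswith(IGNORED_SUFFIXES):
            continue
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        records.append((m["ip"], ts, url))
    return records


def sessionize(records: List[Record]) -> List[List[str]]:
    """User identification by IP (assumed), then split each user's clicks on long gaps."""
    by_user: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for ip, ts, url in records:
        by_user[ip].append((ts, url))
    sessions = []
    for clicks in by_user.values():
        clicks.sort()  # chronological order
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, url) in zip(clicks, clicks[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def maximal_forward_paths(session: List[str]) -> List[List[str]]:
    """Transaction identification: emit the forward path whenever a backward reference occurs."""
    paths: List[List[str]] = []
    current: List[str] = []
    moving_forward = True
    for page in session:
        if page in current:
            # Backward reference: the path so far was maximal if we were moving forward.
            if moving_forward:
                paths.append(list(current))
            current = current[: current.index(page) + 1]
            moving_forward = False
        else:
            current.append(page)
            moving_forward = True
    if moving_forward and current:
        paths.append(current)
    return paths
```

For example, the session /A, /B, /C, /B, /D yields the transactions [/A, /B, /C] and [/A, /B, /D], because the return to /B ends the first forward path. These transactions are the kind of input the frequent-path mining step would consume.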

Keywords/Search Tags:

Web mining, web log, fuzzy clustering, frequent access path, personalized recommendation

Related items

1	Apply And Research On Web Log Mining Of Website Personalized Service
2	Research Of Personalized Recommendation On Air Travel Booking Website
3	Application Research Of Personalized Recommendation Service Based On Web Clustering
4	Research On The Application Of Data Mining Technology In Personalized Web
5	Research On Personalized Recommendation Service Based On Clustering Of Web Access Log
6	The Algorithm Reach Of E-commerce Personalized Recommendation Based On Web Mining Technology
7	Research On Logistics Path Planning And Frequent Path Mining Based On Internet Of Things
8	Path Recommendation Research Based On GPS Data And Frequent Pattern Mining
9	Research On Personalized Recommendation Algorithm Based On User Clustering
10	Research And Implementation Of Personalized Recommendation System Based On Web Log Mining