
Study On Data Mining Based On Web Log

Posted on: 2008-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhang
Full Text: PDF
GTID: 2178360242471645
Subject: Computer system architecture
Abstract/Summary:
With the explosive growth of information available on the Web, discovering and analyzing useful information from it has become an urgent necessity. Faced with this mass of information, it is becoming increasingly difficult to retrieve what is valuable. Web servers record a log entry for every access they receive, capturing details such as the IP address, date and time stamp, request method, requested URL, and file size. These entries reflect users' behavior and motivation. Web log mining extracts the access patterns of interest from a web server's access log files in order to discover users' browsing behavior and provide personalized recommendation services.
Clustering techniques can identify groups of users with similar browsing behavior and group pages with similar characteristics into classes. Traditional clustering techniques do not take the diversity of user preferences into account, so their results are not ideal. After in-depth study of existing clustering algorithms, this thesis presents an improved fuzzy clustering algorithm, LFCM, for clustering user transactions.
Frequent access paths express users' access patterns. The Apriori association rule algorithm is a typical approach to finding frequent access paths, but it generates so many candidate itemsets that its efficiency is low. In this thesis, the basic idea for mining frequent access paths is that each candidate path of length k is generated by self-joining two frequent access paths of length k-1. This reduces the number of database scans and improves efficiency.
Current page recommendation approaches usually gauge user interest by access frequency and the time spent on a page. We argue that these measures alone do not fully reflect user interest.
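The self-join idea for frequent access paths described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes that a path is a contiguous run of pages within a session, that support is the number of sessions containing the path, and that two paths of length k-1 join when the tail of one equals the head of the other. The names `contains_path`, `join_paths`, and `frequent_paths` are hypothetical.

```python
from collections import Counter


def contains_path(session, path):
    """Return True if `path` occurs as a contiguous run inside `session`."""
    k = len(path)
    return any(tuple(session[i:i + k]) == path
               for i in range(len(session) - k + 1))


def join_paths(frequent_prev):
    """Self-join (assumed rule): p and q of length k-1 produce a length-k
    candidate when p without its first page equals q without its last page."""
    return {p + (q[-1],)
            for p in frequent_prev
            for q in frequent_prev
            if p[1:] == q[:-1]}


def frequent_paths(sessions, min_support):
    """Return {path: support} for every contiguous path that appears in at
    least `min_support` sessions (each session counted once per path)."""
    page_counts = Counter(page for s in sessions for page in set(s))
    level = {(page,): c for page, c in page_counts.items() if c >= min_support}
    result = dict(level)
    while level:
        candidates = join_paths(set(level))
        level = {}
        for cand in candidates:
            support = sum(contains_path(s, cand) for s in sessions)
            if support >= min_support:
                level[cand] = support
        result.update(level)
    return result


# Hypothetical sessions, for illustration only.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "C"]]
paths = frequent_paths(sessions, min_support=2)
```

In this toy input, the path A, B, C is built by joining the frequent paths A, B and B, C, so only candidates whose sub-paths are already frequent are ever counted against the sessions.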
We therefore propose that frequent access paths, page access frequencies, and the end page of a user session together reflect the user's browsing patterns. This thesis investigates how to mine user access patterns efficiently from large volumes of Web log files. The main contributions are as follows:
① This thesis describes and analyzes the preprocessing techniques, including data cleaning, user identification, session identification, path completion, and transaction identification. Preprocessing is a key step in Web mining, and its results directly affect the mining results.
② Fuzzy mathematics is introduced to handle imprecision and uncertainty. An improved fuzzy clustering algorithm, LFCM, is proposed on the basis of the fuzzy c-means (FCM) algorithm. Its complexity is reduced so that it is linearly proportional to the number n of user transactions and the choice parameter p. Experiments show that LFCM clusters more effectively than FCM. A clustering validity function is also introduced to determine the best number of clusters.
③ Frequent access paths reflect users' access patterns. User transaction patterns are recognized using maximal forward paths (MFPs) and a method based on a directional tree. The frequent access paths are obtained from the maximal forward paths in user sessions. A new Web page recommendation algorithm is given to recommend pages of interest to the user.
④ A prototype Web mining system for personalized recommendation is presented. The system monitors user access behavior in real time, predicts the page the user is likely to access next on the basis of the current access, and dynamically recommends the pages with the highest interest degree.
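As an illustration of the maximal forward paths mentioned in contribution ③, the following Python sketch splits one session into its maximal forward paths. It follows the commonly used definition (a forward path ends when the user returns to a page already on the current path) and is not necessarily the procedure used in the thesis. The function name `maximal_forward_paths` and the sample session are assumptions made for the example.

```python
def maximal_forward_paths(session):
    """Split a session (ordered list of pages) into maximal forward paths.

    A revisit to a page already on the current path is treated as a backward
    reference: the current path is emitted if the previous move was forward,
    and the path is then cut back to the revisited page.
    """
    path, paths, moving_forward = [], [], False
    for page in session:
        if page in path:
            # Backward reference: emit only if we were moving forward.
            if moving_forward:
                paths.append(tuple(path))
            path = path[:path.index(page) + 1]
            moving_forward = False
        else:
            # Forward reference: extend the current path.
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        paths.append(tuple(path))
    return paths


# Hypothetical session, one letter per page.
mfps = maximal_forward_paths(list("ABCDCBEGHGWAOUOV"))
```

The resulting paths could then be fed, as transactions, into a frequent-path miner; how the thesis combines them with the directional tree is not specified in the abstract.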
Keywords/Search Tags:Web mining, web log, fuzzy clustering, frequent access path, personalized recommendation