Research On Data Pre-processing Algorithm In Web Log Mining

Posted on: 2011-02-16  Degree: Master  Type: Thesis
Country: China  Candidate: H X Zhu  Full Text: PDF
GTID: 2178360302973624  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the worldwide spread of the Web, the amount of information available online has become enormous. Through Web mining we can extract useful knowledge from Web pages: by analyzing which content users request, how they visit it, and how often, we can learn the general behavior patterns of users and use them to improve Web services. More importantly, understanding and analyzing user characteristics can support the development of e-commerce. Web log mining applies data mining techniques to Web access data in order to discover valuable patterns in how the Web is visited. It is applied to personalization, Web site improvement, and business. Data preprocessing plays an essential role in Web log mining, and user and session identification is a basic and pivotal step within it. This thesis investigates how to improve the accuracy of user and session identification algorithms. It reviews the processes of data mining, Web data mining, and Web log mining, focuses on the techniques and process of Web log mining, and studies data preprocessing methods, including user and session identification.

The main contributions are as follows. First, an active-user-based user identification algorithm is presented. The algorithm uses both the IP address and a finite user inactivity time to distinguish users in the Web log. Experimental results show that the active-user-based algorithm performs much better than the basic algorithm, even on small Web logs. Second, a definition of session identification is given, the traditional method based on a preset time interval is optimized, and the algorithm is described concretely in terms of its data structure. Empirical analysis shows that session quality is improved.
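
The two preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details are not given here: it only assumes that a user is identified by an IP address plus an inactivity limit, and it shows the traditional fixed-interval session split rather than the optimized variant. The names LogEntry, identify_users, split_sessions, and both threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    """One parsed request from a web log (hypothetical minimal schema)."""
    ip: str
    timestamp: float  # seconds since some epoch
    url: str


def identify_users(entries: List[LogEntry],
                   inactive_limit: float = 1800.0) -> List[Tuple[int, LogEntry]]:
    """Label each entry with a user id.

    Assumption: requests from the same IP belong to the same user as long as
    the gap since that IP's previous request does not exceed inactive_limit.
    A longer gap is treated as a different user behind the same address.
    """
    active: Dict[str, Tuple[int, float]] = {}  # ip -> (user id, last seen)
    next_id = 0
    labelled: List[Tuple[int, LogEntry]] = []
    for entry in sorted(entries, key=lambda e: e.timestamp):
        current = active.get(entry.ip)
        if current is None or entry.timestamp - current[1] > inactive_limit:
            user_id = next_id  # unseen IP, or IP inactive for too long
            next_id += 1
        else:
            user_id = current[0]
        active[entry.ip] = (user_id, entry.timestamp)
        labelled.append((user_id, entry))
    return labelled


def split_sessions(labelled: List[Tuple[int, LogEntry]],
                   session_gap: float = 600.0) -> Dict[int, List[List[LogEntry]]]:
    """Traditional fixed-interval session identification.

    A user's request joins the current session if it arrives within
    session_gap seconds of that user's previous request; otherwise it
    starts a new session. Input must be in timestamp order, which
    identify_users guarantees.
    """
    sessions: Dict[int, List[List[LogEntry]]] = {}
    for user_id, entry in labelled:
        user_sessions = sessions.setdefault(user_id, [])
        if user_sessions and entry.timestamp - user_sessions[-1][-1].timestamp <= session_gap:
            user_sessions[-1].append(entry)
        else:
            user_sessions.append([entry])
    return sessions
```

In this sketch the session gap is deliberately shorter than the inactivity limit; if the two thresholds were equal, every identified user would end up with exactly one session. Real logs would also need the cleaning steps (removing image requests, robot traffic, and failed requests) that usually precede user identification.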
Keywords/Search Tags:Web Log Mining, Data Pre-processing, User Identification, Session Identification