Research on Session Identification in Web Log Mining

Posted on: 2008-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
GTID: 2178360212995291
Subject: Computer software and theory
Abstract/Summary:
Web log mining applies data mining techniques to Web usage data in order to discover valuable patterns in how a Web site is visited. It is applied to personalization, Web site improvement, and business. Data preprocessing plays an essential role in Web log mining, and session identification is a basic and pivotal step of that preprocessing. Although research on session identification has flourished in recent years, the area has not yet been studied in depth. This thesis investigates how to improve the accuracy of session identification.

First, an optimized method of session identification is presented. Sessions that have already been identified are reconstructed in two ways under specific conditions: one splits a session between two of its records, and the other merges two adjacent sessions at the adjacent records where they meet. The accuracy of the reconstructed sessions is evaluated experimentally.

Second, a session identification method based on a dynamic time threshold is given. A segment of the log file is selected as a sample log, and the optimized method is used to identify sample sessions in it. A threshold on how long sessions last is then computed from the sample sessions; this threshold varies across different periods of time. Sessions in the log are identified using this dynamic threshold, and the accuracy of the resulting sessions is tested experimentally.

Finally, another approach to session identification, based on page features, is proposed. It uses Web content mining to extract the features of Web pages and to compute feature vectors from them. A method of computing the relevance between Web pages from these vectors is presented, and the boundary of a session is identified from changes in the relevance between pages.
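The dynamic time threshold idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `split_sessions`, the choice of hour of day as the time period, the reading of the threshold as a limit on session duration, the 30-minute fallback, and the sample records are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def split_sessions(records, threshold_for_hour,
                   default_limit=timedelta(minutes=30)):
    """Split one user's time-ordered requests into sessions.

    records: list of (timestamp, url) tuples sorted by timestamp.
    threshold_for_hour: dict mapping hour of day (0-23) to the maximum
        session duration allowed for sessions that start in that hour
        (hypothetical stand-in for a threshold learned from a sample log).
    default_limit: fallback for hours without an entry (assumed value).
    """
    sessions = []
    current = []
    for ts, url in records:
        if current:
            start = current[0][0]
            # Pick the threshold for the period in which the session began.
            limit = threshold_for_hour.get(start.hour, default_limit)
            if ts - start > limit:
                # The request falls outside the allowed duration:
                # close the current session and start a new one.
                sessions.append(current)
                current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions


# Hypothetical requests from a single user.
records = [
    (datetime(2008, 1, 1, 9, 0), "/index.html"),
    (datetime(2008, 1, 1, 9, 5), "/news.html"),
    (datetime(2008, 1, 1, 9, 40), "/index.html"),
    (datetime(2008, 1, 1, 23, 0), "/a.html"),
    (datetime(2008, 1, 1, 23, 20), "/b.html"),
]
# Hypothetical per-hour thresholds: shorter in the morning, longer at night.
thresholds = {9: timedelta(minutes=10), 23: timedelta(minutes=30)}
sessions = split_sessions(records, thresholds)
```

With these sample values the five requests fall into three sessions: the two morning requests within ten minutes of each other, the single 9:40 request, and the two late-evening requests. A fixed threshold would treat every period the same, whereas the per-period lookup lets the cut-off follow the time of day.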
Keywords/Search Tags: Web log mining, Data preprocessing, Session identification, Accuracy of sessions, Web content mining