
Research On Access Behavior Classification Based On Web Logs

Posted on: 2011-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: H F Jiang
GTID: 2178360302488509
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology and the widespread adoption of e-commerce, the number of Internet users keeps growing, and users' access behaviors and purposes are becoming more diverse. Different types of access behavior affect a website differently. Normal access behavior helps website designers optimize the site's design, whereas machine-generated access and other abnormal access distort network analysis, increase network load, and so on. Research on access behavior classification therefore has important practical significance.

This thesis reviews and expounds several important questions. First, it introduces the current state of access behavior classification, focuses on analyzing the principle of human-computer access behavior classification, and points out the characteristics of the methods applied to it.

Second, to enhance classification quality, the thesis improves on two major issues that affect human-computer access behavior classification. On the one hand, session identification is the basis of existing classification techniques, and its quality has a far-reaching impact on the classification results. To address unreasonable methods for tagging the page set of a session and an imperfect approach to path complement, the thesis proposes using non-page log information and the website topology to improve both the grouping of a session's page set and the path complement algorithm, which enhances the quality of session identification. On the other hand, although sessions are the basis of classification techniques, the relationships among them are not sufficiently exploited. The thesis proposes a user-oriented human-computer access behavior classification method based on Bayesian techniques, whose characteristic parameters take full account of relationships both within a session and between sessions. The method avoids analyzing the information of a single session in isolation and obtains better classification results.

Finally, based on the method described above, a human-computer access behavior classification system is implemented using web log data from a website built in the lab. The experimental results show that the method improves the quality of human-computer access behavior classification, and that the resulting classification of users can serve as a basis for selectively using web log data in follow-up data analysis and data mining.
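The abstract gives no algorithmic detail, so the following Python sketch is only an illustration of the general idea: identify sessions, compute features per user across all of that user's sessions, and classify with a naive Bayes model. Everything specific here is an assumption rather than the thesis's method: the 30-minute timeout heuristic, the three binary features, the `robot`/`human` labels, and all function names are hypothetical. Path complement and the use of website topology are not covered.

```python
import math
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed threshold, not from the thesis


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        sessions = by_user[user]
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return dict(by_user)


def user_features(sessions):
    """Hypothetical binary features over ALL sessions of one user."""
    urls = [url for s in sessions for _, url in s]
    gaps = [b[0] - a[0] for s in sessions for a, b in zip(s, s[1:])]
    assets = (".css", ".js", ".png")
    return {
        "requested_robots_txt": any(u == "/robots.txt" for u in urls),
        "no_non_page_requests": not any(u.endswith(assets) for u in urls),
        "fast_requests": bool(gaps) and sum(gaps) / len(gaps) < 1.0,
    }


def train(labelled):
    """Fit a Bernoulli naive Bayes model from (features, label) pairs.

    Uses add-one (Laplace) smoothing so no probability is 0 or 1.
    """
    names = sorted({name for feats, _ in labelled for name in feats})
    counts = defaultdict(int)
    true_counts = defaultdict(lambda: defaultdict(int))
    for feats, label in labelled:
        counts[label] += 1
        for name in names:
            true_counts[label][name] += int(bool(feats.get(name)))
    total = sum(counts.values())
    return {
        label: {
            "prior": n / total,
            "p_true": {k: (true_counts[label][k] + 1) / (n + 2) for k in names},
        }
        for label, n in counts.items()
    }


def classify(model, feats):
    """Return the label with the highest log-posterior for `feats`."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = math.log(params["prior"])
        for name, p in params["p_true"].items():
            score += math.log(p if feats.get(name) else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A typical use would be to call `split_sessions` on parsed log records, apply `user_features` to each user's session list, train on a hand-labelled subset, and then call `classify` for the remaining users. Computing features per user rather than per session is what lets the model see patterns that only show up across sessions.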
Keywords/Search Tags:Access behavior classification, Bayesian classification, Path complement, Session identification