Research On Web Usage Mining With Enterprise Proxy Logs

Posted on: 2011-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhou
GTID: 2178360308463860
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the WWW has become one of the main platforms for obtaining and publishing information. Web Usage Mining, a branch of Web Mining, automatically discovers hidden patterns by mining the logs of web servers and web browsers. These patterns help in understanding how a system is accessed and how its users behave, which is valuable for information layout and personalized recommendation.

In recent years, Web Usage Mining has drawn increasing attention and has been applied successfully in e-commerce, website design, personalized services and other areas. However, research on Web Usage Mining is based mainly on web server logs and rarely on enterprise proxy logs. Enterprise proxy logs record the history of users accessing the Internet through proxy servers. Mining these logs helps to optimize the proxy caching strategy, evaluate proxy server performance, analyze user browsing behavior and provide personalized services, which in turn supports resource planning, regulation of Internet use and improved access efficiency.

Working from enterprise proxy logs, this paper first compares enterprise proxy logs with web server logs. On this basis, we propose an incremental data cleaning algorithm that achieves good results without knowledge of the topology of the web sites, followed by a preprocessing algorithm based on a tree model. For mining user access patterns, the paper analyzes a variety of algorithms and then proposes UHMA, an algorithm based on URL hierarchical similarity that is well adapted to the context of enterprise proxy logs. For user navigation prediction, a collaborative filtering based algorithm is proposed to provide personalized recommendations along with RSS subscriptions.

A web usage mining model based on enterprise proxy logs, EPWUM, is proposed in this paper. It comprises an offline component and an online component. The offline component analyzes the logs offline to mine user access patterns, and the online component makes navigation predictions. The results show that the model adapts to the context of enterprise proxy logs and successfully applies Web Usage Mining technology to them.
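The abstract does not define how UHMA measures URL hierarchical similarity. As an illustration only, the sketch below shows one simple way such a measure could be computed: split each URL into its host and path levels and score the shared leading levels. The function names, the normalization by the longer path, and the example URLs are all assumptions and are not taken from the thesis.

```python
from urllib.parse import urlsplit


def url_path_levels(url):
    """Return the hierarchy of a URL as [host, level1, level2, ...].

    Hypothetical helper: the thesis does not specify how URLs are split.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Drop empty segments produced by leading, trailing or doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    return [host] + segments


def hierarchical_similarity(url_a, url_b):
    """Score two URLs by the share of leading hierarchy levels they have in common.

    Assumed definition (not the UHMA formula): number of matching leading
    levels divided by the depth of the deeper URL, giving a value in [0, 1].
    """
    a = url_path_levels(url_a)
    b = url_path_levels(url_b)
    shared = 0
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        shared += 1
    return shared / max(len(a), len(b))


if __name__ == "__main__":
    # Two pages under the same host and top-level section.
    print(hierarchical_similarity(
        "http://example.com/news/sports/a.html",
        "http://example.com/news/tech/b.html",
    ))
```

A measure of this kind needs only the requested URLs themselves, which suits proxy logs, where the topology of the visited sites is not known in advance.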
Keywords/Search Tags:web usage mining, enterprise proxy log, key index page, navigation tree model, user access pattern, navigation prediction, URL hierarchical path