
Several Key Technologies for Web Access Information Mining

Posted on: 2007-11-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Yu
Full Text: PDF
GTID: 1118360182993818
Subject: Computer application technology
Abstract/Summary:
The World Wide Web has developed rapidly, and the Internet has greatly changed our lives. At the same time, the main tasks on the Web, such as Website design and Web service design, have become more complex and onerous.

An enormous amount of Web usage information exists on the Web. By mining this information efficiently, we can obtain knowledge about user access behavior that is useful to both Web service providers and Web users. The results can inform improvements to Web server design, Website server performance, personalization services, and so on. Web usage mining has therefore become a new and important research field worldwide.

This dissertation summarizes and analyzes the characteristics of Web usage mining, and classifies and introduces the current state of research in the field. It then addresses two research areas within Web usage mining: grouping and personalization.

1. The grouping research field:

To mine user access behavior, this dissertation presents a fuzzy clustering algorithm based on a voting policy. It also presents a user space model, built with that algorithm, whose elements are user groups. Compared with the Fuzzy C-means clustering algorithm, the proposed algorithm is robust in determining the number of clusters without supervision and converges to more stable cluster centers. The user space model is the basis for the rest of the research.

To mine the interests of large numbers of users, this dissertation presents a hybrid Markov model and uses it for interest navigation pattern discovery. This solves the problem that navigation patterns discovered by other current methods cannot reflect user access interest, because they represent only the number of accesses and not the access time. Compared with the traditional Markov model, the hybrid Markov model achieves higher prediction accuracy and prediction coverage. The discovered knowledge can help service providers improve their services.

To reduce redundant user accesses, this dissertation builds a self-adaptive model, which includes a mark strategy and a downgrade strategy. The associated access sets mined by association rules are placed into suitable Web pages, so that navigation pages become combined navigation and content pages. The Website can thus adapt to users' access behavior.

2. The personalization research field:

To mine relevance feedback, this dissertation presents a feedback space model, together with an improved Bayesian algorithm for building it. The algorithm uses a logarithmic regression method to solve the problem that a model built by the Bayes algorithm has very large residual values and is distributed over a small space. The model helps Web service providers improve Web server design and helps eliminate mismatches between the author's expression and the user's understanding and expectations.

To mine user path characteristics for personalized recommendation, this dissertation presents a personalized Web recommendation approach based on interest clustering. Building on K-paths clustering, it gives a more effective path similarity function; the interest clustering is based on competitive agglomeration and determines the best number of clusters automatically. The approach uses both the user's access interest and path association, combining association rules with path clustering to produce the page recommendation set. Compared with other current methods, it avoids two problems: disturbing users by asking for registration information, and the set of recommendable elements shrinking to zero as the number of user accesses increases.

This dissertation designs a Web usage mining prototype system that integrates the above approaches. LogRover was developed based on the prototype system and has been used successfully in a real application. It can help Website designers improve Website structure, provide personalized services, improve Website efficiency, and provide special services for more users.

Finally, this dissertation summarizes the author's work and discusses future work.
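
The abstract does not give the formulation of the hybrid Markov model, so the following is only an illustrative sketch of the general idea it describes: a first-order Markov predictor whose transition scores reflect access time as well as access count. The blending rule, the `alpha` parameter, the function names, and the sample sessions are all assumptions made for illustration, not the dissertation's method.

```python
from collections import defaultdict

def build_transition_scores(sessions, alpha=0.5):
    """Blend transition frequency with dwell time (hypothetical weighting).

    sessions: list of sessions, each a list of (page, dwell_seconds) tuples.
    alpha:    1.0 gives a count-only (traditional) first-order Markov model;
              lower values give more weight to time spent on the target page.
    Returns {page: {next_page: score}}, where each row sums to 1.
    """
    counts = defaultdict(lambda: defaultdict(float))
    dwell = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        # Walk consecutive page pairs within one session.
        for (src, _), (dst, dst_seconds) in zip(session, session[1:]):
            counts[src][dst] += 1.0
            dwell[src][dst] += dst_seconds

    scores = {}
    for src, row_counts in counts.items():
        total_count = sum(row_counts.values())
        total_dwell = sum(dwell[src].values())
        row = {}
        for dst, count in row_counts.items():
            freq = count / total_count
            # Fall back to frequency when no dwell time was recorded.
            time_share = dwell[src][dst] / total_dwell if total_dwell > 0 else freq
            row[dst] = alpha * freq + (1.0 - alpha) * time_share
        scores[src] = row
    return scores

def predict_next(scores, page):
    """Return the highest-scoring next page, or None if the page is unseen."""
    row = scores.get(page)
    if not row:
        return None
    return max(row, key=row.get)

# Made-up sample data: "news" is visited more often after "home",
# but visitors stay much longer on "sports".
SAMPLE_SESSIONS = [
    [("home", 5), ("news", 10), ("sports", 60)],
    [("home", 4), ("news", 8)],
    [("home", 6), ("sports", 120)],
]

count_only = build_transition_scores(SAMPLE_SESSIONS, alpha=1.0)
blended = build_transition_scores(SAMPLE_SESSIONS, alpha=0.5)
print(predict_next(count_only, "home"), predict_next(blended, "home"))
```

With this sample data, the count-only model predicts "news" after "home", while the blended model predicts "sports". This is the kind of difference the abstract attributes to taking access time into account.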
Keywords/Search Tags: Web usage mining, session identification, cluster intensity vote policy, user space model, user group, interest intensity, relevance feedback, feedback space model, hybrid Markov model, interest clustering, competitive agglomeration, path similarity