System Of Intellective Individuation Based On WEB Usage Mining

Posted on: 2005-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: M N Chen
GTID: 2168360122488144
Subject: Computer application technology
Abstract/Summary:
The Internet is developing at incredible speed, and more and more institutions, groups and individuals publish and look up information on it. Although a mass of information is available, people often feel that they cannot find what they want. We therefore propose an IIW (intellective individuation website), which not only clusters users and web pages but also provides different services for different users. In other words, the website realizes individuation services.

Because the Web is unstructured and dynamic, it cannot be mined directly. Web log files, by contrast, have a well-defined structure, so we realize the IIW by mining Web logs.

The thesis first analyses the current state of the Internet and states the problem. It then introduces Web mining and the basic theory behind the IIW. Subsequently, it presents the system structure, organized into the following phases.

Data preprocessing phase. This phase is the first task and includes data preparation, data refining, data transformation and data induction.

Web log data mining algorithm phase. This phase is one of the focal points of the work. It uses an improved Matrix Cluster Algorithm, the Value Matrix Cluster Algorithm, to cluster users and web pages. Compared with the common Matrix Cluster algorithm, the improved one takes the access frequency of web pages into account and introduces a new concept, the correlative weight value matrix, which is one innovation of this thesis.

Pattern analysis and application phase. In this phase the mining results are applied to predict a user's visiting path and to classify new users, which is another focus of the work. We propose a new path prediction algorithm, the HCI Algorithm. Its basic idea is to compute a score for each link on the page currently being read and to select the link with the maximum score, which is another innovation of this thesis. The HCI Algorithm is more accurate and simpler in path prediction than traditional algorithms. New users can be classified, as they browse the website, by their similarity to existing users.

An experiment using the web log of a university shows that mining data with the improved Matrix Cluster algorithm and providing individuation services is effective and feasible. The experiment paves the way for e-business websites with individuation services.
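The three ideas named in the abstract (a frequency-weighted user-page matrix, link scoring for path prediction, and classifying new users by similarity) can be illustrated with a minimal Python sketch. The abstract does not define the correlative weight value matrix or the HCI scoring function, so everything below is an assumption made for illustration: the score of a link is taken to be the number of observed transitions from the current page to that link, similarity is taken to be cosine similarity, and the sample log and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, already-preprocessed log: (user, page) visits in time order.
LOG = [
    ("u1", "/home"), ("u1", "/news"), ("u1", "/news"), ("u1", "/sports"),
    ("u2", "/home"), ("u2", "/news"), ("u2", "/sports"),
    ("u3", "/home"), ("u3", "/shop"), ("u3", "/shop"), ("u3", "/cart"),
]

def visit_matrix(log):
    """Map each user to a Counter of page -> access frequency."""
    matrix = defaultdict(Counter)
    for user, page in log:
        matrix[user][page] += 1
    return matrix

def transition_counts(log):
    """Count how often page b directly follows page a for the same user."""
    counts = defaultdict(Counter)
    last_page = {}
    for user, page in log:
        prev = last_page.get(user)
        if prev is not None and prev != page:  # ignore page reloads
            counts[prev][page] += 1
        last_page[user] = page
    return counts

def predict_next(current_page, links, counts):
    """Score every link on the current page; return the best link and all scores.

    Assumed score: number of observed transitions current_page -> link.
    """
    scores = {link: counts[current_page][link] for link in links}
    return max(scores, key=scores.get), scores

def cosine(a, b):
    """Cosine similarity between two page-frequency Counters."""
    dot = sum(a[p] * b[p] for p in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar_user(new_visits, matrix):
    """Return the known user whose visit vector is closest to new_visits."""
    return max(matrix, key=lambda user: cosine(new_visits, matrix[user]))

if __name__ == "__main__":
    matrix = visit_matrix(LOG)
    counts = transition_counts(LOG)
    print(predict_next("/home", ["/news", "/shop", "/cart"], counts))
    print(most_similar_user(Counter({"/home": 1, "/shop": 1}), matrix))
```

In a fuller system the transition counts would presumably be restricted to the cluster the current user belongs to, so that the clustering phase informs the prediction phase; the sketch uses the whole log to stay short.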
Keywords/Search Tags: web mining, web usage mining, individuation, matrix clustering, prediction of visiting path