
Multi-marker Propagation Clustering Algorithm And Its Application In Web Log Mining

Posted on: 2009-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhao
Full Text: PDF
GTID: 2178360272478056
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the Internet, people face the problem of "rich data, poor information" even as they enjoy the convenience of the network. An effective solution is to apply data mining technology to the World Wide Web, which is called Web mining. Web mining includes Web content mining, Web structure mining, and Web usage mining. As one of the most valuable of these fields, Web log mining has received great attention from researchers. Web log mining can discover users' browsing patterns and the link relationships among Web pages, from which user clusters and page clusters can then be obtained.

Data preprocessing is essential early work in data mining. It is the prerequisite for providing effective input to a data mining algorithm and for obtaining valuable mining results. This paper studies the traditional data preparation process and improves the user identification algorithm, because the traditional user identification algorithm is inefficient under complicated network topologies. A data preprocessing procedure suited to the requirements of the Multi-marker Propagation Clustering Algorithm (MPCA) is constructed. Building on research into clustering algorithms, and taking the access frequency of websites as a parameter, the concept of a weighted association matrix is introduced to measure users' access interests. A multi-marker propagation clustering algorithm based on weighted matrix clustering is proposed. MPCA, which extends the marker propagation idea, exploits the sparsity of the matrix to reduce execution time.

The data preprocessing procedure constructed in this paper avoids complicated session identification and transaction identification. It can reflect users' real-time access, is highly efficient, and ultimately provides effective input data to the mining algorithm.
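As a rough illustration of the weighted matrix idea, the sketch below counts how often each user requests each page and stores only the nonzero counts, so the sparsity of the data is preserved. The abstract does not give the thesis's actual weighting formula or data layout; the function name `build_weighted_matrix`, the `(user_id, page)` record format, and the use of raw request counts as weights are all assumptions made for this example.

```python
from collections import defaultdict


def build_weighted_matrix(records):
    """Build a sparse user-by-page matrix of access counts.

    `records` is an iterable of (user_id, page) pairs, one per request.
    This record format is hypothetical; a real log would first need
    cleaning and user identification.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for user, page in records:
        # Weight = number of times this user requested this page (assumed).
        matrix[user][page] += 1
    # Convert to plain dicts so that lookups of missing keys do not
    # silently create zero entries and destroy sparsity.
    return {user: dict(row) for user, row in matrix.items()}


# Hypothetical, already-preprocessed log records.
records = [
    ("u1", "/index.html"),
    ("u1", "/news.html"),
    ("u1", "/index.html"),
    ("u2", "/sports.html"),
]
matrix = build_weighted_matrix(records)
```

Only pages a user actually visited appear in that user's row, which is what makes a dictionary-of-dictionaries layout a reasonable stand-in for a sparse matrix here.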
Compared with common matrix clustering algorithms, MPCA overcomes the disadvantages of distance-based algorithms, such as high space and time complexity. It also has great advantages in real-time mining of large sparse matrices. Practical experiments show that MPCA is effective and feasible for clustering users and pages.

The MPCA presented in this paper has good extensibility, but it can still be improved. For example, an effective Web log mining system could be constructed around it, and MPCA could be combined with a genetic algorithm to obtain higher efficiency.
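The abstract does not describe the propagation rules of MPCA, so the following is only a generic sketch of how markers might spread over a sparse user-by-page matrix without computing pairwise distances. It is not the thesis's algorithm. The function name `propagate_markers`, the `min_weight` threshold, and the rule that a marker crosses only links whose weight reaches that threshold are assumptions made for illustration.

```python
from collections import defaultdict, deque


def propagate_markers(matrix, min_weight=2):
    """Assign cluster markers to users and pages by propagation.

    `matrix` maps user -> {page: weight}. A marker starts at an unmarked
    user and spreads across links with weight >= min_weight (an assumed
    rule). Only nonzero entries are ever visited.
    """
    strong_by_user = defaultdict(set)
    strong_by_page = defaultdict(set)
    for user, row in matrix.items():
        for page, weight in row.items():
            if weight >= min_weight:
                strong_by_user[user].add(page)
                strong_by_page[page].add(user)

    user_marker, page_marker = {}, {}
    next_marker = 0
    for seed in matrix:
        if seed in user_marker:
            continue
        # Start a new marker at this unmarked user.
        user_marker[seed] = next_marker
        queue = deque([("user", seed)])
        while queue:
            kind, node = queue.popleft()
            if kind == "user":
                for page in strong_by_user[node]:
                    if page not in page_marker:
                        page_marker[page] = next_marker
                        queue.append(("page", page))
            else:
                for user in strong_by_page[node]:
                    if user not in user_marker:
                        user_marker[user] = next_marker
                        queue.append(("user", user))
        next_marker += 1
    return user_marker, page_marker


# Hypothetical weighted matrix: u1 and u2 both visit /a often.
example = {"u1": {"/a": 3, "/b": 1}, "u2": {"/a": 2}, "u3": {"/c": 5}}
user_marker, page_marker = propagate_markers(example, min_weight=2)
```

In this example u1 and u2 end up with the same marker because both link strongly to /a, while u3 receives a different one. The weakly visited /b receives no marker. The work done is proportional to the number of nonzero entries, which is the kind of saving a sparse representation allows.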
Keywords/Search Tags: Web Mining, Web Log Mining, Data Preprocessing, User Clustering, Multi-marker Propagation