Research On Users Clustering Based On Web Log Mining

Posted on: 2012-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: X C Niu  Full Text: PDF
GTID: 2248330395955264  Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the tension between the rapid growth of information and people's limited attention keeps increasing, and Web log mining is an effective means of addressing it. Web servers record a log entry for every access they receive, containing important information about the access, including the IP address, date and time stamp, method, requested URL, and file size. These entries reflect user reactions and motivations. Web log mining can discover users' browsing patterns and the link relationships among Web pages. Analyzing user behavior can yield suggestions and guidance for providing personalized services and for optimizing the layout of individual pages and the overall architecture of a Web site.

This paper examines in depth the basic theory of Web log mining and fuzzy clustering algorithms, and proposes innovations and improvements for several problems in Web log mining. The main contributions are as follows:

(1) Data preprocessing plays an essential role in Web log mining. It is the prerequisite for providing effective input to the data mining algorithm and obtaining valuable mining results. The key factor in data preprocessing is how to acquire information about the Web topology structure. This paper presents an approach for acquiring the Web topology structure from the Web server's log files and uses test cases to demonstrate its efficiency and accuracy.

(2) User access paths are one of the parameters for measuring a user's degree of interest when clustering Web users. To address the shortcomings of existing ways of expressing interest through access paths, and by analyzing the features of log records and the mathematical characteristics of the evaluation parameter, this paper proposes a binary user access path matrix for measuring users' degree of interest. The dissimilarity matrix is then constructed from this matrix. Practical tests show that the user access path matrix is effective and feasible, and that the dissimilarity matrix represents the differences accurately.

(3) The paper studies and derives the Fuzzy C-Means (FCM) clustering algorithm used in data mining. FCM selects its initial cluster centers at random, which can reduce accuracy. This paper proposes an improved FCM that incorporates the dissimilarity matrix. Experiments on example data validate the feasibility and correctness of the improved FCM algorithm and show that it is efficient and performs well.

The new methods and the improved algorithm proposed in this paper are practical. Future research will aim to construct an effective Web log mining system and to find the relationship between the threshold and the optimal number of clusters.
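The pipeline outlined in (2) and (3) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual formulas, so several choices here are assumptions made only for illustration: the Jaccard distance as the dissimilarity measure, a farthest-point rule for choosing initial centers from the dissimilarity matrix, and the function names, page names, and toy sessions. The FCM updates themselves are the standard ones.

```python
import math

def access_path_matrix(sessions, pages):
    """Binary user-by-page matrix: 1 if the user requested the page, else 0."""
    return [[1 if p in visited else 0 for p in pages] for visited in sessions]

def dissimilarity_matrix(rows):
    """Pairwise Jaccard distance between binary rows (assumed measure)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            both = sum(a & b for a, b in zip(rows[i], rows[j]))
            either = sum(a | b for a, b in zip(rows[i], rows[j]))
            d[i][j] = d[j][i] = (1.0 - both / either) if either else 0.0
    return d

def seed_indices(d, c):
    """Choose c mutually distant users as initial centers (assumed heuristic)."""
    n = len(d)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i, j = max(pairs, key=lambda ij: d[ij[0]][ij[1]])  # most dissimilar pair
    seeds = [i, j][:c]
    while len(seeds) < c:
        rest = [k for k in range(n) if k not in seeds]
        # add the user farthest from its nearest already-chosen seed
        seeds.append(max(rest, key=lambda k: min(d[k][s] for s in seeds)))
    return seeds

def fcm(rows, centers, m=2.0, iters=50, eps=1e-9):
    """Standard Fuzzy C-Means with Euclidean distance and fuzzifier m."""
    x = [[float(v) for v in r] for r in rows]
    centers = [[float(v) for v in c] for c in centers]
    u = []
    for _ in range(iters):
        # membership update: u_ik proportional to d_ik^(-2/(m-1))
        u = []
        for xi in x:
            inv = [max(math.dist(xi, ck), eps) ** (-2.0 / (m - 1.0)) for ck in centers]
            total = sum(inv)
            u.append([v / total for v in inv])
        # center update: mean of the points weighted by u_ik^m
        for k in range(len(centers)):
            w = [ui[k] ** m for ui in u]
            s = sum(w)
            centers[k] = [sum(wi * xi[t] for wi, xi in zip(w, x)) / s
                          for t in range(len(x[0]))]
    return u, centers

# Toy data (hypothetical): four users, five pages.
pages = ["/", "/news", "/sports", "/shop", "/cart"]
sessions = [{"/", "/news", "/sports"}, {"/news", "/sports"},
            {"/", "/shop", "/cart"}, {"/shop", "/cart"}]

rows = access_path_matrix(sessions, pages)
d = dissimilarity_matrix(rows)
seeds = seed_indices(d, 2)
u, centers = fcm(rows, [rows[s] for s in seeds])
labels = [max(range(len(centers)), key=lambda k: ui[k]) for ui in u]
print(labels)
```

Seeding from the dissimilarity matrix makes the result deterministic for a given log, which is the practical motivation for replacing random initialization. Whether the thesis uses this particular seeding rule is not stated in the abstract.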
Keywords/Search Tags: Web Log Mining, Topology Structure, Access Path Matrix, Web Users Clustering, Fuzzy C-Means