Research Of Incremental Web Log Mining Based On Rough-Set And Fuzzy Clustering

Posted on: 2014-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
Full Text: PDF
GTID: 2268330401472035
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, and of electronic commerce in particular, Web services now reach more and more users. Users have more options and increasingly rational and diversified demands for Web services, which gives Web mining a practical basis. For many Web service providers, two problems have become urgent: analyzing users' group behavior to learn their interests, and using that analysis to improve Website structure and gain a competitive advantage.

Web log mining is the process of extracting knowledge of interest from the log data stored on a Web server. Much current research focuses on Web usage pattern mining, which analyzes Web users' behavioral patterns to discover their interests and then improves the site structure to make the site more attractive. Such usage patterns are inherently vague, and traditional mathematical theories are not well suited to the task, so this thesis draws on fuzzy clustering and rough set theory.

This thesis combines the density-based DBSCAN algorithm with the traditional SOFM network and proposes an incremental clustering algorithm built on the SOFM network. The SOFM network can be applied to high-dimensional data clustering and has good self-organized learning and training ability, which makes it well suited to Web log mining. However, because accesses to a Web site reflect users' diversified interests, clustering results should change as those interests drift, and the traditional SOFM network cannot achieve this. To solve this problem, the DBSCAN algorithm is combined with the SOFM network. DBSCAN can discover clusters of arbitrary shape and is sensitive to changes in clusters, so as the data set changes incrementally, the combined algorithm can detect the drift of users' interests.

The improved SOFM network is first trained on a large amount of sample data until its parameters and weights stabilize, which prepares it for the network application stage. During the application stage, the parameters and the output-neuron weights are fixed at their trained values and are not altered. Each input pattern is clustered, and its membership degree is then updated by a membership function. Neurons whose membership degree exceeds the threshold are output.

Finally, the thesis designs a simulation experiment and conducts a clustering analysis of Web log data from a news website. The approach, which supports incremental clustering, differs from the traditional clustering method: the network is first trained with sample data, and the test data are then clustered. The experimental results are compared in terms of training error and clustering results. They show that, for clustering data sets that change incrementally, the algorithm proposed in the thesis is more efficient and more accurate than the traditional clustering algorithm, and that it clearly reflects users' diversified interests on the Web.
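The application stage described in the abstract (fixed trained weights, membership update, threshold on the membership degree) can be illustrated with a minimal sketch. The abstract does not give the membership function, so the fuzzy c-means style formula below is an assumption, as are the names `memberships` and `output_neurons`, the fuzzifier `m`, and the threshold value; this is not the thesis's implementation.

```python
import math

def memberships(x, weights, m=2.0):
    """Membership degree of input pattern x in each output neuron.

    ASSUMPTION: a fuzzy c-means style membership function; the thesis
    abstract does not state which function it uses.
    """
    # Euclidean distance from x to each neuron's (fixed, trained) weight vector.
    d = [math.dist(x, w) for w in weights]
    # An exact match takes all of the membership; this avoids division by zero.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]

def output_neurons(x, weights, threshold=0.3):
    """Indices of neurons whose membership degree exceeds the threshold."""
    u = memberships(x, weights)
    return [j for j, uj in enumerate(u) if uj > threshold]

# Hypothetical trained weights for three output neurons (not from the thesis).
trained_weights = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
pattern = [0.5, 0.0]
active = output_neurons(pattern, trained_weights)
```

Because the pattern lies midway between the first two neurons, both exceed the threshold, which is how a single session can be reported under more than one interest cluster. The weights are never modified here, matching the abstract's statement that they stay fixed after training.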
Keywords/Search Tags: Web mining, Degree of membership, SOFM network, Fuzzy clustering