The Research On Clustering Analysis And Its Application In Web Log Mining

Posted on: 2012-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
GTID: 2178330332490048
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the widespread use of databases, the gap between the capacity to supply information and the capacity to analyse it has become increasingly prominent. People urgently need automated techniques that can examine and analyse data in greater depth. Data mining emerged at exactly this moment, when information is abundant but knowledge is scarce. Clustering analysis is an important branch of data mining. It belongs to the category of unsupervised learning and is an important method for helping people understand the real world. Clustering analysis can be used as a standalone tool to obtain the distribution of the data, observe the characteristics of each cluster, and study particular clusters further. Traditional clustering analysis is a hard partitioning method: its boundaries are crisp, so clusters are mutually exclusive. In reality, however, many things have no strict boundaries between them, so clustering is inevitably accompanied by fuzziness, and this gave rise to fuzzy clustering analysis.

Web log mining is an important research area of data mining. Information about the behavior of web users is hidden in the Web log. Web log mining can uncover the characteristics and patterns of users' visiting behavior, which can then be analysed to identify potential customers of a web site and to improve the quality of service offered to users. Cluster analysis can be applied to Web log mining. By analysing users' visiting behavior, we can automatically classify users according to their interests and find groups of pages that are accessed by the same users, which helps to improve the website's structure, recommend personalized services, and so on.
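Both user clustering and page clustering start from some record of which user visited which page. The following is a minimal sketch of how such a user-by-page visit matrix might be derived from raw log lines. It assumes Common Log Format input, approximates a user by the requesting host, and drops failed requests and static resources; none of these choices is stated in the abstract, and the names `parse_line`, `build_user_page_matrix` and `STATIC_SUFFIXES` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed input: Common Log Format lines; the abstract does not name a format.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical cleaning rule: requests for static resources are not page views.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Return (host, url) for a successful page request, or None otherwise."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    if match["method"] != "GET" or not match["status"].startswith("2"):
        return None
    url = match["url"].split("?", 1)[0]  # ignore the query string
    if url.lower().endswith(STATIC_SUFFIXES):
        return None
    return match["host"], url


def build_user_page_matrix(lines):
    """Count visits per (user, page); a user is approximated by host address."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            host, url = parsed
            counts[host][url] += 1
    users = sorted(counts)
    pages = sorted({url for visits in counts.values() for url in visits})
    matrix = [[counts[user][page] for page in pages] for user in users]
    return users, pages, matrix
```

Each row of the matrix describes one user and can serve as input to user clustering; each column describes one page and can serve as input to page clustering.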
Because Web data is unstructured, the log data must be preprocessed before clustering analysis.

After describing the basic concepts and background of data mining, clustering analysis, fuzzy theory and Web log mining, this thesis studies the shortcomings of the FCM (fuzzy c-means) clustering algorithm in some depth. FCM is one of the most widely applied algorithms in the field of fuzzy clustering analysis. It is based on an objective function and obtains the optimal result by minimizing that function. The algorithm is simple in design and widely applicable, but it also has several problems that remain to be solved: the cluster prototype parameters must be set manually, the clustering results easily fall into a local optimum, and it has difficulty finding clusters that are not spherical.

Building on numerous existing research results, this thesis analyses these shortcomings and proposes corresponding improvements. The algorithm is improved in two main ways. First, following certain rules, the initial cluster centers are deliberately selected across the global scope by searching the data matrix, which effectively reduces the likelihood that the algorithm falls into a local optimum. Second, a large cluster is represented by several small clusters, and adjacent clusters that satisfy certain conditions are then merged. Applying the improved algorithm to Web log mining yielded effective user clustering and page clustering results. The experimental results demonstrate that the improved FCM clustering algorithm reduces the dependence on the initial cluster centers and produces more accurate clustering results.
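To make the objective-function view of FCM concrete, here is a minimal pure-Python sketch of the standard algorithm. It alternates the usual membership and center updates until the objective J_m stops changing. The initialisation shown is a plain farthest-first heuristic used only as a stand-in for "deliberately chosen" initial centers; the thesis's actual selection rules and its merge conditions for adjacent clusters are not given in the abstract, so neither is reproduced here. The fuzzifier `m`, the tolerance `tol` and all function names are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, c):
    """Stand-in initialisation (NOT the thesis's rule): start from the point
    nearest the data mean, then repeatedly add the point farthest from the
    centers chosen so far. Assumes c <= number of distinct points."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centers = [min(points, key=lambda p: squared_distance(p, mean))]
    while len(centers) < c:
        centers.append(max(
            points,
            key=lambda p: min(squared_distance(p, v) for v in centers),
        ))
    return [list(v) for v in centers]


def update_memberships(points, centers, m):
    """u[i][j] = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))."""
    exponent = 1.0 / (m - 1.0)  # applied to squared distances
    memberships = []
    for p in points:
        d2 = [squared_distance(p, v) for v in centers]
        if 0.0 in d2:
            # Point coincides with a center: give it full membership there.
            hit = d2.index(0.0)
            row = [1.0 if j == hit else 0.0 for j in range(len(centers))]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in d2) for dj in d2]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, c, m):
    """v_j = sum_i u_ij^m x_i / sum_i u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(c):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


def objective(points, centers, memberships, m):
    """J_m = sum_i sum_j u_ij^m * ||x_i - v_j||^2."""
    return sum(
        (u ** m) * squared_distance(p, v)
        for p, row in zip(points, memberships)
        for u, v in zip(row, centers)
    )


def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=100):
    """Alternate membership and center updates until J_m stops changing."""
    centers = farthest_first_centers(points, c)
    memberships = update_memberships(points, centers, m)
    previous = objective(points, centers, memberships, m)
    for _ in range(max_iter):
        centers = update_centers(points, memberships, c, m)
        memberships = update_memberships(points, centers, m)
        current = objective(points, centers, memberships, m)
        if abs(previous - current) < tol:
            break
        previous = current
    return centers, memberships
```

Calling `fuzzy_c_means(points, c)` on a list of numeric vectors returns the cluster centers and, for every point, one membership degree per cluster. Because the starting centers are chosen deterministically, repeated runs give the same result, which is one simple way to remove the dependence on random initial centers.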
Keywords: Data Mining, Clustering Analysis, Web Log Mining, FCM cluster algorithm