
Web Usage Mining Based On Granular Computing

Posted on: 2011-10-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhao
Full Text: PDF
GTID: 1118360308963883
Subject: Computer application technology
Abstract/Summary:
The amount of information on the Web is growing at a remarkable rate, and applications increasingly require people to abstract, filter and discover useful information in these data. Web usage mining applies data mining techniques to Web usage data in order to discover meaningful patterns hidden in that data. It has important theoretical and practical value for providing personalization services, improving Web server performance and design, supporting business decisions, and so on.

This dissertation applies intelligent technologies to Web usage mining for electronic commerce (EC) and aims to design mining models and algorithms under a uniform theoretical framework. Through the collection, management and analysis of large volumes of usage data, hidden patterns and rules are found that can be used to provide decision support, improve EC Web site performance and enhance business security, which in turn can bring substantial profits to enterprises.

Based on granular computing and related theories within its uniform framework, such as rough sets and fuzzy sets, this dissertation focuses on several key techniques and new application areas of Web usage mining. The contributions of the dissertation are as follows:

1. A new method of multi-granular user behavior data collection is proposed. The method uses a configurable plug-in embedded in the Web server to collect user behavior data. The collected data can be combined with data on EC-specific events, and the subsequent pre-processing is simplified. This addresses the problems of Web logs: unreliability, a single data type, and the inability to integrate with other EC event data. Experimental results show that the proposed method collects reliable data at low cost and provides a high-quality data source for Web usage mining.

2. Several methods are proposed to improve the pre-processing model. A new hybrid method, which combines an online method with Web log supplementation, is proposed for obtaining the Web site topology, so that the topology is recovered as completely as possible. A new Just Recently Used (JRU) algorithm is also proposed. It uses new heuristics to fill in missing pages, which reduces the search space and yields more reasonable and reliable results.

3. An efficient and complete attribute reduction algorithm based on knowledge granules is proposed for the high-dimensional data found in Web usage mining. The causes of the inefficiency of existing attribute reduction algorithms are studied, and basic algorithms for computing the indiscernibility relation and the positive region are designed on the basis of granular computing theory. From these, a complete and efficient attribute reduction algorithm is built. The algorithms use dynamic SQL to obtain sorted object sets directly, so the sorting algorithm and the incremental positive region computation can be omitted. Five new heuristic strategies for attribute selection are designed to avoid selecting useless attributes, reduce the search space and simplify intermediate results, which ensures the completeness and efficiency of the algorithms. Theoretical analysis and experimental results show that the proposed reduction algorithm is more efficient than existing ones and better suited to very large databases.

4. A clustering algorithm framework based on knowledge granules is proposed for sparse data with high attribute dimensionality. Based on this framework, two clustering algorithms, one for continuous data and one for discrete data, are designed to analyze user characteristics. Using a dimensional threshold vector, the equivalent granules in each dimension are found by a leaping search, and the data need not be converted to binary variables. Initial equivalence relations are obtained in this way. A variable precision quadratic clustering model is then designed to refine the result, which makes the algorithm resistant to noise. A new clustering quality evaluation model oriented to the application field is also defined. Experimental results show that the algorithms provide results at various granularities with high accuracy and reflect the characteristics of the data.

5. A behavior trust forecast and control model is proposed based on Bayesian networks and behavior log mining. Existing methods for evaluating Web user behavior are costly and hard to put into practice. To solve this problem, multiple kinds of data are extracted from user behavior logs as trust attributes, a Bayesian network is built on them, and a trust forecast and control algorithm is designed. An improved semi-fuzzy clustering method is used to set and adjust the model parameters, which establishes the correspondence between quantitative evidence and trust grade. The model can predict the trust grade under multi-trust-attribute conditions. Results on real-world data show that several server performance measures improve and that users' trading behaviors are constrained.
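
For readers unfamiliar with the rough-set notions named in contribution 3, the following is a minimal Python sketch of the indiscernibility partition, the positive region, and a simple greedy attribute reduction built on them. It is a textbook-style illustration, not the dissertation's algorithm: the dynamic SQL sorting and the five heuristic strategies described above are not reproduced, and the function names and the small session table (`browser`, `visits`, `cart`, `buyer`) are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: row indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def positive_region(rows, cond_attrs, decision):
    """Indices of rows whose indiscernibility class carries a single decision value."""
    pos = set()
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def greedy_reduct(rows, cond_attrs, decision):
    """Greedy forward selection followed by a backward pruning pass.

    Attributes are added until the positive region is as large as the one
    obtained with all condition attributes; redundant ones are then dropped.
    """
    target = len(positive_region(rows, cond_attrs, decision))
    chosen = []
    while len(positive_region(rows, chosen, decision)) < target:
        best = max(
            (a for a in cond_attrs if a not in chosen),
            key=lambda a: len(positive_region(rows, chosen + [a], decision)),
        )
        chosen.append(best)
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if len(positive_region(rows, rest, decision)) == target:
            chosen = rest
    return chosen


# Hypothetical user-session table; "buyer" is the decision attribute.
rows = [
    {"browser": "A", "visits": "high", "cart": "yes", "buyer": "yes"},
    {"browser": "B", "visits": "high", "cart": "yes", "buyer": "yes"},
    {"browser": "A", "visits": "low", "cart": "no", "buyer": "no"},
    {"browser": "B", "visits": "low", "cart": "yes", "buyer": "no"},
    {"browser": "A", "visits": "high", "cart": "no", "buyer": "no"},
]

reduct = greedy_reduct(rows, ["browser", "visits", "cart"], "buyer")
print(reduct)  # ['visits', 'cart']
```

In this toy table, `browser` can be dropped because `visits` and `cart` alone already separate buyers from non-buyers. The sketch recomputes the partition from scratch at every step, which is exactly the kind of cost that an efficient reduction algorithm for very large databases would need to avoid.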
Keywords/Search Tags: Web usage mining, Granular computing, Attribute reduction, Web user clustering, Behavior trust management