
A Recommendation System Based On Web Log Mining

Posted on: 2009-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2178360272476623
Subject: Computer application technology
Abstract/Summary:
With the continued growth and spread of the Internet, "information overload" and "getting lost among resources" have become increasingly serious problems. Because of inherent shortcomings in existing search engines, information retrieval often gives unsatisfactory results. The need to reach valuable information on the network quickly and accurately, to understand existing historical data and use it to predict the future, and to discover knowledge in these massive data sets gave rise to the field of knowledge discovery and data mining. Applying data mining techniques to the Web in turn gave rise to Web mining.

The purpose of this thesis is to design a recommendation system based on Web log mining. It studies the use of cluster analysis in data preparation and page recommendation, and analyzes how clustering can be applied to predicting and recommending pages. First, the preprocessing stage of Web log mining is studied, including data cleaning, user identification, session identification, path completion, and transaction identification; each step is analyzed and an algorithm for implementing it is given. Second, Web browsing patterns are considered from three angles: user clustering, page clustering, and frequent access paths. After some related definitions are given, algorithms based on vectors and on fuzzy set theory are applied, building on existing clustering algorithms, to cluster users and pages effectively, and frequent access paths are obtained for making personalized recommendations to users. Finally, a recommendation system based on Web log mining is designed and implemented.

Web log mining is one of the main techniques and tools for studying users' Web browsing behavior. Understanding users' access interests is an important step toward improving the quality of service and the structural design of a Web site.
Preprocessing for Web log mining includes data cleaning, user identification, session identification, path completion, and transaction identification.
Data cleaning: the Web log file is cleaned by deleting records that are irrelevant to mining.
Session identification: in Web server logs that span a long period, a user may visit the site many times. The goal is to divide each user's access records into individual sessions so that they better reflect the user's browsing habits.
Transaction identification: the user access sequences produced by the preceding steps can in theory already be mined, but their granularity is too coarse, so transaction identification splits them into smaller transactions.
Path completion: because pages reached through the browser's Back button are served from the local cache, some page views are missing from the log. Path completion adds these missing pages back into the path.
User identification: a user is an individual who accesses one or more servers through a browser. Local caches, proxy servers, and firewalls make identifying users very complicated, so heuristic algorithms are generally used.

An important research direction in Web log mining is the clustering of user transactions and the clustering of URLs. It yields groups of users with similar access interests and the site URLs that users are jointly interested in, which can be used to identify and adjust the structure of the site and to provide personalized service. For clustering user transactions, Fu et al. use a hierarchical clustering algorithm (BIRCH) and cluster Web transactions with a generalization-based approach. Shahabi et al. cluster user transactions with the K-means algorithm and introduce the concepts of path, path feature space, and viewpoint. The elements of an access vector are the number of times each page appears in the user's transaction, and the similarity of two access vectors is measured by the cosine of the angle between them.
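The cosine measure mentioned above can be written out explicitly. The notation here is an assumption made for illustration and is not taken from the thesis: $u$ and $v$ are the access vectors of two transactions over the same $n$ pages, and $u_i$ is the number of times page $i$ appears in the first transaction.

```latex
\[
\operatorname{sim}(u, v) = \cos\theta
  = \frac{\sum_{i=1}^{n} u_i v_i}
         {\sqrt{\sum_{i=1}^{n} u_i^{2}}\;\sqrt{\sum_{i=1}^{n} v_i^{2}}}
\]
% Worked example with u = (2, 1, 0) and v = (1, 1, 1):
\[
\operatorname{sim}(u, v) = \frac{2 \cdot 1 + 1 \cdot 1 + 0 \cdot 1}{\sqrt{5}\,\sqrt{3}}
  = \frac{3}{\sqrt{15}} \approx 0.775
\]
```

Because click counts are non-negative, the value lies between 0 (no pages in common) and 1 (the same relative visit pattern).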
For clustering a site's URLs, Su et al. use recursive density-based clustering, dynamically changing the clustering parameters during the process to improve the URL clustering results. Perkowitz et al. use the PageGather clustering algorithm to find collections of related pages among a Web site's URLs: a similarity matrix is built first, whose elements are the frequencies with which pairs of pages are visited together according to the Web log, and the URLs are then clustered on the basis of this matrix. However, current research has some shortcomings. First, the similarity measure used in clustering is based simply on whether a page was visited or on the number of visits, which is not accurate enough in the complex setting of a Web site. In addition, these studies use traditional clustering techniques in which each object is assigned strictly to one category, so they cannot handle overlap between categories.

The vector-based clustering algorithm collects the page clicks of all users who visit the Web site in a given time period and organizes this information as a matrix indexed by users and pages, with click counts as the element values. The page vectors represent the structure of the site and also imply the access patterns that users have in common, while the user vectors reflect the types of users and outline each user's personalized access subgraph. Measuring the similarity among the page vectors and among the user vectors therefore yields related Web pages and similar user groups directly, and further analysis yields user access patterns, that is, frequent access paths.

Based on the research above, this thesis designs and develops a personalized recommendation system based on Web log mining. By analyzing the Web server's log files, the system discovers the browsing patterns of the site's visitors, which gives site administrators information that helps them improve the Web site or increase its economic benefit.
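The click-count matrix behind the vector-based clustering can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`click_matrix`, `cosine`, `column`), the sample data, the choice of users as rows and pages as columns, and the use of cosine similarity as the vector measure are all assumptions made for the example.

```python
import math
from collections import Counter


def click_matrix(clicks):
    """Build a click-count matrix from (user, page) click records.

    Assumed layout for this sketch: one row per user, one column per page.
    """
    users = sorted({user for user, _ in clicks})
    pages = sorted({page for _, page in clicks})
    counts = Counter(clicks)
    matrix = [[counts[(user, page)] for page in pages] for user in users]
    return users, pages, matrix


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def column(matrix, j):
    """Return column j of the matrix, i.e. one page's clicks across all users."""
    return [row[j] for row in matrix]


# Hypothetical click records extracted from a preprocessed log.
clicks = [
    ("u1", "/a"), ("u1", "/a"), ("u1", "/b"),
    ("u2", "/a"), ("u2", "/b"),
    ("u3", "/c"),
]
users, pages, matrix = click_matrix(clicks)

user_similarity = cosine(matrix[0], matrix[1])                    # u1 vs u2
page_similarity = cosine(column(matrix, 0), column(matrix, 2))    # /a vs /c
```

Comparing rows groups users with similar click profiles, and comparing columns groups pages visited by the same users; a clustering step (hard or fuzzy) would then run on top of these similarities.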
The Web log mining process generally consists of four parts: data preprocessing, the mining algorithm, pattern analysis, and visualization. Personalized service technology provides different services to different users in order to meet their different needs. It collects and analyzes user information to learn users' interests and behavior, so that recommendations can be made proactively. Personalized service technology improves a site's overall quality of service and access efficiency, and so attracts more visitors. The offline part mainly mines the characteristics of users and can be regarded as the preparation for personalized service. It has two stages. The first stage is the preprocessing of the raw data and related data. High-quality decisions inevitably depend on high-quality data, so data preprocessing is an important step in Web mining. The online part is the recommendation engine, whose main purpose is to provide users with personalized service. Its task is to take the user and page clusters obtained from the server log and recommend to the user the pages of the cluster most similar to the user's current transaction.

The personalized recommendation system based on Web log mining consists of three main modules: a data preprocessing module, a pattern mining module, and a real-time recommendation module. Its main functions are: Set data source, which sets up the connection to the local database; Import log, which imports the server log file into the database; URL clustering, which clusters the URL data in the database; User clustering, which clusters the user data in the database; and Clustering results, which displays the URL or user clustering results.

The personalized recommendation system implemented in this thesis has the following characteristics. First, the interface is clear and easy to use.
The system uses text, graphics, and other visual means to present and explain the mining results, helping users understand them intuitively. In the offline part, every step of the mining process comes with an explanation for the Web site administrator. In the online part, pages are recommended automatically by the mining algorithm according to the user's current access path. Second, the system is full-featured. It meets users' needs for personalized Web service, and it can also give accurate recommendations to new users and to users who have browsed the site only a little. The system also has shortcomings. This thesis studies only Web log mining and does not address Web content mining or the Semantic Web. The clustering used is better suited to small amounts of data; for mining large amounts of data, its execution efficiency still needs to be improved. Because of time constraints, the personalized recommendation system described here is still far from practical application, and many aspects need further research and refinement.
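The online recommendation step described in the abstract (recommending pages from the cluster most similar to the user's current transaction) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `recommend`, the sample clusters, the `limit` parameter, and the use of Jaccard overlap as the similarity between a transaction and a cluster are assumptions made for the example.

```python
def recommend(current_pages, page_clusters, limit=3):
    """Suggest unseen pages from the page cluster closest to the current transaction.

    Similarity is the Jaccard overlap between the set of pages visited so far
    and each cluster (an assumed measure, chosen for simplicity).
    """
    current = set(current_pages)
    if not page_clusters:
        return []

    def overlap(cluster):
        union = current | cluster
        return len(current & cluster) / len(union) if union else 0.0

    best = max(page_clusters, key=overlap)
    if overlap(best) == 0.0:
        return []  # nothing in common with any cluster, so no recommendation
    # Recommend only pages the user has not already visited in this transaction.
    return sorted(best - current)[:limit]


# Hypothetical page clusters produced by the offline mining stage.
page_clusters = [
    {"/news", "/sports", "/scores"},
    {"/shop", "/cart", "/checkout"},
]
suggestions = recommend(["/shop", "/cart"], page_clusters)
```

In this sketch the offline stage is assumed to have produced the clusters already, so the online step only needs a cheap set comparison per request.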
Keywords/Search Tags: Web Log Mining, Personalized Recommendation, Cluster