Data Mining Based On Web Log Analysis

Posted on: 2017-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Li
GTID: 1318330512954905
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the internet, daily life has become closely tied to online activity, and the internet is now one of the most important channels for obtaining information. Shopping websites and social media are increasingly popular because of their convenience, and as more people use the internet in their social lives, more records are left behind online. Web servers therefore store a large amount of user information, known as the Web log. Given such a valuable and unique resource, discovering users' intentions and preferences, analyzing the hidden information and knowledge, and mining user behavior are among the most important tasks facing internet enterprises. Web-based data mining emerged to address this problem: by analyzing Web logs and user behavior patterns, the preferences and habits of end users can be derived.

This dissertation analyzes the Web log of a real estate website (the YF website). The author applies data mining techniques such as statistical analysis, text mining, association analysis, regression and clustering to extract the valuable knowledge contained in the user log, and summarizes suggestions for optimizing the website, improving its performance and structure, and enhancing the user experience.

First, the introduction describes the research background and the current state of research on the topic in China and abroad, and sets out the main research content, the research innovations and the structure of the dissertation. The author then summarizes the related theories, including Web mining, Web log mining and real estate website log mining, and expounds the origin, classification, process and application of Web log mining.
The research methods are also described in detail, including statistical analysis, path analysis, sequence analysis, correlation and partial correlation analysis, regression analysis and cluster analysis.

Second, the research consists of the following four parts.

1. Study of user search terms. The [Query] field in the Web log is first preprocessed and split into 18 variables. Statistical analysis is then used to categorize the hot keywords, click ratios and visiting times; the user's choice of living area (including region, district, keywords, tag and community); the user's choice of house type (including rooms, halls, minimum/maximum price, minimum/maximum average price and minimum/maximum area); and the user's browsing behavior (including time, date of access, validity and availability). The users' most popular demands are analyzed on this basis, which gives the website a basis for improving the user experience and also helps the real estate business in the long term. In addition, an analysis of users' 1st through 12th searches shows that as the number of searches increases, search precision grows and the search target becomes clearer. Then, based on search patterns, behavior patterns and other aspects, detailed statistical methods are used to compare the hotspot figures with real records, yielding a deviation analysis between customer demand and market supply.

2. Study of the correlation between the user search variables and Web sequence association rules. We choose 11 variables selected by the user from the [Query] field and cross-check the correlation between each pair of variables in a matrix. The aggregated results show a high correlation between Room type and Area, and significant correlations among four variables: Price, Area, Room type and Halls.
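The pairwise cross-check described above can be illustrated with a minimal sketch. It assumes the Pearson coefficient as the correlation measure, which the abstract does not specify, and uses invented sample values; the names `pearson`, `correlation_matrix` and `searches` are hypothetical and do not come from the dissertation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every ordered pair of variable names to its Pearson coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical search records (one value per search); not data from the YF website.
searches = {
    "rooms": [1, 2, 2, 3, 3, 4],
    "area": [45, 70, 75, 100, 110, 140],
    "price": [90, 150, 140, 220, 260, 300],
}
matrix = correlation_matrix(searches)
```

In such a matrix, pairs with coefficients close to 1 or -1 would be the candidates for the "high correlation" findings reported in this part; a significance test would still be needed before drawing conclusions.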
We then use simple association rules (the Apriori algorithm) to calculate the association relations among the variables, which yields five rules. Furthermore, the sequence association method (Sequence algorithm) is used to identify users' browsing rules, in order to find browsing preferences that can inform the website's structure and enhance the user experience.

3. The impact factors of house turnover. This part focuses on the relationship between real-world second-hand house turnover and users' visits to the website. First, the factors affecting second-hand housing transactions are investigated and the correlations among them are verified (including users' website visits, financial institutions' lending and deposit rates, the consumer price index (CPI), new housing turnover, new housing prices, the increase in the area of new housing, and the average price of second-hand housing). The unrelated factors (financial institutions' lending and deposit rates, and the increase in the area of new housing) are then removed, and multiple linear regression models are used to estimate the regression coefficients of the relevant factors, which reveal how strongly each factor influences house turnover. To a certain extent, this verifies that users' internet search behavior has a practical impact on social and economic behavior.

4. Research on clustering of user types. Using the Kohonen neural network model, we cluster users along three dimensions: the length of time the user stays [M], the number of user clicks [O], and the depth of the user's visit to the webpages [G]. To make the clustering perform better, the data are examined first: the Anomaly Index (AI) is used as the criterion for detecting outliers in the Web log data. After the outliers are removed, five different user types emerge.
Taking the customer's willingness to purchase or rent as the standard, we name these five clusters window-shoppers, normal potential customers, valuable potential customers, focused customers and valuable customers. Meanwhile, by consolidating the characteristics of different websites, the Kohonen network clustering yields a clearly positive result. This also raises the likelihood of better customer experiences and provides a foundation for better housing marketing strategies.

Last but not least, the conclusions and prospects chapter summarizes the achievements of this dissertation, points out the shortcomings of the research process, and proposes further improvements.
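The Kohonen clustering in part 4 can be illustrated with a minimal sketch of a self-organizing map. Everything specific here is an assumption rather than the author's configuration: a one-dimensional map of five units, linearly decaying learning rate and neighborhood radius, features already scaled to [0, 1], and invented session values standing in for stay time [M], clicks [O] and visit depth [G]. The names `best_unit`, `train_som` and `sessions` are hypothetical.

```python
import math
import random

def best_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))

def train_som(data, n_units=5, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a one-dimensional Kohonen map on rows scaled to [0, 1]."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floor 0.5
        for x in rng.sample(data, len(data)):      # visit rows in random order
            bmu = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood: units near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical sessions as (stay time, clicks, depth), min-max scaled; not real log data.
sessions = [
    (0.02, 0.05, 0.03), (0.05, 0.04, 0.06), (0.25, 0.30, 0.20),
    (0.30, 0.28, 0.27), (0.50, 0.55, 0.48), (0.52, 0.47, 0.51),
    (0.75, 0.70, 0.78), (0.72, 0.77, 0.74), (0.95, 0.98, 0.93),
    (0.97, 0.92, 0.96),
]
weights = train_som(sessions)
labels = [best_unit(weights, s) for s in sessions]  # cluster index per session
```

Each session is assigned to the unit that wins it, so the label list plays the role of the user-type assignment; outliers would be removed beforehand, as the abstract describes.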
Keywords/Search Tags: Web Log Analysis, Association Rule, Regression Analysis, Clustering Analysis