
Research On Data Mining Technology Orienting Campus Web Log's Analysis

Posted on: 2011-01-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2178330332488021 | Subject: Computer technology
Abstract/Summary:
Web usage mining aims to discover users' visiting patterns and to predict their visiting behavior by mining web log records, so that Web-based applications can be better understood and better served. Learning about users' interests and analyzing their browsing patterns is important for rationalizing the structure of websites and for uncovering potential commercial value. One solution is to apply traditional data mining techniques to web logs; that is, building on the principles and ideas of data mining, the traditional mining methods are extended and improved to suit the new characteristics of web logs.

The data for Web usage mining may stem from the server side, the client side, proxy servers, site files, registration information, or remote agents. Each type of data collection differs not only in the location of the data source, but also in the kinds of data available, the segment of the population from which the data was collected, and its method of implementation.

This thesis systematically introduces the entire process of Web data mining and web log mining. Before mining, the data must be preprocessed; preprocessing includes data cleaning, user identification, session identification, and path completion. A web session consists of a sequence of web page accesses, so the similarity between page accesses is the basis for the similarity between sessions. To retain users, website managers place similar content in similar locations as far as possible when designing the site structure, so the static similarity of pages can be observed through their URL structure. In short, the quality of data preprocessing directly affects the quality of the data mining results. In this thesis we take the web logs of a college server as the research object and have completed a prototype.
First, after discussing conventional data preprocessing methods, we study improved methods and find that taking advantage of the ID3 algorithm is an effective way to preprocess the data. Second, we present a sequential scheme mining algorithm based on a multiple meta-prediction model, analyze its properties, and test it experimentally. Finally, we carry out a series of experiments on the log and, based on the analysis of the experimental data, offer some suggestions to the college. The analyses and conclusions presented in this thesis show that the algorithm is efficient and feasible.
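The preprocessing steps named in the abstract (data cleaning, user identification, session identification) and the URL-structure similarity of page accesses can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record layout, the sample records, the filtered file suffixes, the 30-minute session timeout, the use of (IP, user agent) as a user key, and the prefix-based similarity measure are all assumptions chosen for illustration. Path completion and the ID3-based step are not shown.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Hypothetical parsed log records: (client IP, user agent, timestamp, URL, HTTP status).
RECORDS = [
    ("10.0.0.1", "Firefox", "2010-05-01 09:00:00", "/news/2010/a.html", 200),
    ("10.0.0.1", "Firefox", "2010-05-01 09:00:01", "/img/logo.gif", 200),
    ("10.0.0.1", "Firefox", "2010-05-01 09:05:00", "/news/2010/b.html", 200),
    ("10.0.0.1", "Firefox", "2010-05-01 10:30:00", "/library/index.html", 200),
    ("10.0.0.2", "MSIE", "2010-05-01 09:01:00", "/missing.html", 404),
    ("10.0.0.2", "MSIE", "2010-05-01 09:02:00", "/news/index.html", 200),
]

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed filter list
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not from the thesis


def clean(records):
    """Data cleaning: drop failed requests and embedded resources."""
    return [r for r in records
            if 200 <= r[4] < 300 and not r[3].lower().endswith(IGNORED_SUFFIXES)]


def identify_sessions(records):
    """Identify users by (IP, agent), then split each user's hits on the timeout."""
    by_user = {}
    for ip, agent, ts, url, _status in records:
        by_user.setdefault((ip, agent), []).append(
            (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), url))
    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # chronological order within one user
        current, last = [], None
        for when, url in hits:
            if last is not None and when - last > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(url)
            last = when
        sessions.append((user, current))
    return sessions


def url_similarity(a, b):
    """Fraction of leading path segments two URLs share (0.0 to 1.0)."""
    pa = [s for s in urlparse(a).path.split("/") if s]
    pb = [s for s in urlparse(b).path.split("/") if s]
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    longest = max(len(pa), len(pb))
    return common / longest if longest else 1.0


SESSIONS = identify_sessions(clean(RECORDS))
```

With these sample records, the first user's hits fall into two sessions because of the long gap before the library page, and pages under the same directory score higher on `url_similarity` than pages in different site sections.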
Keywords/Search Tags: Web usage mining, preprocess, ID3 algorithm, sequential scheme