User Features Analysis Using Internet Access Logs

Posted on: 2015-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zheng
Full Text: PDF
GTID: 2268330425982052
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, the Internet has become an indispensable source of information. The explosive growth of Internet information resources has led to a growing problem of information overload: the amount of information on the Internet far exceeds what users need, and large amounts of irrelevant information seriously interfere with users' ability to pick out what is useful. Given the contradiction between massive information resources and limited user needs, a way must be found to locate the desired information quickly and accurately within these vast resources. Personalized service technology emerged in response to this demand. Personalized service is a targeted approach to service: it collects, collates, and classifies resources from various sources, and provides users with information and recommendations of interest to meet their needs.

User feature analysis is one of the core goals and technologies of personalized service. It analyzes users' interests, behavior, and other characteristics. The quality of the extracted user features largely influences and determines how accurate the personalized service provided by the system is.

Operators that provide Internet access services tend to store their users' access log data, and these access logs contain rich user feature information. This thesis mines and analyzes the Internet access log data of an operator to derive users' interest features. The main results of this work cover the following four aspects:

(1) A parallel, MapReduce-based algorithm for extracting user feature items is proposed. Working from the content of the pages a user has accessed, the algorithm extracts the user's feature keywords according to the weights of terms in the visited documents. The thesis describes the parallel design of the algorithm, which has been implemented on Hadoop.

(2) A mining algorithm for users with similar features is given. The algorithm clusters the pages that users access and then computes the similarity of users' interest features from the clustering results. Using MapReduce, Mahout, and Hive, the thesis proposes a parallelization strategy for the algorithm and implements it on the Hadoop platform.

(3) An independent-user identification algorithm based on Internet access logs is proposed. The algorithm analyzes the patterns of access-log fields such as IP, User Agent, and Cookie, and applies a segment-first, merge-later approach to log analysis. Following this idea, the thesis first identifies the logs of individual browsers and then merges these browser logs into per-user logs through account association, thereby identifying the user.

(4) The design and implementation of a user feature analysis system based on Internet access logs are given. In this design, the system consists of three modules: user log preprocessing, text preprocessing, and user feature analysis. The thesis details the design, function, and implementation of each module and its submodules.
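The segment-then-merge idea in item (3) can be illustrated with a small single-machine sketch. This is not the thesis's algorithm, whose details the abstract does not give. It assumes hypothetical record fields `ip`, `user_agent`, `cookie`, and `account`, assumes that a browser is keyed by its cookie when one is present and by the (IP, User Agent) pair otherwise, and uses union-find as one possible way to merge browser logs that share an account.

```python
from collections import defaultdict


def segment_by_browser(records):
    """Step 1 (segment): group log records into per-browser logs."""
    segments = defaultdict(list)
    for rec in records:
        # Assumed browser key: the cookie if present, else (IP, User-Agent).
        if rec.get("cookie"):
            key = ("cookie", rec["cookie"])
        else:
            key = ("ip_ua", rec["ip"], rec["user_agent"])
        segments[key].append(rec)
    return dict(segments)


def merge_by_account(segments):
    """Step 2 (merge): join browser logs that share a login account."""
    parent = {key: key for key in segments}

    def find(key):
        # Union-find lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    first_seen = {}  # account -> first browser key observed with it
    for key, recs in segments.items():
        for rec in recs:
            account = rec.get("account")
            if not account:
                continue
            if account in first_seen:
                # Same account seen in another browser: link the two logs.
                parent[find(key)] = find(first_seen[account])
            else:
                first_seen[account] = key

    users = defaultdict(list)
    for key, recs in segments.items():
        users[find(key)].extend(recs)
    return list(users.values())
```

Calling `merge_by_account(segment_by_browser(records))` returns one list of records per inferred user. Browser logs with no account information stay as separate users in this sketch, which is a simplification.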
Keywords/Search Tags: user features, web log, text mining, user identification, Hadoop