
The Design And Implementation Of A Business Insight System Based On Web Content

Posted on: 2018-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: H R Ling
GTID: 2348330518995694
Subject: Computer technology

Abstract/Summary:
The Internet era is an era of information explosion: people browse a wide variety of network resources and develop their own browsing habits. For a single user, the collection of network resources he or she has visited reflects, to some extent, that user's browsing habits and interests. The usual approach to these access logs is to apply DPI (deep packet inspection) technology to compute conventional field statistics without analyzing the specific content of the messages, or to limit content analysis to the text of the page a URL points to. Both approaches ignore many other factors, such as the background knowledge of the URL resource and its structural features, and ultimately lower the accuracy of content analysis. Methods that also use the background information of URL resources as raw material for analysis, and that combine the multi-level structure of URLs with the characteristics of web pages to extract and analyze Web content (the Web page and its URL), have therefore become a research focus. Starting from the background and requirements of network operators for business insight, this paper studies the technical solutions needed to realize Web-based business insight, and finally designs and develops a business insight system based on Web content. The main research contents are as follows:
1. Content extraction for different types of web pages (news, video, e-commerce). This paper analyzes the structure of each page type, designs and implements a corresponding content extraction method, and applies these methods in functional modules such as URL analysis and Web content analysis.
2. Acquisition of URL tag information. This paper analyzes the structure and background of URLs and summarizes a method that identifies URL information and manages that information automatically.
3. Research on the platform architecture of the system.
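The abstract does not describe the type-specific extraction rules used for item 1. As a rough illustration only, the sketch below uses a generic text-block heuristic, which is a swapped-in technique and not the author's method: the page is split into block-level text chunks, script and style content is discarded, and a block is kept if it is long enough and not dominated by link text (as navigation bars and footers usually are). The tag sets and the `min_chars` and `max_link_density` thresholds are illustrative assumptions.

```python
from html.parser import HTMLParser

# Text inside these tags is never treated as page content.
SKIP_TAGS = {"script", "style", "noscript"}
# Simplified, assumed set of block-level tags that delimit candidate text blocks.
BLOCK_TAGS = {"p", "div", "article", "section", "li", "td", "h1", "h2", "h3"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and records how much of each block is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # list of (normalized_text, link_text_length)
        self._text = []
        self._link_chars = 0
        self._skip_depth = 0
        self._link_depth = 0

    def _flush(self):
        # Close the current block; whitespace is collapsed so lengths are comparable.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._text.append(data)
        if self._link_depth:
            # Count link characters with the same whitespace normalization as _flush.
            self._link_chars += len(" ".join(data.split()))

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_chars=40, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        link_density = link_chars / len(text)
        if len(text) >= min_chars and link_density <= max_link_density:
            kept.append(text)
    return "\n".join(kept)
```

A type-specific extractor of the kind the thesis describes would replace this generic heuristic with rules tuned to each page layout; the heuristic is shown only because it needs no site-specific knowledge.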
This paper implements a multi-level tagging method for web page information, a method that splits a URL into multiple fields and categorizes and parses the content of each field, and an approach that searches network resources to match information; the validity of these methods is verified by testing. Building on these key technical solutions, this paper completes the development of a business insight system based on Web content. Working from the set of request URL fields in the user network access log, the system implements URL analysis, Web page classification, Web content analysis, and rule management. By transforming the URL field set into user behavior characteristic information, it provides a basis for user feature extraction and a prerequisite for service providers such as network operators to gain business insight into their users.
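The URL field splitting and rule-based tagging described here might look roughly like the following minimal sketch, which is not the system's actual implementation. It assumes a hypothetical rule table keyed on host and leading path segments (the `example.com` hosts and the tag names are made up), splits each URL with Python's standard `urllib.parse.urlsplit`, and aggregates an access log of `(user_id, url)` pairs into per-user tag counts as a stand-in for the user behavior characteristic information. The function names `split_url`, `tag_url`, and `user_features` are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rule table: (host, leading path segments, multi-level tag).
# Rules are checked in order, so more specific rules must come first.
RULES = [
    ("news.example.com", ("sports",), ("news", "sports")),
    ("news.example.com", (), ("news", "general")),
    ("video.example.com", (), ("video", "general")),
    ("shop.example.com", ("item",), ("ecommerce", "product")),
]


def split_url(url):
    """Split a request URL into fields: host, host labels, path segments, query."""
    # Access logs often omit the scheme; add one so urlsplit finds the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""  # hostname is already lower-cased
    return {
        "host": host,
        "host_labels": host.split(".") if host else [],
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        "query": dict(parse_qsl(parts.query)),
    }


def tag_url(url):
    """Return the multi-level tag of the first matching rule, or ('unknown',)."""
    fields = split_url(url)
    host, segments = fields["host"], fields["path_segments"]
    for rule_host, prefix, tag in RULES:
        # Match the rule host itself or any of its subdomains (e.g. m.news...).
        host_matches = host == rule_host or host.endswith("." + rule_host)
        if host_matches and tuple(segments[: len(prefix)]) == prefix:
            return tag
    return ("unknown",)


def user_features(log):
    """Aggregate (user_id, url) records into per-user counts at every tag level."""
    features = defaultdict(Counter)
    for user_id, url in log:
        tag = tag_url(url)
        for level in range(1, len(tag) + 1):
            features[user_id]["/".join(tag[:level])] += 1
    return features
```

Keeping the rules in a data table rather than in code is one way to support a rule-management function: new sites can be tagged by editing data. Counting every tag level lets a coarse profile (`news`) and a finer one (`news/sports`) be read from the same counter. URLs that match no rule are simply tagged `unknown` here; the search-based matching the thesis mentions is not attempted.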
Keywords/Search Tags: URL analysis, page classification, web content extraction, business insights, web page tags