
Research On Automatic Acquisition Technology Of Network Information For Key Targets

Posted on: 2022-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Jiang
GTID: 2518306572469444
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the volume of information it holds keeps growing, and gathering all of the information about an enterprise or an individual that is available online is vital to that enterprise's or individual's development. For example, basic information about an enterprise can be obtained from its official website, comments about it can be obtained from public opinion platforms, and information about it that has spread to other websites can be obtained by querying search engines across the whole network. With the development of the mobile Internet, information no longer exists only on the fixed Internet, and the information held on mobile terminals cannot be ignored.

This paper takes the world's top 500 companies as its key targets. It covers research on search-engine-based acquisition and processing of information from the entire network, research on information acquisition from specific websites, and research on mobile terminal information acquisition. Drawing on this research, it designs and implements an automatic acquisition system for key target network information, which captures network-wide information about the world's top 500 companies and provides data support for further analysis and governance of enterprises.

This paper studies web crawling technology based on search engines, including retrieving search engine results by keyword and downloading the resulting web pages. A multi-mode text extraction algorithm based on text density and symbol density is designed to extract text from the downloaded pages. This algorithm adds extraction features to existing text extraction algorithms so that the extracted text covers multiple forms of information. A k-means text clustering algorithm based on a dynamic k value is designed to cluster the extraction results. This algorithm builds on the traditional k-means algorithm and lets the value of k vary rather than fixing it in advance, with the aim of making the clustering results more accurate.

Crawler technology for specific websites is studied. Taking the “Fortune” website as an example of a comprehensive industry website, the acquisition of comprehensive corporate information is implemented. Taking the Weibo platform as an example of a social networking website, the acquisition of corporate evaluation information is implemented, and simulated Weibo login is investigated.

Mobile Internet crawler technology is studied. Taking the TikTok short-video app as an example, mobile data on the key targets is acquired with the Fiddler packet capture tool, and text recognition technology is used to extract the text in the videos.

This paper designs and implements an automatic acquisition system for key target network information. The system contains three modules: search engine data acquisition, specific website data acquisition, and mobile Internet data acquisition. It also provides a display interface for the data acquired by each module.
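
To make the idea of density-based text extraction more concrete, here is a minimal Python sketch. It is not the algorithm designed in the thesis, whose details the abstract does not give. It assumes a much simpler scheme: each line of HTML is scored by its text density (visible characters per tag) multiplied by a symbol term (the logarithm of its punctuation count), and lines scoring above a threshold are kept. The names `score_line` and `extract_main_text`, the punctuation set, and the threshold `min_score` are all hypothetical choices made for illustration.

```python
import math
import re

# Matches any HTML tag; a deliberate simplification of real HTML parsing.
TAG_RE = re.compile(r"<[^>]+>")
# Assumed punctuation set (ASCII plus common Chinese marks).
PUNCT = set(",.;:!?，。；：！？、")


def score_line(html_line):
    """Return (score, visible_text) for one line of HTML."""
    tags = TAG_RE.findall(html_line)
    text = TAG_RE.sub("", html_line).strip()
    if not text:
        return 0.0, text
    # Text density: visible characters per tag (+1 avoids division by zero).
    text_density = len(text) / (len(tags) + 1)
    # Symbol term: body text tends to contain punctuation, menus rarely do.
    symbols = sum(ch in PUNCT for ch in text)
    return text_density * math.log(1 + symbols), text


def extract_main_text(html, min_score=5.0):
    """Keep the visible text of lines whose score reaches min_score."""
    kept = []
    for line in html.splitlines():
        score, text = score_line(line)
        if score >= min_score:
            kept.append(text)
    return "\n".join(kept)
```

In this sketch a navigation bar made of short link labels has many tags and no punctuation, so it scores zero, while a paragraph of prose scores well above the threshold. A practical extractor would work on DOM nodes rather than raw lines and would need its threshold tuned on real pages.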
Keywords/Search Tags: Internet, mobile Internet, web crawler, text extraction, text clustering