Research On Automatic Acquisition Technology Of Network Information For Key Targets

Posted on:2022-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Jiang

Full Text:PDF

GTID:2518306572469444

Subject:Computer technology

Abstract/Summary:

With the rapid development of the Internet, the amount of information it holds keeps growing, and gathering all the online information about an enterprise or an individual is vital to that enterprise's or individual's development. For example, an enterprise's basic information can be obtained from its official website, comments about it can be obtained from public opinion platforms, and information about it scattered across other websites can be obtained by searching the whole network with a search engine. With the development of the mobile Internet, information no longer exists only on the fixed Internet, and the information held on mobile terminals cannot be ignored.

This thesis takes the world’s top 500 companies as its key targets. It covers search-engine-based acquisition and processing of information from the whole network, information acquisition for specific websites, and information acquisition from mobile terminals. Drawing on this research, it designs and implements an automatic acquisition system for key-target network information, captures whole-network information on the world’s top 500 companies, and provides data support for further analysis and governance of enterprises.

The thesis first studies search-engine-based web crawling, including retrieving search engine results by keyword and downloading the resulting web pages. For the downloaded pages, a multi-mode text extraction algorithm based on text density and symbol density is designed. It adds extraction features to existing text extraction algorithms so that the extracted text covers multiple forms of information. For the extraction results, a k-means text clustering algorithm with a dynamic k value is designed. It builds on the traditional k-means algorithm and lets the value of k vary instead of fixing it in advance, which makes the clustering results more accurate.

The thesis then studies crawling of specific websites. Taking the “Fortune” website as an example of an industry-wide comprehensive website, it acquires comprehensive corporate information. Taking the Weibo platform as an example of a social networking website, it acquires corporate evaluation information and studies simulated Weibo login. Mobile Internet crawling is also studied: taking the TikTok short video app as an example, key-target mobile data is acquired with the Fiddler packet capture tool, and text recognition is used to extract the text in videos.

Finally, the thesis designs and implements an automatic acquisition system for key-target network information. The system contains three modules: search engine data acquisition, specific website data acquisition, and mobile Internet data acquisition. It also provides a display interface for the data acquired by each module.
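The abstract does not give the formula behind its text extraction algorithm, so the following is only a minimal sketch of the general idea of scoring page blocks by text density and symbol density. The block tags, the punctuation set, the scoring function, and the names `BlockCollector`, `score`, and `extract_main_text` are all assumptions made for illustration, not the thesis's method.

```python
import math
from html.parser import HTMLParser

# Assumed, illustrative choices (not taken from the thesis).
BLOCK_TAGS = {"p", "div", "li", "td", "article", "section"}
SKIP_TAGS = {"script", "style"}
PUNCTUATION = set(",.;:!?，。；：！？、")


class BlockCollector(HTMLParser):
    """Collects the text of each block-level element and its link-text length."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # each entry is [text, link_chars]
        self._skip_depth = 0
        self._link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self.blocks.append(["", 0])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1
        elif tag in BLOCK_TAGS:
            # Text that follows a closed block belongs to a new block.
            self.blocks.append(["", 0])

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        if not self.blocks:
            self.blocks.append(["", 0])
        block = self.blocks[-1]
        block[0] += (" " if block[0] else "") + text
        if self._link_depth:
            block[1] += len(text)


def score(text, link_chars):
    """Hypothetical score: non-link text length weighted by punctuation count."""
    symbols = sum(ch in PUNCTUATION for ch in text)
    non_link_chars = len(text) - link_chars
    return non_link_chars * math.log(1 + symbols)


def extract_main_text(html, ratio=0.5):
    """Returns, in document order, blocks scoring at least `ratio` of the best block."""
    collector = BlockCollector()
    collector.feed(html)
    scored = [(text, score(text, links)) for text, links in collector.blocks if text]
    best = max((s for _, s in scored), default=0.0)
    if best <= 0:
        return []
    return [text for text, s in scored if s >= ratio * best]
```

Under these assumptions, navigation bars and footers score near zero because they are mostly link text or contain no punctuation, while body paragraphs score high. A production extractor would need to handle nested blocks and tune the scoring on real pages.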

Keywords/Search Tags:

Internet, mobile Internet, web crawler, text extraction, text clustering

Related items

1	Research And Implementation Of Text Clustering Algorithm For Internet News
2	Decision Support System For Investment And Regulatory Based On Internet News Text Mining
3	Research And Implementation Of Entity Relation Extraction In Massive Internet Text
4	Research Of Key Technology On Internet Public Opinion Monitoring System
5	Key Technology Research And Prototype System Implementation Of Weibo Public Opinion Monitoring
6	Research On The Application Of Text Classification And Clustering In Network Security Operation System
7	Research On Key Problems In Text Mining
8	Research On Keyword Extraction Algorithm For Chinese Texts And Cluster Center Point Selection Algorithm In Text Clustering
9	System Design And Implementation Based On Crawler And Text Clustering For Network Public Opinion Analysis
10	Study On The Influence Of Internet-text To The Functions Of Internet-editor