The Information Extraction Of Netdisk And Information Retrieval

Posted on: 2018-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: J K Dong
GTID: 2348330518497008
Subject: Information security
Abstract/Summary:
Netdisks (cloud storage services) give users plenty of storage and also let them share data easily. This is very convenient for users, but it has also made netdisks an important channel for spreading malicious and infringing applications. Such applications are hard for a program to detect for two reasons. First, the links shared through netdisks are scattered across many sites. Second, extracting netdisk information requires entering a Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA), which is difficult for a program to do. To solve these problems, this thesis designs and implements a crawler that collects shared links and extracts application download links. It also designs a CAPTCHA recognition process, and finally it designs and implements a retrieval service. The specific work is as follows:
1. A crawler system based on WebMagic is designed and implemented. With the crawling categories designed in this thesis, the new framework can crawl application download links comprehensively and conveniently.
2. To deal with the CAPTCHA encountered while extracting netdisk information, this thesis proposes and analyses a method for extracting the download link of one netdisk. It uses the drop fall algorithm to address the potential problem of recognizing merged (touching) characters. When applying the drop fall algorithm to character segmentation, this thesis proposes combining the number of characters with minimum values to determine the initial points. Results show that the link extraction method is fast and effective, and that the modified algorithm improves the success rate of segmentation and recognition.
3. In the APK retrieval system, this thesis proposes decompiling the APK files and extracting text and pictures from the decompiled files before indexing them.
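The crawling step (item 1) is built on WebMagic, a Java framework, and the abstract does not describe its page-processing rules. The Python sketch below is only a hedged illustration of the general pattern such a crawler follows. It keeps a queue of pages to visit, pulls share links out of each page, follows ordinary links, and produces an ordered, de-duplicated output. It does not use or imitate the WebMagic API. The share-link pattern `pan.example.com/s/...`, the `crawl` function and the caller-supplied `fetch` callback are hypothetical placeholders. Real netdisk URL formats differ by provider, and a production crawler would also need real network fetching and rate limiting.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical share-link format; real netdisk providers use their own URL schemes.
SHARE_LINK = re.compile(r"https?://pan\.example\.com/s/[A-Za-z0-9_-]+")
HREF = re.compile(r'href="([^"]+)"')


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url.

    fetch(url) is a caller-supplied function returning the page HTML,
    or None if the page cannot be retrieved.
    Returns the distinct share links found, in discovery order.
    """
    queue = deque([start_url])  # scheduler: pages waiting to be visited
    seen = {start_url}          # avoids enqueueing the same page twice
    found = {}                  # share link -> page it was first seen on
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        if html is None:
            continue
        # Page processor, part 1: harvest share links anywhere in the page text.
        for link in SHARE_LINK.findall(html):
            found.setdefault(link, url)
        # Page processor, part 2: follow ordinary links, resolved against the page URL.
        for href in HREF.findall(html):
            target = urljoin(url, href)
            if SHARE_LINK.fullmatch(target) is None and target not in seen:
                seen.add(target)
                queue.append(target)
    return list(found)
```

Passing `fetch=site.get` for a small in-memory dict of pages lets the crawler be exercised offline. Swapping in an HTTP client only changes `fetch`.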
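Item 2 uses the drop fall algorithm to cut touching CAPTCHA characters. It chooses the starting points from "the number of characters and minimum values" but gives no further detail. The sketch below shows one possible reading of that step, not the thesis's exact method, and rests on several assumptions:
- The image is binarized (1 = character pixel).
- The expected character count is known.
- The ink span is split into equal-width slots, and near each slot boundary the column with the fewest character pixels (a vertical-projection minimum) becomes a start point.

The drop then falls from the top row. It prefers straight down, then down-right, then down-left, and cuts straight through the stroke when all three are blocked. This is a simplified variant, because the classic drop fall also allows sideways moves, which are omitted here. The names `column_projection`, `initial_points`, `drop_fall`, `split` and the `window` parameter are hypothetical.

```python
def column_projection(img):
    """Count character pixels (value 1) in each column of a binary image."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def initial_points(img, n_chars, window=2):
    """Choose n_chars - 1 start columns for the drops.

    The ink span is divided into n_chars equal slots; near each slot
    boundary (within +/- window columns) the column with the smallest
    projection is chosen, ties broken by closeness to the boundary.
    """
    proj = column_projection(img)
    ink = [x for x, v in enumerate(proj) if v > 0]
    left, right = ink[0], ink[-1]
    step = (right - left + 1) / n_chars
    starts = []
    for k in range(1, n_chars):
        guess = round(left + k * step)
        lo, hi = max(left, guess - window), min(right, guess + window)
        starts.append(min(range(lo, hi + 1),
                          key=lambda x: (proj[x], abs(x - guess))))
    return starts


def drop_fall(img, x0):
    """Simplified drop fall from (row 0, column x0); returns one column per row."""
    h, w = len(img), len(img[0])
    x, path = x0, [x0]
    for y in range(h - 1):
        below = img[y + 1]
        if below[x] == 0:
            pass                        # background below: fall straight down
        elif x + 1 < w and below[x + 1] == 0:
            x += 1                      # slide down-right around the stroke
        elif x > 0 and below[x - 1] == 0:
            x -= 1                      # slide down-left around the stroke
        # otherwise all three are ink: cut straight down through the stroke
        path.append(x)
    return path


def split(img, path):
    """Cut img along path: columns left of path[y] go left, the rest go right."""
    left = [[v if x < path[y] else 0 for x, v in enumerate(row)]
            for y, row in enumerate(img)]
    right = [[v if x >= path[y] else 0 for x, v in enumerate(row)]
             for y, row in enumerate(img)]
    return left, right
```

For a two-character image, `initial_points(img, 2)` yields one start column, `drop_fall` traces the cut, and `split` returns the two halves. Each half would then be cropped and passed to the recognizer.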
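Item 3 indexes APK files after decompiling them and extracting their text and pictures. The hedged Python sketch below covers only the extraction-and-indexing side. It relies on the fact that an APK is a ZIP archive. It reads plain-text files under `assets/`, lists image files for later processing, and builds an inverted index that supports keyword search requiring all query terms. It does not decompile anything. `AndroidManifest.xml` and resource XML are stored in compiled binary form and app code lives in `classes.dex`, so a real pipeline would first run a decompiler such as apktool or jadx, a step left out here. The file-type filters and the names `extract_apk`, `build_index` and `search` are illustrative assumptions.

```python
import io
import re
import zipfile
from collections import defaultdict

IMAGE_EXT = (".png", ".jpg", ".jpeg", ".webp")
TEXT_EXT = (".txt", ".html", ".htm", ".json")  # assumed plain-text asset types
TOKEN = re.compile(r"\w+")


def extract_apk(data):
    """Return (text, image_paths) from the bytes of one APK (a ZIP archive)."""
    texts, images = [], []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith(IMAGE_EXT):
                images.append(name)  # kept for later image processing
            elif lower.startswith("assets/") and lower.endswith(TEXT_EXT):
                texts.append(zf.read(name).decode("utf-8", errors="ignore"))
    return "\n".join(texts), images


def build_index(apks):
    """apks maps an APK name to its bytes.

    Returns (index, images): index maps each lower-cased token to the set
    of APK names containing it; images maps each APK name to its image paths.
    """
    index, images = defaultdict(set), {}
    for apk_name, data in apks.items():
        text, images[apk_name] = extract_apk(data)
        for token in TOKEN.findall(text.lower()):
            index[token].add(apk_name)
    return index, images


def search(index, query):
    """Return the sorted APK names that contain every query term."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))
```

Requiring every term (set intersection) keeps results precise for short queries. A ranked scheme such as TF-IDF would be the natural next step if partial matches should also be returned.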
Keywords/Search Tags: Netdisk, Link extraction, CAPTCHA, Drop fall algorithm, APK information extraction