
Research On Distributed Information Collection Strategy For Financial Credit

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2308330485971013
Subject: Computer Science and Technology
Abstract/Summary:
The effective combination of Internet technology and the financial industry has driven the rapid development of Internet finance, and a large number of companies with an Internet finance background have been established. Credit business, as an important part of these financial companies, is their major source of profit, so carrying out risk control is essential to ensuring profitability. However, the data used in traditional risk-control approaches suffer from single-source and timeliness problems, and as a result financial companies' non-performing loan ratios have continued to increase. The continuous development of the Internet produces rapidly updated business, enterprise, industry, and policy information. Collecting and processing the personal, business, and industry information that financial companies are interested in helps them keep abreast of loan-related parties and identify potential risks in time. Effective raw data is therefore of great significance to risk control. Facing the explosive growth of Internet information, how to gather information precisely from diverse sources is a problem that urgently needs to be solved. Traditional information-gathering methods do not identify the validity of information and simply gather everything, which reduces both crawling efficiency and information quality. Against the background of credit information collection, the main research work is as follows:

(1) A universal distributed crawler. Focusing on the compatibility issues of current distributed crawler systems, this paper presents a universal, general-purpose distributed crawler system that is compatible with various crawling strategies while remaining fast, efficient, and accurate. The design of the distributed crawler mainly covers the following aspects: the overall structure of the system, the means of communication, each node's functions, exception handling, and so on.

(2) An effective method of gathering links for incremental page crawls. Addressing the weaknesses of traditional link-gathering methods, a Web-based method is proposed to obtain valid links by filtering out invalid ones. For generality, we focus on extracting unstructured information from main pages using common (list) expressions, which ensures extraction accuracy as well as reusability: only the data source needs to be changed for the crawler to work effectively in other applications.
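The abstract names the main design aspects of the distributed crawler: overall structure, means of communication, per-node functions, and exception handling. A minimal single-process sketch of such a master/worker structure is given below; all class and function names, the in-memory queue standing in for real inter-node communication, and the fake `fetch` function are illustrative assumptions, not the thesis's actual implementation.

```python
import queue
import threading

class CrawlMaster:
    """Illustrative master node: dispatches URLs, deduplicates, tracks failures."""
    def __init__(self, seeds):
        self.frontier = queue.Queue()   # stands in for master-to-worker messaging
        self.seen = set()               # URL deduplication
        self.results = {}
        self.failed = []
        self.lock = threading.Lock()
        for url in seeds:
            self.enqueue(url)

    def enqueue(self, url):
        with self.lock:
            if url not in self.seen:
                self.seen.add(url)
                self.frontier.put(url)

    def report(self, url, links, error=None):
        # Exception handling: failed URLs are recorded for possible retry.
        if error is not None:
            self.failed.append((url, error))
            return
        self.results[url] = links
        for link in links:
            self.enqueue(link)

def worker(master, fetch):
    """Illustrative worker node: pulls URLs and reports extracted links back."""
    while True:
        try:
            url = master.frontier.get(timeout=0.1)
        except queue.Empty:
            return  # no work left within the timeout window
        try:
            master.report(url, fetch(url))
        except Exception as exc:
            master.report(url, [], error=str(exc))
        finally:
            master.frontier.task_done()

# Usage with a fake fetch function standing in for real HTTP requests.
pages = {"a": ["b", "c"], "b": ["c"], "c": []}
master = CrawlMaster(["a"])
threads = [threading.Thread(target=worker, args=(master, pages.__getitem__))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(master.results))  # → ['a', 'b', 'c']
```

In a real distributed deployment the in-memory queue would be replaced by a network message channel, but the division of labor sketched here, a master that deduplicates and dispatches while workers fetch and report, matches the structure the abstract describes.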
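The second contribution filters invalid links before extracting page content. A hedged sketch of that validity filtering step follows; the specific regular expressions, the categories of "invalid" link, and the sample HTML are assumptions chosen for illustration, since the thesis's actual rules are not given in the abstract.

```python
import re

# Illustrative filtering patterns; the thesis's real rules may differ.
INVALID_SCHEMES = re.compile(r"^(javascript:|mailto:|#|tel:)", re.I)
STATIC_ASSETS = re.compile(r"\.(css|js|png|jpe?g|gif|ico|zip)(\?|$)", re.I)
HREF = re.compile(r'<a\s[^>]*href="([^"]+)"', re.I)

def extract_valid_links(html):
    """Pull hrefs from a page and drop links that are not worth crawling."""
    links = []
    for url in HREF.findall(html):
        if INVALID_SCHEMES.match(url):
            continue  # script/mail/anchor pseudo-links carry no content
        if STATIC_ASSETS.search(url):
            continue  # static resources, not content pages
        links.append(url)
    return links

html = '''
<a href="/news/credit-policy-2016.html">Policy update</a>
<a href="javascript:void(0)">expand</a>
<a href="/static/logo.png">logo</a>
<a href="mailto:ops@example.com">contact</a>
'''
print(extract_valid_links(html))  # → ['/news/credit-policy-2016.html']
```

Filtering before fetching is what lets an incremental crawl avoid re-downloading pseudo-links and static assets, which is the efficiency gain the abstract attributes to valid-link extraction.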
Keywords/Search Tags: Distributed Crawler, Precision Acquisition, Financial Credit, Effective Link Extraction