
Research On Distributed Information Collection Strategy For Financial Credit

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2308330485971013
Subject: Computer Science and Technology
Abstract/Summary:
The effective combination of Internet technology and the financial industry has driven the rapid development of Internet finance, and a large number of companies with an Internet finance background have been established. Credit business, as an important part of these financial companies, is their major source of profit, so carrying out risk control is essential to ensuring profitability. However, the data used in traditional risk-control approaches suffer from single-source and timeliness problems, and as a result financial companies' non-performing loan ratios have continued to increase. The continuous development of the Internet produces rapidly updated business, enterprise, industry, and policy information. Collecting and processing the personal, business, and industry information that financial companies are interested in helps them keep abreast of loan-related parties and identify potential risks in time. Effective raw data is therefore of great significance to risk control. Facing the explosive growth of Internet information, how to gather information precisely from diverse sources is a problem that urgently needs to be solved. Traditional information-gathering methods do not identify the validity of information and simply gather everything, which reduces both crawling efficiency and information quality. Against the background of credit information collection, the main research work is as follows:

(1) A universal distributed crawler. Focusing on the compatibility issues of current distributed crawler systems, this paper presents a universal, general-purpose distributed crawler system that is compatible with various crawling strategies while remaining fast, efficient, and accurate. The design of the distributed crawler mainly covers the following aspects: the overall structure of the system, the means of communication, each node's functions, exception handling, and so on.

(2) An effective method of gathering links for incremental page crawls. Addressing the weaknesses of traditional link-gathering methods, a Web-based method is proposed to obtain valid links by filtering out invalid ones. For generality, we focus on extracting unstructured information from main pages using common (list) expressions, which ensures extraction accuracy as well as reusability: only the data source needs to be changed for the crawler to work effectively in other applications.
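The abstract names the main design aspects of the distributed crawler: overall structure, means of communication, per-node functions, and exception handling. A minimal single-process sketch of such a master/worker structure is given below; all class and function names, the in-memory queue standing in for real inter-node communication, and the fake `fetch` function are illustrative assumptions, not the thesis's actual implementation.

```python
import queue
import threading

class CrawlMaster:
    """Illustrative master node: dispatches URLs, deduplicates, tracks failures."""
    def __init__(self, seeds):
        self.frontier = queue.Queue()   # stands in for master-to-worker messaging
        self.seen = set()               # URL deduplication
        self.results = {}
        self.failed = []
        self.lock = threading.Lock()
        for url in seeds:
            self.enqueue(url)

    def enqueue(self, url):
        with self.lock:
            if url not in self.seen:
                self.seen.add(url)
                self.frontier.put(url)

    def report(self, url, links, error=None):
        # Exception handling: failed URLs are recorded for possible retry.
        if error is not None:
            self.failed.append((url, error))
            return
        self.results[url] = links
        for link in links:
            self.enqueue(link)

def worker(master, fetch):
    """Illustrative worker node: pulls URLs and reports extracted links back."""
    while True:
        try:
            url = master.frontier.get(timeout=0.1)
        except queue.Empty:
            return  # no work left within the timeout window
        try:
            master.report(url, fetch(url))
        except Exception as exc:
            master.report(url, [], error=str(exc))
        finally:
            master.frontier.task_done()

# Usage with a fake fetch function standing in for real HTTP requests.
pages = {"a": ["b", "c"], "b": ["c"], "c": []}
master = CrawlMaster(["a"])
threads = [threading.Thread(target=worker, args=(master, pages.__getitem__))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(master.results))  # → ['a', 'b', 'c']
```

In a real distributed deployment the in-memory queue would be replaced by a network message channel, but the division of labor sketched here, a master that deduplicates and dispatches while workers fetch and report, matches the structure the abstract describes.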
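The second contribution filters invalid links before extracting page content. A hedged sketch of that validity filtering step follows; the specific regular expressions, the categories of "invalid" link, and the sample HTML are assumptions chosen for illustration, since the thesis's actual rules are not given in the abstract.

```python
import re

# Illustrative filtering patterns; the thesis's real rules may differ.
INVALID_SCHEMES = re.compile(r"^(javascript:|mailto:|#|tel:)", re.I)
STATIC_ASSETS = re.compile(r"\.(css|js|png|jpe?g|gif|ico|zip)(\?|$)", re.I)
HREF = re.compile(r'<a\s[^>]*href="([^"]+)"', re.I)

def extract_valid_links(html):
    """Pull hrefs from a page and drop links that are not worth crawling."""
    links = []
    for url in HREF.findall(html):
        if INVALID_SCHEMES.match(url):
            continue  # script/mail/anchor pseudo-links carry no content
        if STATIC_ASSETS.search(url):
            continue  # static resources, not content pages
        links.append(url)
    return links

html = '''
<a href="/news/credit-policy-2016.html">Policy update</a>
<a href="javascript:void(0)">expand</a>
<a href="/static/logo.png">logo</a>
<a href="mailto:ops@example.com">contact</a>
'''
print(extract_valid_links(html))  # → ['/news/credit-policy-2016.html']
```

Filtering before fetching is what lets an incremental crawl avoid re-downloading pseudo-links and static assets, which is the efficiency gain the abstract attributes to valid-link extraction.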
Keywords/Search Tags: Distributed Crawler, Precision Acquisition, Financial Credit, Effective Link Extraction