Research On Corporate Internet Negative Information Capture

Posted on: 2019-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: N D Wang
Full Text: PDF
GTID: 2428330548481889
Subject: Computer technology
Abstract/Summary:
With the globalization of the Internet and the arrival of the era of big data and artificial intelligence, attention has shifted from "data ownership" to "data creation", and existing industries have begun to reconsider how data mining creates value. Among all industries, practitioners in finance are especially eager to draw economic growth and returns from data. Timely and accurate Internet data is of strategic significance to bank risk control. With the Internet developing rapidly, how to accurately collect and analyze the intricate information about lenders according to a bank's own needs is an urgent problem. External data sources supplement the bank's first-hand understanding of lenders, and timely screening and early warning of potential risks is of great significance to improving the level of risk management.
Traditional information collection takes an indiscriminate, "refuse nothing" approach: information is gathered without first being screened. The result is low-quality information, inefficient crawling, and a high cost of later data processing. To address these issues, this thesis considers data source acquisition, data collection efficiency, data preprocessing, and data storage. The work can be divided into the following three parts:
1) Chinese company abbreviation generation and detection. A new machine learning method is proposed that combines a double-level conditional random field with rule derivation and web crawler verification. The double-level conditional random field model identifies the class of each word within the company name; a feature set is constructed and fed into the CRFs model, and the abbreviations it outputs are checked by a web crawler for statistical evaluation. The method has practical value for accurately generating the abbreviation that refers to a given company.
2) Corporate negative information collection solutions. Collection follows a "spread first product" mode: a whole-network crawler first gathers information that matches the extraction requirements. For topics related to the base company name, predefined web pages that match the topic can optionally be crawled, and incremental crawlers are then used to generate a directional crawling strategy for each topic of corporate negative information. A large number of machine learning algorithms are used to preprocess the collected data, including deduplication, denoising, and screening.
3) Design and implementation of a corporate negative information collection system. The system is a sub-project that serves the risk management and monitoring platform used by staff of the bank's venture capital business. Users interact with the risk warning platform to send information acquisition requirements to the value acquisition system; the dispatch center then analyzes the tasks and issues the collection tasks. Finally, the system preprocesses and analyzes the collected data and provides data support to the risk analysis system.
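The abbreviation step in part 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It only shows how per-unit features might be built for a sequence labeler such as a CRF, and how predicted labels could be turned into an abbreviation. Everything named here is an assumption made for illustration: the character-level granularity, the suffix list `ORG_SUFFIXES`, the feature names, the `K` (keep) and `S` (skip) label set, and the hand-written labels that stand in for model output.

```python
from typing import Dict, List

# Hypothetical list of organizational suffixes (longest first); not from the thesis.
ORG_SUFFIXES = ("股份有限公司", "有限责任公司", "有限公司", "集团", "公司")


def char_features(name: str) -> List[Dict[str, object]]:
    """Build one feature dict per character of a company name."""
    suffix_start = len(name)
    for suffix in ORG_SUFFIXES:
        if name.endswith(suffix):
            suffix_start = len(name) - len(suffix)
            break
    features = []
    for i, ch in enumerate(name):
        features.append({
            "char": ch,
            "prev": name[i - 1] if i > 0 else "<BOS>",
            "next": name[i + 1] if i < len(name) - 1 else "<EOS>",
            "position": i,
            # True when the character belongs to a generic company suffix.
            "in_suffix": i >= suffix_start,
        })
    return features


def abbreviation_from_labels(name: str, labels: List[str]) -> str:
    """Join the characters whose label is 'K' (keep)."""
    if len(name) != len(labels):
        raise ValueError("one label per character is required")
    return "".join(ch for ch, label in zip(name, labels) if label == "K")


name = "中国工商银行股份有限公司"
feats = char_features(name)
# Hand-written labels standing in for what a trained sequence model might output.
labels = ["S", "S", "K", "S", "S", "K", "S", "S", "S", "S", "S", "S"]
print(abbreviation_from_labels(name, labels))
```

In a full pipeline the feature dicts would be passed to a trained sequence model, and each candidate abbreviation would then be checked against crawled pages, as the abstract describes. Both of those stages are outside this sketch.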
Keywords/Search Tags:Bank venture, Abbreviation, CRFs, Information Collection