The Design And Implementation Of Multi-Source Information Monitoring System In Financial Field

Posted on: 2020-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: X N Duan
Full Text: PDF
GTID: 2428330572472217
Subject: Computer technology
Abstract/Summary:
In recent years, financial risk incidents have occurred frequently, and their consequences have become increasingly serious. Acquiring risk information promptly leaves more time to take countermeasures. As the main channel for disseminating financial information, the Internet offers real-time delivery and wide coverage. However, in an era of information explosion, it is impossible to rely solely on people to acquire and process information. Financial institutions and practitioners therefore urgently need an efficient and flexible method for monitoring financial information from a given set of network sources in real time, so that they can pick out relevant risk information from a large volume of content and take measures to avoid risks in advance. To address the low efficiency of current financial information acquisition, we design and implement a multi-source information monitoring system for the financial field. The main research contents are as follows:

(1) We analyze the requirements of the multi-source information monitoring system in the financial field and design the system architecture. The system is divided into three parts: the multi-source information acquisition subsystem, the financial risk event extraction subsystem, and the real-time information broadcast subsystem.

(2) We design the multi-source information acquisition subsystem. We design and implement a distributed crawler tool based on Scrapy. Users customize crawlers by defining configuration files and parsing templates, so a new spider can be created easily. The subsystem distributes crawling through Celery, a distributed task queue. It solves the problem of URL deduplication in a distributed crawler system with a Redis-based hash compression deduplication method and a Bloom filter deduplication method, and it supports incremental crawling and breakpoint (resumable) crawling.

(3) We design the financial risk event extraction subsystem. We design and implement the data redundancy removal and standardization process, and an online-offline risk event extraction method based on trigger word expansion. Trigger words are expanded offline using the word2vec and LDA algorithms, while the risk information identification plug-in and the risk element extraction plug-in run online. In addition, we implement training and updating of the CRF model used to extract named entities.

(4) We design the real-time broadcast subsystem. We design and develop a risk event monitoring website that supports login, browsing, and querying, as well as a risk information publishing module based on subscribed tags.

(5) We implement the multi-source information monitoring system in the financial field and perform unit testing and overall testing of the system. The system runs stably and has achieved the expected results.

The system supports efficient real-time collection and customizable crawling across multiple information sources, provides a financial risk event extraction method based on trigger word expansion, and delivers real-time access to risk information through the monitoring website and pushed e-mails. This reduces the time required to browse and filter information from multiple sources and improves the efficiency of information acquisition.
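
The URL deduplication step described in item (2) can be illustrated with a minimal sketch. The abstract does not give the implementation, so everything below is an assumption made for illustration: the names `BloomFilter`, `url_fingerprint`, and `should_crawl` are hypothetical, the filter lives in an in-memory `bytearray` rather than in Redis, and MD5 stands in for whatever hash the thesis uses for compression. The sketch shows the general idea only: a URL is compressed to a fixed-size digest, and a Bloom filter records which digests have been seen.

```python
import hashlib
import math


class BloomFilter:
    """In-memory Bloom filter (illustrative; the thesis stores its filter in Redis)."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two parts of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids a zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def url_fingerprint(url: str) -> str:
    """Hash compression: map a URL of any length to a fixed-size hex digest."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def should_crawl(url: str, seen: BloomFilter) -> bool:
    """Return True the first time a URL is offered and False on later repeats."""
    fingerprint = url_fingerprint(url)
    if fingerprint in seen:
        return False
    seen.add(fingerprint)
    return True
```

A Bloom filter never misses a URL it has already recorded, but it can occasionally report an unseen URL as seen. The `error_rate` parameter trades memory for that false-positive rate, which is why a crawler that must not skip pages may keep an exact fingerprint set alongside it.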
Keywords/Search Tags:distributed crawler, event extraction, information monitoring system, web extraction