Enterprise Anomaly Detection Based On Multi-source Heterogeneous Data

Posted on:2022-10-09

Degree:Master

Type:Thesis

Country:China

Candidate:C Li

GTID:2518306572450734

Subject:Computer Science and Technology

Abstract/Summary:

With the development of economic globalization, enterprises increasingly interact with one another. Detecting abnormal situations in enterprises in a timely manner is important both for the departments that supervise the market and for the survival and development of the enterprises themselves. Current research on enterprise anomaly detection focuses mainly on financial, credit and similar anomalies. The data used in these studies are mainly structured, finance-related data or internal financial data of the enterprise, and they make little use of the large amount of unstructured data available on the Internet. To address this deficiency, this thesis analyzes enterprise-related unstructured data on the Internet, such as news text, resumes and social network comments, to find enterprise anomalies.

First, enterprise-related news, resumes, social comments and other data are collected. Then, according to the characteristics of the data content, five types of anomalies are defined: enterprise personnel, enterprise litigation punishment, enterprise competitiveness, enterprise cooperation and enterprise public opinion. Next, the features related to these different enterprise anomalies are extracted from the collected data.

For news data, this thesis mainly analyzes the popularity of enterprises and extracts events from the news. Because no available data set matches the event types needed here, a template-matching event extraction method is used to extract event information from news. First, event templates are defined for each event type; then named entity recognition, coreference resolution, dependency parsing and other methods are used to extract events according to rules combined with the event templates. For semi-structured resume data, information is extracted by pattern matching against word libraries of job titles, organizations and so on. The main features extracted from resumes relate to personnel joining and leaving the enterprise. For unstructured social data, this thesis mainly analyzes the trend in enterprise popularity in social interaction and the changes in sentiment in social texts, using the rule-based VADER model for sentiment analysis.

After feature extraction, the ensemble learning method LightGBM is used to build a detection model for each type of enterprise anomaly, and the original method is improved. First, a model built with the LightGBM algorithm serves as the base model, and the sample data are weighted according to the classification error rate. The next model is then trained on the weighted data, and after multiple iterations the weighted prediction results of the models are used as the final result. Experiments show that the improved method predicts better than the original method. The experiments also show that unstructured data on the Internet contains a great deal of valuable enterprise-related information, and that extracting information from these multi-source heterogeneous data can effectively reveal some abnormal conditions of enterprises.
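The reweighting scheme described above (train a base model, reweight samples by classification error, train the next model on the reweighted data, combine the weighted predictions) resembles AdaBoost. The abstract does not give the exact update rule, so the following is only a minimal sketch under stated assumptions: the standard AdaBoost formulas are assumed, a one-feature threshold stump stands in for the LightGBM base model so that the example has no dependencies, and the toy data and all function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when row[feature] >= threshold, else -polarity.
    This stands in for the LightGBM base model (assumption, not the thesis code).
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] >= t else -polarity

def fit_weighted_ensemble(X, y, rounds=3):
    """AdaBoost-style loop: labels are +1 (anomalous) or -1 (normal)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below is always defined.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # model weight from error rate
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Final result: sign of the alpha-weighted vote of all base models."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels no single stump can separate.
X_demo = [[v] for v in range(10)]
y_demo = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = fit_weighted_ensemble(X_demo, y_demo, rounds=3)
correct = sum(predict(model, row) == yi for row, yi in zip(X_demo, y_demo))
print(f"training accuracy: {correct}/{len(y_demo)}")
```

With LightGBM as the base model, the per-sample weights could presumably be passed through the `sample_weight` argument of `LGBMClassifier.fit` in place of the stump search; whether the thesis does it this way is not stated in the abstract.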

Keywords/Search Tags:

enterprise anomaly, enterprise competitiveness, ensemble learning, LightGBM, multi-source heterogeneous data

Related items

1	A Research On Modeling And Design Of SOA-based Virtual Enterprise Multi-source Heterogeneous Service Integration
2	Research On Competitiveness Evaluation Of Chinese Listed Publishing Enterprise
3	Design And Implementation Of Enterprise Heterogeneous Data Integration&Query System Based On XML
4	Cement Enterprise Management Information System Design And Implementation
5	Combining Multi-source And Heterogeneous Data In Recommender Models And Systems
6	Research Of Unsupervised Anomaly Detection Methods Based On Ensemble Machine Learning Model
7	Research On Abnormal User Identification Of China Unicom Based On Ensemble Learning
8	Research On The Competitiveness Of Software Companies Under The Background Of Big Data
9	Research On Key Technologies Of Anomaly Detection Based On Multi-Source Data
10	Research And Implement Of Electric Enterprise Resource Plan System Based On J2EE