
Enterprise Anomaly Detection Based On Multi-source Heterogeneous Data

Posted on: 2022-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
GTID: 2518306572450734
Subject: Computer Science and Technology
Abstract/Summary:
With the development of economic globalization, enterprises increasingly interact with one another. Detecting abnormal enterprise conditions in time matters both to the departments that supervise the market and to the survival and development of the enterprises themselves. Current research on enterprise anomaly detection focuses mainly on financial, credit and similar anomalies. The data used in these studies are mainly structured, finance-related data or internal financial data of the enterprise, so the large volume of unstructured data on the Internet is not fully used. To address this deficiency, this thesis analyzes unstructured, enterprise-related Internet data such as news text, resumes and social network comments to find enterprise anomalies.

First, enterprise-related news, resumes and social comments are collected. Then, according to the characteristics of the data content, five types of anomalies are defined: enterprise personnel, enterprise litigation and punishment, enterprise competitiveness, enterprise cooperation and enterprise public opinion. Next, the features relevant to each type of anomaly are extracted from the collected data.

For news data, the thesis mainly analyzes the popularity of enterprises and extracts events from news. Because no existing data set matches the event types needed here, a template-matching method is used to extract event information from news. Event templates are first defined for each event type; named entity recognition, coreference resolution, dependency parsing and other methods are then applied, and events are extracted by rules combined with the event templates. For semi-structured resume data, information is extracted by pattern matching against word libraries of job titles, organizations and similar terms. The main features extracted from resumes are those related to personnel joining and leaving an enterprise. For unstructured social data, the thesis mainly analyzes the trend of an enterprise's popularity in social interaction and the sentiment changes in social texts. The rule-based VADER model is used to analyze the sentiment of social text.

After feature extraction, the ensemble learning method LightGBM is used to build a detection model for each type of enterprise anomaly, with an improvement on the original method. A model built with the LightGBM algorithm serves as the base model, and the sample data are weighted according to the classification error rate. The next model is then trained on the weighted data, and after multiple iterations the weighted prediction results of the models are combined as the final result. Experiments show that the improved method predicts better than the original method. The experiments also show that unstructured data on the Internet contain a great deal of valuable enterprise-related information. By extracting information from these multi-source heterogeneous data, some abnormal conditions of enterprises can be found effectively.
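The reweighting scheme described in the abstract (train a base model, weight samples by classification error, train the next model on the weighted data, combine weighted predictions) resembles AdaBoost. The sketch below illustrates that loop only; it is not the thesis's implementation. Two assumptions are made loudly: the weight-update and model-weight formulas are the classic discrete AdaBoost ones, which the abstract does not specify, and a one-feature decision stump stands in for the LightGBM base model so that the example needs no third-party library. All function names are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Weighted decision stump (assumed stand-in for the LightGBM base model).

    Returns (weighted_error, feature_index, threshold, sign), where the
    stump predicts `sign` when row[feature] > threshold and `-sign` otherwise.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def fit_reweighted_ensemble(X, y, rounds=5):
    """Train `rounds` base models, reweighting samples after each one.

    Labels in y must be +1 (anomalous) or -1 (normal).
    Assumption: classic AdaBoost formulas for alpha and the weight update.
    """
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this model
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Final result: sign of the alpha-weighted vote of all base models."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

With a real LightGBM base model, the same loop would presumably pass the current weights to the learner at each round instead of calling `fit_stump`; how the thesis wires this up is not stated in the abstract.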
Keywords/Search Tags: enterprise anomaly, enterprise competitiveness, ensemble learning, LightGBM, multi-source heterogeneous data