Enterprise Behavior Recognition And Behavior Relationship Discovery Based On Massive Text

Posted on: 2020-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: X W Deng
GTID: 2428330596475306
Subject: Management Science and Engineering
Abstract/Summary:
In natural language processing, entities are the objects that documents describe, such as enterprises in business news or scenic spots in a travelogue. Entity relationships hidden in texts are of great value. For example, corporate relationships can assist decision makers with business decisions, and relationships between scenic spots can support recommendation systems. Discovering entity relationships from texts is therefore a valuable task.

Existing research on entity relationship extraction relies mainly on the co-occurrence of entities. The relationships extracted this way are heterogeneous, that is, they are not necessarily of the same type. Closer, homogeneous relationships better reflect the associations between entities. Suppose we build enterprise relationships on a specific behavior the enterprises share, rather than on the fact that they appear in the same piece of news. In the resulting relationship network the entities are more closely associated, which can improve follow-up tasks such as classification or prediction. Compared with heterogeneous relationships, extracting homogeneous relationships faces three main challenges. First, homogeneous relationships are sparse in Internet texts. Second, the entities involved in the texts are unknown. Finally, the set of relationships contained in the corpus and the corresponding label of each document are unknown as well.

In response to these challenges, this thesis predicts corporate relationships from massive Internet texts. Two problems must be solved first: entity extraction and behavior extraction. Building on their solutions, the thesis models homogeneous relationships between entities and uses the model to predict relationships between unlinked entities.

The first issue is entity identification: calculating the probability that character (word) elements in the texts make up an entity name. This thesis proposes a method for measuring the distribution of candidate samples, divides them into four groups according to that distribution, and uses more than 80 kinds of association measures for identification. Experiments verify that the proposed grouping strategy improves the accuracy of compound word recognition. In addition, for the group with poor recognition performance, this thesis proposes the AMIS algorithm to improve it as well.

The second issue is behavior extraction. The author proposes a "cluster + annotation + classification" framework. Experiments show that the proposed method is superior to other methods in behavior recognition and that the framework can also recognize rare behaviors. Based on the recognized entities and behaviors, the author builds a network of behavioral relationships, with entities as nodes and a specific behavior as edges.

The third issue is, given the relationship network, estimating the probability that a relationship (edge) will form between firms that did not have one before. This thesis uses the Node2Vec model to train node vectors and obtain a node vector model better suited to prediction. Experiments confirm that Node2Vec is more effective than traditional features.

In general, the main contribution of this thesis is a technical route for mining and using homogeneous enterprise relationships from Internet corpora. The grouping strategy and AMIS algorithm proposed for entity name recognition, the "cluster + annotation + classification" framework proposed for enterprise behavior recognition, and the Node2Vec model for improving relationship prediction offer good solutions to the key steps of this route.
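
To make the network construction and link prediction steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the firm names, behavior labels, and function names are hypothetical, and the Jaccard neighbor-overlap score stands in for the kind of "traditional feature" the abstract contrasts with Node2Vec. It does not reproduce Node2Vec itself.

```python
from collections import defaultdict
from itertools import combinations


def build_behavior_network(triples, behavior):
    """Build an undirected adjacency map that keeps only one behavior type.

    Each triple is (firm_a, behavior_label, firm_b); filtering on a single
    label yields a homogeneous network (all edges mean the same thing).
    """
    adjacency = defaultdict(set)
    for firm_a, edge_behavior, firm_b in triples:
        if edge_behavior == behavior and firm_a != firm_b:
            adjacency[firm_a].add(firm_b)
            adjacency[firm_b].add(firm_a)
    return dict(adjacency)


def jaccard_scores(adjacency):
    """Score every unlinked pair of firms by neighbor overlap (Jaccard)."""
    scores = {}
    for a, b in combinations(sorted(adjacency), 2):
        if b in adjacency[a]:
            continue  # already linked, nothing to predict
        shared = adjacency[a] & adjacency[b]
        union = adjacency[a] | adjacency[b]
        scores[(a, b)] = len(shared) / len(union) if union else 0.0
    return scores


# Hypothetical (entity, behavior, entity) triples, as if produced by the
# entity recognition and behavior recognition steps.
triples = [
    ("FirmA", "cooperation", "FirmB"),
    ("FirmA", "cooperation", "FirmC"),
    ("FirmB", "cooperation", "FirmD"),
    ("FirmC", "cooperation", "FirmD"),
    ("FirmD", "cooperation", "FirmE"),
    ("FirmA", "lawsuit", "FirmD"),  # different behavior, excluded below
]

network = build_behavior_network(triples, "cooperation")
scores = jaccard_scores(network)

# Rank candidate edges from most to least likely under this baseline.
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 3))
```

In a Node2Vec-based variant, the neighbor-overlap score would be replaced by features derived from learned node vectors, which are then fed to a classifier.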
Keywords/Search Tags: enterprise relationships, compound words, behavior extraction, relationship discovery