
Methodological Study of Big Data Entity Recognition Based on Graph Clustering

Posted on: 2020-05-12    Degree: Master    Type: Thesis
Country: China    Candidate: R Suo
GTID: 2428330596470942    Subject: Computer application technology
Abstract/Summary:
In the era of big data, the amount of data has exploded, and this growth in data volume has brought serious data quality problems that greatly reduce the usability of data and make data cleaning more important. Entity recognition is an important step in data cleaning. Its main purpose is to accurately identify records that refer to the same entity and to associate data objects with the real-world entities they describe, that is, to determine whether a pair of tuples in the database refers to the same entity. In this way, redundant and inconsistent data can be eliminated, and data consistency can be effectively improved. Entity recognition for big data is one of the current research hotspots, but the recognition efficiency of existing methods on big data is still unsatisfactory. Current entity recognition techniques depend heavily on domain knowledge, and few domain-independent entity recognition algorithms exist. Entity recognition algorithms based on graph clustering currently perform well, and the Spark computing platform has great advantages in big data processing. Therefore, building on graph-clustering-based recognition algorithms, this paper uses the Spark computing platform to propose an entity recognition algorithm based on hypergraph clustering.

This paper first introduces the related technologies and theoretical foundations of entity recognition, including blocking techniques and clustering techniques for entity recognition. It then designs and implements an entity recognition algorithm under a hypergraph model using a hypergraph clustering method. First, the data is partitioned into blocks by building an inverted index table and mining frequent itemsets, in preparation for constructing the hypergraph model. Next, a weighted hypergraph model is built from the mined frequent itemsets, transforming the data into a hypergraph. Finally, the traditional hypergraph clustering method is optimized and used to cluster the hypergraph, completing the recognition of records that refer to the same entity.

The proposed algorithm is then verified experimentally. Standard evaluation methods are used to analyze the algorithm in three aspects: accuracy, efficiency, and speedup. The experimental results show that the proposed method improves the efficiency of entity recognition on big data while maintaining good accuracy, making it applicable to current entity recognition tasks on large data volumes.
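The pipeline described in the abstract (inverted-index blocking, frequent itemset mining, a weighted hypergraph, then clustering) can be illustrated with a minimal single-machine Python sketch. This is not the thesis's implementation: it does not use Spark, and the whitespace tokenizer, the choice of weighting each hyperedge by the number of shared tokens, and the simple threshold-based connected-component clustering (used here in place of the thesis's optimized hypergraph clustering method) are all assumptions. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(record):
    # Assumption: lowercase whitespace tokens; the thesis does not specify a tokenizer.
    return set(record.lower().split())


def build_inverted_index(records):
    # Inverted index table: token -> ids of the records that contain it.
    index = defaultdict(set)
    for rid, record in enumerate(records):
        for token in tokenize(record):
            index[token].add(rid)
    return index


def mine_frequent_itemsets(index, min_support, max_size):
    # Level-wise (Apriori-style) mining. The record set of a joined itemset is the
    # intersection of the record sets of the two itemsets it was joined from.
    level = {frozenset([t]): ids for t, ids in index.items() if len(ids) >= min_support}
    frequent = dict(level)
    for size in range(2, max_size + 1):
        next_level = {}
        for a, b in combinations(list(level), 2):
            joined = a | b
            if len(joined) != size or joined in next_level:
                continue
            ids = level[a] & level[b]
            if len(ids) >= min_support:
                next_level[joined] = ids
        frequent.update(next_level)
        level = next_level
    return frequent


def build_weighted_hypergraph(frequent):
    # Hypothetical weighting: each frequent itemset becomes a hyperedge over the
    # records containing it, weighted by the number of tokens those records share.
    return [(ids, len(itemset)) for itemset, ids in frequent.items() if len(ids) >= 2]


def cluster_hypergraph(n_records, hyperedges, min_weight):
    # Simplified clustering: union-find merges all records joined by a hyperedge
    # whose weight reaches min_weight, yielding connected components.
    parent = list(range(n_records))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members, weight in hyperedges:
        if weight < min_weight:
            continue
        first, *rest = sorted(members)
        for m in rest:
            ra, rb = find(first), find(m)
            if ra != rb:
                parent[rb] = ra

    groups = defaultdict(list)
    for rid in range(n_records):
        groups[find(rid)].append(rid)
    return sorted(groups.values())


def resolve_entities(records, min_support=2, max_size=3, min_weight=3):
    index = build_inverted_index(records)
    frequent = mine_frequent_itemsets(index, min_support, max_size)
    hyperedges = build_weighted_hypergraph(frequent)
    return cluster_hypergraph(len(records), hyperedges, min_weight)


records = [
    "John Smith 12 Oak Street Boston",
    "J Smith 12 Oak Street Boston",
    "Mary Jones 5 Elm Road Denver",
    "Mary Jones 5 Elm Rd Denver",
    "Peter Brown 9 Pine Avenue Austin",
]
print(resolve_entities(records))  # [[0, 1], [2, 3], [4]]
```

In this toy run, records 0 and 1 share several frequent three-token itemsets, as do records 2 and 3, so each pair is grouped as one entity, while record 4 shares no frequent tokens and stays alone. The inverted index limits support counting to posting-list intersections, which is the role blocking plays in the abstract; how the thesis distributes these steps on Spark is not shown here.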
Keywords/Search Tags: entity recognition, big data, hypergraph clustering