
Research and Implementation of Entity Resolution Based on LDA

Posted on: 2014-01-27  Degree: Master  Type: Thesis
Country: China  Candidate: T T Zhang  Full Text: PDF
GTID: 2248330398471899  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the ways of acquiring data have become increasingly diverse, and data quality problems are receiving more and more attention. Ambiguity in data is one such problem: within one or more databases, a single real-world entity may be described in several different ways, or several distinct entities may share the same description. Entity ambiguity arises in many areas, such as academic networks, keyword-based retrieval, e-mail, names in movie databases, and records in relational databases. The LDA model is a generative model for collections of text documents and other sets of discrete data, and it is mainly used for topic discovery in text processing. In this thesis, the LDA model is extended into an LDA entity resolution model, which turns the entity resolution problem into a probabilistic one. Compared with the original three-layer LDA model, the new model adds one more layer, in which the attributes of an entity are modified to obtain entity references. Building on previous research on LDA inference, I implement inference for the LDA entity resolution model using Gibbs sampling. A simple method for estimating the number of entities is also proposed. The method is based on blocking and splits the entity references into different buckets; it not only estimates the number of entities but also reduces the number of comparisons between entity references. To verify the effectiveness of the LDA-based method, two widely used entity resolution methods are also implemented: one based on clustering and one based on social network analysis. Experiments on webpage data and academic network data compare the three methods using different evaluation metrics. The results show that, compared with the other two methods, the LDA entity resolution model achieves relatively high accuracy.
Keywords/Search Tags: entity resolution, LDA, entity relationship, Gibbs Sampling
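
The blocking step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which blocking key the thesis uses or exactly how the entity count is derived from the buckets, so everything specific here is an assumption made for illustration: the hypothetical `blocking_key` function (surname plus first initial), the helper names, the sample references, and the use of the bucket count as a rough estimate of the number of entities.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(reference: str) -> str:
    """Hypothetical blocking key: last token plus first initial, lowercased."""
    tokens = reference.lower().replace(".", " ").split()
    if not tokens:
        return ""
    return f"{tokens[-1]}_{tokens[0][0]}"


def block_references(references):
    """Split entity references into buckets that share a blocking key."""
    buckets = defaultdict(list)
    for ref in references:
        buckets[blocking_key(ref)].append(ref)
    return dict(buckets)


def candidate_pairs(buckets):
    """Yield only the reference pairs that fall into the same bucket."""
    for members in buckets.values():
        yield from combinations(members, 2)


def pair_count(n: int) -> int:
    """Number of unordered pairs among n references (no blocking)."""
    return n * (n - 1) // 2


# Illustrative references, not data from the thesis.
refs = ["T. Zhang", "Tingting Zhang", "T T Zhang", "Wei Wang", "W. Wang", "Li Chen"]
buckets = block_references(refs)

# Assumption: the bucket count serves as a rough estimate of the entity count.
print("estimated entities:", len(buckets))
print("comparisons with blocking:", len(list(candidate_pairs(buckets))))
print("comparisons without blocking:", pair_count(len(refs)))
```

In this sketch, a coarser key produces fewer, larger buckets (a lower entity estimate and more comparisons), while a finer key does the opposite, so the choice of key would directly affect both the estimate and the savings.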