
Research And Implementation On The Related Entities Query And Web Pages-Oriented Heterogeneous Information Network Construction

Posted on: 2016-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: N Li
GTID: 2428330542957293
Subject: Computer technology
Abstract/Summary:
Clustering and classification, entity similarity analysis, and link prediction based on heterogeneous information networks are widely applied in everyday applications and scientific research, owing to the networks' strong capacity for semantic expression. As an extension of entity similarity query, related entity query plays a pivotal role in Web search and recommender systems. However, because most heterogeneous information networks are extracted from Web pages, building a high-quality network is a primary problem. Moreover, existing query methods are based on the meta-path framework, and it is difficult for users to select a reasonable meta path. In addition, the time complexity of computing entity relevance is unacceptably high, and there is no reasonable evaluation model for computing the relevance between entities.

To address these problems, this thesis studies a Web page-oriented method for constructing heterogeneous information networks and a related entity query method based on such networks. For network construction, a rule-based relation extraction strategy is first proposed to extract the relations between entities from disorderly Web pages. A triple entity matching strategy then cleans the relations and matches the entities. Finally, a high-quality heterogeneous information network is built from the relations.

For related entity query, edge weights are first defined according to edge semantics, user preferences, and features of the graph structure. RelSim, a model for computing entity relevance under this comprehensive weight, is proposed on the basis of an improved SimRank algorithm that makes good use of the semantic expression capacity of heterogeneous information networks. Second, a naive related entity query method based on RelSim is proposed. To overcome the disadvantages of this method, a path pattern-based selection algorithm is put forward. On the one hand, the algorithm selects the path space that conforms to the query semantics, that is, it prunes the graph reasonably, which greatly reduces the cost of iteration and the computation time. On the other hand, the algorithm helps select significant meta paths, which addresses the problem of semantic selection. The complete top-k related entity query algorithm is then built on the path pattern-based selection algorithm.

Extensive experiments verify the performance and rationality of the computing model RelSim and the top-k relevance query method RelSim-prune. The results demonstrate that RelSim compares the relevance between entities effectively and that RelSim-prune improves computation speed, which meets the needs of practical applications. Finally, a related entity query and recommendation system, REQR, is designed and implemented on heterogeneous information networks. The system integrates data extraction and cleaning, entity recognition and matching, and entity query and recommendation, and it validates the methods proposed in the thesis.
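
The abstract does not give the RelSim formula, so the following is only a minimal sketch of the general idea it describes: a SimRank-style iteration in which edge weights influence the score, followed by a top-k lookup. It is a generic weighted SimRank variant, not the thesis's RelSim or RelSim-prune. The function names, the decay factor, the weight normalization, and the toy paper/author graph are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def weighted_simrank(edges, decay=0.8, iterations=10):
    """Generic weighted SimRank (illustrative, not the thesis's RelSim).

    edges: iterable of (source, target, weight) with weight > 0.
    Returns a nested dict sim[a][b] with scores in [0, 1].
    """
    in_nbrs = defaultdict(dict)  # target -> {source: weight}
    nodes = set()
    for src, dst, weight in edges:
        in_nbrs[dst][src] = weight
        nodes.update((src, dst))
    nodes = sorted(nodes)

    # Start from the identity: each node is fully similar to itself only.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}

    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia = in_nbrs.get(a, {})
                ib = in_nbrs.get(b, {})
                if not ia or not ib:
                    new[a][b] = 0.0
                    continue
                # Weighted average of in-neighbor similarities (assumed form).
                total = sum(
                    wa * wb * sim[i][j]
                    for i, wa in ia.items()
                    for j, wb in ib.items()
                )
                norm = sum(ia.values()) * sum(ib.values())
                new[a][b] = decay * total / norm
        sim = new
    return sim


def top_k_related(sim, query, k):
    """Return the k nodes most similar to `query` as (node, score) pairs."""
    candidates = (
        (score, node) for node, score in sim[query].items() if node != query
    )
    return [(node, score) for score, node in heapq.nlargest(k, candidates)]


# Hypothetical toy network: papers p1, p2 point to their authors a1, a2, a3.
edges = [("p1", "a1", 1.0), ("p1", "a2", 1.0), ("p2", "a3", 1.0)]
sim = weighted_simrank(edges)
print(top_k_related(sim, "a1", 2))
```

This sketch iterates over every node pair, which is the cost the abstract calls unacceptable on large networks. Restricting the iteration to node pairs reachable through selected path patterns is the kind of pruning the abstract attributes to RelSim-prune, but it is not reproduced here.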
Keywords/Search Tags:heterogeneous information network, related entity, relevance, RelSim, RelSim-prune