Research And Implementation Of Entity Resolution For Chinese Scientific Research Institutions

Posted on: 2020-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: G C Song
Full Text: PDF
GTID: 2428330575457116
Subject: Computer Science and Technology
Abstract/Summary:
Entity resolution is the task of matching and merging data records that refer to the same real-world entity. It is a key step in data cleaning and data integration. Traditional entity resolution research mainly resolves records within one or more databases. Web page data, by contrast, is unstructured and unformatted, and the data problems found in real data are diverse. This thesis builds a scientific research knowledge base by crawling and parsing paper data from Wanfang, so its application background differs from that of traditional entity resolution.

The thesis first analyzes the characteristics of existing entity resolution algorithms and proposes an entity resolution model based on text matching. Similarity computation for long-text semantic attributes is treated as a text matching problem, and the text matching model is adapted to the entity resolution setting. Semantic-level and character-level similarity measures are applied to different types of attribute values, and entity resolution is then performed on the data. Experiments show that the method is effective.

Next, the thesis analyzes the characteristics of scientific research data crawled from real websites and proposes a multi-feature fusion entity resolution algorithm that combines attribute, structural, and semantic information. Research paper data are crawled and labeled to build data sets. Three similarities are studied experimentally: institution name similarity, relationship network similarity, and research field similarity. To support them, the thesis designs a word segmentation algorithm for the institution name attribute, constructs a relationship network of research institutions, extracts information on each institution's high-quality papers, and uses representation learning to vectorize each institution's research field. Experiments show that combining these three features effectively handles the diversity of problem types in real data.

Finally, the entity resolution algorithm is applied to a real system, and the scientific information knowledge base is constructed to provide data support for upper-layer applications. Because the data are scattered and entity resolution is difficult to complete in a single pass, an offline, iterative aggregation process for entity resolution is proposed.
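The multi-feature fusion idea described above can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not give its similarity functions, weights, or threshold. The sketch assumes a character-level ratio from Python's `difflib` for name similarity, Jaccard overlap of collaborating institutions for relationship network similarity, cosine similarity of research field vectors for semantic similarity, a fixed weighted sum for fusion, and union-find clustering of pairs above a threshold. The record fields (`name`, `partners`, `field`), the weights, the threshold, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt

def name_similarity(a, b):
    # Character-level similarity of two institution names (assumed measure).
    return SequenceMatcher(None, a, b).ratio()

def network_similarity(partners_a, partners_b):
    # Jaccard overlap of collaborating institutions (assumed measure).
    union = partners_a | partners_b
    return len(partners_a & partners_b) / len(union) if union else 0.0

def field_similarity(u, v):
    # Cosine similarity of research field vectors (assumed measure).
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def fused_score(rec_a, rec_b, weights=(0.4, 0.3, 0.3)):
    # Weighted sum of the three similarities; the weights are illustrative.
    w_name, w_net, w_field = weights
    return (w_name * name_similarity(rec_a["name"], rec_b["name"])
            + w_net * network_similarity(rec_a["partners"], rec_b["partners"])
            + w_field * field_similarity(rec_a["field"], rec_b["field"]))

def resolve(records, threshold=0.7):
    # Merge every pair scoring at or above the threshold, using union-find.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if fused_score(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical records: two spellings of one institution and one unrelated one.
RECORDS = [
    {"name": "Peking University School of Computer Science",
     "partners": {"X", "Y", "Z"}, "field": [0.9, 0.1, 0.0]},
    {"name": "Peking Univ. School of Computer Science",
     "partners": {"X", "Y"}, "field": [0.8, 0.2, 0.0]},
    {"name": "Wuhan Institute of Virology",
     "partners": {"Q"}, "field": [0.0, 0.1, 0.9]},
]

print(resolve(RECORDS))
```

With these sample records, the two spellings of the same institution fall into one cluster and the unrelated institution stays separate. The pairwise comparison is quadratic in the number of records, so a real system would need blocking or a similar candidate-selection step before scoring.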
Keywords/Search Tags: Entity resolution, knowledge base, text matching, multi-feature fusion