| Author name disambiguation aims to resolve the problem of multiple authors sharing the same name. With the rapid growth of scientific and technical literature, author name ambiguity has become a serious problem. It is especially severe for Chinese authors: transliteration into English discards tones, and many distinct characters share the same pronunciation, so duplicate names are more common and disambiguation is considerably harder. To address this, this paper studies an author name disambiguation method based on heuristic rules and applies hierarchical clustering to the same task. Combining the two, it develops a hierarchical clustering method based on high-contribution attributes that disambiguates Chinese author names in an English-language context. The method achieves an F1 score of 96.26% on this paper's test set and is suitable for practical use. The main contents of this paper are as follows. First, an author name disambiguation method based on heuristic rules is designed. Heuristic rules identify authors with the same name from different perspectives and using different feature attributes. The paper analyzes the author information, selects a set of author attributes and inter-author relationships as features, disambiguates on each feature separately, and iteratively refines the rules for each feature based on the observed results. Experiments then evaluate the disambiguation performance of each feature and its rules. Based on each feature's measured contribution to disambiguation, the features are fused to produce the final disambiguation result. Second, an author clustering strategy based on high-contribution attributes is proposed. It improves on both traditional hierarchical clustering and the heuristic-rule method by combining step-by-step multi-feature processing, hierarchical clustering, and heuristic rules. Different similarity functions and clustering algorithms are applied to different feature attributes, and the attributes are then ranked by their contribution to disambiguation. On this basis, traditional hierarchical clustering is extended with heuristic rules to obtain the hierarchical clustering method based on high-contribution attributes. The core idea is to select, at each clustering step, the attributes that contribute most to disambiguation. This first secures the accuracy of each single clustering step. Multi-step hierarchical clustering over multiple features then lets each step merge more clusters, which improves the recall and running efficiency of the algorithm. Experiments show that this method achieves higher accuracy and a better disambiguation effect. |
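
The step-by-step idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state which attributes, similarity functions, or thresholds are used, so the attribute names (`coauthors`, `affiliation`), the Jaccard similarity, the 0.5 thresholds, and the function `stepwise_cluster` are all assumptions chosen for illustration. Attributes are processed in an assumed contribution order, and at each step clusters are merged only when they agree strongly on the current attribute.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_values(papers, cluster, attr):
    """Pool the values of one attribute over all papers in a cluster."""
    values = set()
    for idx in cluster:
        values |= papers[idx].get(attr, set())
    return values


def stepwise_cluster(papers, ranked_attrs):
    """Merge papers sharing an ambiguous author name, one attribute at a time.

    papers: list of dicts mapping attribute name -> set of values.
    ranked_attrs: list of (attribute, threshold) pairs, ordered from the
        highest assumed contribution to the lowest (hypothetical ranking).
    """
    clusters = [[i] for i in range(len(papers))]  # start with singletons
    for attr, threshold in ranked_attrs:
        merged = True
        while merged:  # repeat until no pair qualifies on this attribute
            merged = False
            for x, y in combinations(range(len(clusters)), 2):
                vx = cluster_values(papers, clusters[x], attr)
                vy = cluster_values(papers, clusters[y], attr)
                if jaccard(vx, vy) >= threshold:
                    clusters[x] = clusters[x] + clusters[y]
                    del clusters[y]
                    merged = True
                    break  # cluster list changed; rescan from the start
    return [sorted(c) for c in clusters]


# Toy records for one ambiguous name (fabricated purely for illustration).
papers = [
    {"coauthors": {"li ming", "zhang hua"}, "affiliation": {"tsinghua university"}},
    {"coauthors": {"li ming", "zhang hua"}, "affiliation": {"tsinghua university"}},
    {"coauthors": {"chen jie"}, "affiliation": {"tsinghua university"}},
    {"coauthors": {"zhao lei"}, "affiliation": {"fudan university"}},
]
ranked_attrs = [("coauthors", 0.5), ("affiliation", 0.5)]
result = stepwise_cluster(papers, ranked_attrs)
```

In this toy run the co-author step merges papers 0 and 1, and the affiliation step then attaches paper 2 to that cluster, which shows how a later, lower-ranked attribute can raise recall after an earlier step has fixed a reliable core. A real system would likely need stricter rules for weak attributes such as affiliation, since distinct authors can share an institution.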