Expert Disambiguation Based on Semi-supervised Graph Clustering

Posted on: 2014-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: W Tian
Full Text: PDF
GTID: 2268330401973354
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The Internet hosts a large number of expert pages and open academic resources, which have become important raw material for organizing resources in expert retrieval. Because different experts may share a name and the same expert may be represented in different ways, the expert page documents that are collected must be disambiguated. Expert name disambiguation is therefore one of the basic and important research topics in expert retrieval. Expert page documents come from many sources and contain, in addition to published articles, information such as research directions, projects undertaken, and resumes. Implicit relationships also often exist among expert page documents, such as link relationships, shared attribute values, and origin from the same site. These factors strongly support the unique identification of an expert. Traditional name disambiguation gives little consideration to expert attribute relationships and ignores the implicit coreference relationships among expert pages; this thesis addresses that problem. Specifically, the main work of the thesis is as follows:

1. A Chinese expert name disambiguation approach based on spectral clustering with expert page-associated relationships is proposed. First, the spectral clustering algorithm is introduced, and the similarities between expert pages are computed from the expert attribute-associated relationships. Second, a disambiguation model is constructed based on spectral clustering with expert attribute-associated constraints. Finally, experiments are carried out on an expert disambiguation corpus. The results show that spectral clustering with expert page-associated relationships improves the F-value by 6.7% on average over spectral clustering without the associated constraint information.

2. Drawing on the idea of graph clustering, an expert disambiguation method based on semi-supervised graph clustering is proposed. In essence, this method is an improvement on spectral clustering. By combining expert attribute features with expert page-associated relationships, "must-link" and "cannot-link" constraint-based association rules are defined, and the similarity matrix among expert page documents is built from the resulting constraints. The disambiguation model is then constructed based on semi-supervised graph clustering, according to the characteristics of the "must-link" and "cannot-link" constraints and the normalized cut criterion function. Finally, the solution process is discussed and the corresponding experiments are carried out. The results show that, compared with the spectral clustering method, the semi-supervised graph clustering approach achieves a better clustering effect.

3. Building on both the theoretical study and the practical application of this topic, a prototype system framework is built. A disambiguation corpus is collected for expert-related experiments, and an expert disambiguation prototype system based on semi-supervised graph clustering is implemented.
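
To make the idea of clustering with "must-link" and "cannot-link" constraints concrete, the following is a minimal sketch and not the thesis's actual model. It assumes that each expert page is represented by a small attribute feature vector, that constraints are imposed simply by overwriting entries of a cosine similarity matrix (1 for must-link, 0 for cannot-link), and that the normalized cut is approximated by the standard spectral relaxation followed by a basic k-means step. The function names, the toy features, and the constraint pairs are all hypothetical.

```python
import numpy as np

def constrained_affinity(features, must_link, cannot_link):
    """Cosine similarity between pages, overwritten by pairwise constraints."""
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    W = np.clip(X @ X.T, 0.0, 1.0)
    for i, j in must_link:        # pages assumed to describe the same expert
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:      # pages assumed to describe different experts
        W[i, j] = W[j, i] = 0.0
    np.fill_diagonal(W, 0.0)      # no self-loops in the page graph
    return W

def spectral_embedding(W, k):
    """Top-k eigenvectors of D^{-1/2} W D^{-1/2}, rows scaled to unit length."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)   # eigenvalues are returned in ascending order
    U = vecs[:, -k:]
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(row_norms == 0, 1.0, row_norms)

def kmeans(U, k, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def disambiguate(features, must_link, cannot_link, k):
    """Cluster pages into k groups, one group per assumed distinct expert."""
    W = constrained_affinity(features, must_link, cannot_link)
    return kmeans(spectral_embedding(W, k), k)

# Toy data (hypothetical): six pages for one ambiguous name, two real experts.
# Page 2 has features that sit between the two groups.
pages = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 1.0, 0.1],
]
must = [(0, 2)]                    # e.g. pages 0 and 2 share a contact attribute
cannot = [(2, 3), (2, 4), (2, 5)]  # e.g. conflicting affiliation attributes
labels = disambiguate(pages, must, cannot, k=2)
print(labels)
```

In this toy setting the constraints are what pull the ambiguous page 2 toward pages 0 and 1; with the constraint lists left empty, its placement depends on the raw feature similarities alone. Overwriting affinities is only one simple way to inject pairwise supervision, and other formulations encode the constraints directly in the cut objective.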
Keywords/Search Tags: expert disambiguation, expert attribute features, expert attribute-associated relationships, spectral clustering, semi-supervised graph clustering