Chinese Personal Name Disambiguation Based On Attribute Information

Posted on: 2013-09-14   Degree: Master   Type: Thesis
Country: China   Candidate: L Li   Full Text: PDF
GTID: 2248330371966324   Subject: Natural language processing
Abstract/Summary:
As the World Wide Web has developed, users' expectations of search quality have risen, and demand for searching by personal name has grown. Users want to find basic information about specific people, but the results returned by current search engines are not very satisfactory. Personal name disambiguation aims to separate information about different people who share the same name within a large mass of data.

Given the current state of Chinese personal name disambiguation, we use personal attribute information to address this problem. The main work includes the following aspects:

(1) We studied several web page cleaning algorithms and applied them in our system to extract the main body text of each page.

(2) We propose a Chinese personal name disambiguation method based on personal attribute information. First, we apply a bootstrapping algorithm to obtain a large number of highly credible attribute templates, which can extract many kinds of attribute values. Second, we improve the dictionary matching algorithm by adding constraints, which yields high-quality profession and company attributes. Third, NER tools are used to extract relationship attributes. For texts from which these three methods still extract no attribute values, HowNet is used to infer attributes.

(3) By analyzing the influence of different attributes, we use information gain to account for the differing discriminative power of attributes.

(4) A "double thresholds" clustering algorithm is developed to suit the characteristics of personal name disambiguation. During clustering, attribute values are expanded using TongYiCiCiLin to obtain a more accurate similarity between documents' attribute values.

Experimental results show that our method outperforms state-of-the-art methods and is more interpretable. In addition, the results yield structured information for each person.
Keywords/Search Tags: attribute extraction, attribute expansion, information gain, double thresholds, name disambiguation
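
Steps (3) and (4) of the abstract can be illustrated with a minimal Python sketch. The abstract does not define the two thresholds or the similarity function, so everything below is an assumption made for illustration, not the thesis's actual method: attributes are weighted by their information gain over labelled documents, document similarity is a weighted overlap of matching attribute values, a high threshold `t_high` links document pairs into seed clusters, and a low threshold `t_low` then merges clusters by average cross-similarity. The attribute names, toy documents, labels, and threshold values are hypothetical, and expansion with TongYiCiCiLin or HowNet is not modelled.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, attr):
    """Information gain of `attr` w.r.t. the person labels; a missing value counts as None."""
    groups = defaultdict(list)
    for doc, label in zip(docs, labels):
        groups[doc.get(attr)].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def similarity(a, b, weights):
    """Weight of attributes with equal values divided by weight of attributes present in either document."""
    shared = sum(weights.get(k, 0.0) for k in a if k in b and a[k] == b[k])
    total = sum(weights.get(k, 0.0) for k in set(a) | set(b))
    return shared / total if total else 0.0


def double_threshold_cluster(docs, weights, t_high, t_low):
    """Assumed two-stage scheme: link pairs at t_high, then merge clusters at t_low."""
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Stage 1: high-confidence links between individual documents.
    for i, j in combinations(range(len(docs)), 2):
        if similarity(docs[i], docs[j], weights) >= t_high:
            parent[find(i)] = find(j)

    by_root = defaultdict(list)
    for i in range(len(docs)):
        by_root[find(i)].append(i)
    clusters = list(by_root.values())

    # Stage 2: merge clusters whose average cross-similarity reaches t_low.
    merged = True
    while merged:
        merged = False
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
            avg = sum(similarity(docs[i], docs[j], weights) for i, j in pairs) / len(pairs)
            if avg >= t_low:
                clusters[x].extend(clusters.pop(y))
                merged = True
                break
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: four pages mentioning the same name, two underlying people.
docs = [
    {"occupation": "professor", "employer": "Peking University"},
    {"occupation": "professor", "employer": "Peking University", "spouse": "Wang"},
    {"occupation": "athlete", "employer": "Bayi Team"},
    {"occupation": "athlete"},
]
labels = ["p1", "p1", "p2", "p2"]  # gold person labels, used only to estimate weights
weights = {a: information_gain(docs, labels, a) for a in ("occupation", "employer", "spouse")}
clusters = double_threshold_cluster(docs, weights, t_high=0.8, t_low=0.3)
print(weights)
print(clusters)
```

On this toy input, the first two documents are linked in the high-threshold stage, and the two "athlete" documents, which share only one attribute, are merged in the low-threshold stage. The rarely filled `spouse` attribute receives a lower weight than `occupation` or `employer` because it separates the labelled people less well.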