| Person name ambiguity is widespread on the Web, because one name may be shared by many different people. Searching for Chinese personal names is an everyday need of Internet users. As the Internet has grown, pages about different people who share a name have made search results increasingly hard to interpret, and this affects search engines in particular. Popular general-purpose search engines handle ambiguous names by keyword matching alone and return a long, unordered list of results. The truly valuable information is only the "tip of the iceberg" in the massive volume of Web data, and pages about "celebrities" drown out pages about "non-celebrities", which makes it very inconvenient for users to find information about the person they are looking for. To address this problem, this paper studies Chinese personal name search. Its main tasks are as follows. First, building on vertical search engine technology and on the characteristics of Chinese personal name search, we design the architecture of a Chinese personal name search engine. Its main components are a topic-focused Web crawler for personal names and Web Chinese personal name disambiguation. The crawler uses template-based analysis and DOM-tree-based page analysis, respectively, to acquire person information from the people section of Baidu Encyclopedia, from which a bibliographic database is built, and to collect Web pages that contain ambiguous names from the Internet. We then present an unsupervised, automatic name disambiguation method based on Baidu Encyclopedia. Taking the large volume of data in the people section of Baidu Encyclopedia as the basis of the bibliographic database, we parse its rich person information and semantic relations and extract three main kinds of features: the person's background, the person's contextual features, and the person's social group information. 
Feature weights are learned by logistic regression and the features are combined linearly; the person entity with the highest combined score is selected as the referent of the ambiguous name. Finally, we build an experimental prototype and carry out Web Chinese personal name disambiguation experiments, whose results verify the validity of the method. |
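The weight learning and entity selection step summarized above could be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: the function names (`fit_weights`, `linear_score`, `disambiguate`), the feature order (background, context, social group), the similarity values, and the toy labeled pairs used to fit the weights are all assumptions introduced here.

```python
import math

# Assumed feature order for each (page, candidate entity) pair:
# (background similarity, context similarity, social-group similarity), each in [0, 1].

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weights(samples, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights and a bias by batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample under the current parameters.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def linear_score(weights, bias, features):
    """Linear combination of the feature similarities."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def disambiguate(weights, bias, candidates):
    """Return the candidate entity id with the highest linear score."""
    return max(candidates, key=lambda e: linear_score(weights, bias, candidates[e]))

# Toy (hypothetical) training pairs: label 1 means the page refers to the entity.
train_x = [(0.9, 0.8, 0.7), (0.8, 0.6, 0.9), (0.1, 0.2, 0.0), (0.2, 0.1, 0.3)]
train_y = [1, 1, 0, 0]
weights, bias = fit_weights(train_x, train_y)

# Hypothetical candidate entities for one page that mentions an ambiguous name.
candidates = {"entity_a": (0.85, 0.7, 0.8), "entity_b": (0.2, 0.3, 0.1)}
best = disambiguate(weights, bias, candidates)
```

In this sketch the learned weights only rank the candidate entities for a page, so the bias does not affect which entity is chosen; it is kept so that the fitting step is ordinary logistic regression.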