Research Of Disambiguation Of Internet People Information Technology

Posted on: 2011-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: E L Ma
Full Text: PDF
GTID: 2178330338979986
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet and related technologies, the World Wide Web has become the largest information space. For enterprises and individuals alike, the web is gradually becoming the main source of information. However, the sheer number of websites and the resulting information overload make it increasingly difficult to find useful information. A search for information about a person returns a huge volume of results with heavy duplication and low accuracy. A person information extraction system is therefore built so that users can obtain the required information faster and more conveniently, with results that are simple, refined, and well presented.

Because different people are often active in different areas, this paper uses that feature to divide the documents into seven categories: cultural, administrative, military, science and education, sports, health, and economic. This pre-classification avoids comparing information about people from different areas and so improves the efficiency of the system. In addition, the pre-classification achieves a high recall rate and ensures that information about people in different areas does not cross over, which reduces the error in subsequent processing of grouping people from different areas together.

In this paper, we perform disambiguation by combining social networks and context information. Neither source performs well on its own. With social networks alone, the network of a document may contain only one person's name or be very small; with context information alone, the context of a document may not characterize a person well. We therefore combine the two methods to improve both the accuracy and the recall rate of the system.
Using the social network alone achieves high accuracy but low recall; adding context information overcomes this disadvantage and yields good performance in both accuracy and recall.

The person information processing system first collects person information by retrieving the person's name and crawling the web with a web crawler, then pre-classifies the collected documents, then clusters them using social network and context information, and finally displays the web information grouped by person entity in the system interface.
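
The pipeline described in the abstract (pre-classification by area, then clustering on combined social network and context evidence) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the keyword lists in `AREA_KEYWORDS`, the Jaccard and cosine measures, the weight `alpha`, the `threshold`, the single-link grouping, and the sample documents are all assumptions made for illustration. Treating "science and education" as a single category, so that the list totals seven, is also an assumption. Each document is represented by its tokens and by the set of other person names that co-occur with the queried name.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword lists; the thesis does not publish its features library.
AREA_KEYWORDS = {
    "cultural": {"film", "novel", "artist", "museum"},
    "administrative": {"mayor", "government", "policy", "bureau"},
    "military": {"army", "general", "navy", "regiment"},
    "science and education": {"university", "professor", "research", "paper"},
    "sports": {"match", "team", "coach", "league"},
    "health": {"hospital", "doctor", "clinic", "surgery"},
    "economic": {"company", "market", "investment", "bank"},
}

def classify_area(tokens):
    """Pre-classification: pick the area whose keywords occur most often."""
    counts = {area: sum(1 for t in tokens if t in kws)
              for area, kws in AREA_KEYWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"

def jaccard(a, b):
    """Social network evidence: overlap of co-occurring person names."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cosine(c1, c2):
    """Context evidence: cosine similarity of term-frequency vectors."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def similarity(d1, d2, alpha=0.5):
    """Weighted mix of both signals; alpha is an assumed weight."""
    social = jaccard(d1["people"], d2["people"])
    context = cosine(Counter(d1["tokens"]), Counter(d2["tokens"]))
    return alpha * social + (1 - alpha) * context

def cluster(docs, threshold=0.3):
    """Single-link grouping; only documents in the same area are compared."""
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    areas = [classify_area(d["tokens"]) for d in docs]
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if areas[i] == areas[j] and similarity(docs[i], docs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i in range(len(docs)):
        groups[find(i)].append(i)
    return sorted(groups.values())

# Invented sample pages that all mention the same queried name.
docs = [
    {"tokens": ["professor", "university", "research", "paper"],
     "people": {"Li Wei", "Zhang Min"}},
    {"tokens": ["professor", "research", "lecture", "university"],
     "people": {"Zhang Min", "Wang Fang"}},
    {"tokens": ["coach", "team", "match", "league"],
     "people": {"Liu Yang"}},
    {"tokens": ["team", "match", "goal", "coach"],
     "people": {"Liu Yang", "Chen Jie"}},
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Because pairs from different areas are never compared, the number of similarity computations drops and documents about people in different areas cannot end up in one group, which is the efficiency and error-rate benefit the abstract attributes to pre-classification. Setting `alpha` to 1 or 0 reduces the sketch to the social-network-only or context-only variant.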
Keywords/Search Tags:Disambiguation, Social Network, Area classification, Social attributes, Features Library