Research Of Disambiguation Of Internet People Information Technology

Posted on:2011-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:E L Ma

Full Text:PDF

GTID:2178330338979986

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet and related technologies, the World Wide Web has become the largest source of information. For enterprises and individuals alike, the web is gradually becoming the main information source. However, the sheer number of web sites and the resulting information overload make it increasingly difficult to obtain useful information. A search for information about a person returns a huge volume of results with heavy duplication and low accuracy. A person information extraction system is therefore built so that users can obtain the required information faster and more conveniently, with results that are simple, refined, and well presented.

Because different people may work in different areas, this paper uses that feature to divide the documents into seven categories: cultural, administrative, military, science and education, sports, health, and economic. This pre-classification avoids comparing information about people from different areas and so improves the efficiency of the system. It also achieves a high recall rate and ensures that information about people in different areas does not cross over, which reduces the error, in subsequent processing, of grouping information about people from different areas together.

In this paper, disambiguation is implemented by combining social networks and context information. Neither source performs well alone. With social networks only, the network may contain just one person's name or may be very small; with context information only, the context of a document cannot characterize a person well. We therefore combine the two methods to improve both the accuracy and the recall rate of the system. Social networks alone give high accuracy but low recall; adding context information overcomes this disadvantage and yields good performance on both accuracy and recall.

The person information processing system first collects information by retrieving the person's name and crawling the web with a web crawler, then pre-classifies the collected information, then clusters it using social networks and context information, and finally displays the information for each distinct person entity in the system interface.
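
The pipeline described above (pre-classification by area, then clustering on social network evidence with context information as a fallback) might be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not name its similarity measures or clustering algorithm, so the Jaccard overlap, bag-of-words cosine similarity, union-find grouping, threshold values, and the document fields `id`, `domain`, `people`, and `text` are all assumptions made here for illustration.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard overlap between two sets of co-occurring person names."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(docs, social_threshold=0.3, context_threshold=0.5):
    """Group documents mentioning the same name into person entities.

    Each doc is a dict with hypothetical keys: 'id', 'domain' (the
    pre-classified area), 'people' (co-occurring names), and 'text'.
    Thresholds are illustrative assumptions, not values from the thesis.
    """
    # Step 1: pre-classification. Documents in different areas are never compared.
    by_domain = defaultdict(list)
    for doc in docs:
        by_domain[doc["domain"]].append(doc)

    clusters = []
    for domain_docs in by_domain.values():
        parent = {d["id"]: d["id"] for d in domain_docs}

        def find(x):
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        bags = {d["id"]: Counter(d["text"].lower().split()) for d in domain_docs}
        for i, a in enumerate(domain_docs):
            for b in domain_docs[i + 1:]:
                # Step 2: social network evidence (accurate, but often sparse).
                same = jaccard(a["people"], b["people"]) >= social_threshold
                # Step 3: context evidence recovers pairs the network misses.
                if not same:
                    same = cosine(bags[a["id"]], bags[b["id"]]) >= context_threshold
                if same:
                    parent[find(a["id"])] = find(b["id"])

        groups = defaultdict(list)
        for d in domain_docs:
            groups[find(d["id"])].append(d["id"])
        clusters.extend(sorted(g) for g in groups.values())
    return sorted(clusters)


# Made-up documents retrieved for one ambiguous name.
docs = [
    {"id": 1, "domain": "sports", "people": {"Liu Yang", "Chen Jing"},
     "text": "tennis champion wins open final"},
    {"id": 2, "domain": "sports", "people": {"Liu Yang", "Zhao Min"},
     "text": "coach praises tennis star after match"},
    {"id": 3, "domain": "sports", "people": set(),
     "text": "tennis champion wins open final again"},
    {"id": 4, "domain": "economic", "people": {"Sun Bo"},
     "text": "bank executive announces quarterly results"},
]
print(disambiguate(docs))  # [[1, 2, 3], [4]]
```

In this toy run, documents 1 and 2 are linked by a shared co-occurring name, document 3 has no network at all and is linked to document 1 only through its context, and document 4 is never compared with the others because it falls in a different area.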

Keywords/Search Tags:

Disambiguation, Social Network, Area classification, Social attributes, Features Library

Related items

1	Privacy-protected Social Network Data Release For Avoiding Degree Attacks And Attribute Disclosure
2	User Hidden Attributes Inference And Attributes Cluster Analysis Based On Social Media
3	Research Of Social Network Information Propagation Model Based On Multi-dimensional Attributes
4	Investigation And Analysis Of Social Forces' Participation In Public Library Service Construction In Hefei Area
5	Research On The Social Memory Attributes Of Library Functions
6	Research And Design Of Classification In Social Learning Network
7	Key Technology Of User Social Characteristic Analysis On Online Social Network
8	Design And Implementation Of Social Network Data Retrieval’s Multi-Dimensional Sorting Optimizing Algorithm
9	Research On Optimizing The Performance Of D2D Communication Based On Social Attributes
10	Research On Social Network User Influence Analysis And Information Dissemination Modeling