Research On Automatic Person Attributes Extraction And Social Network Construction From Wikipedia

Posted on:2012-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:X P Meng

Full Text:PDF

GTID:2248330395958158

Subject:Computer software and theory

Abstract/Summary:

People search is one of the most important search activities. People search engines, social networks, and other associated applications have gradually become research hotspots, and person attribute extraction is an important basis for these studies. This thesis focuses on extracting person attributes from Wikipedia and then constructs a similarity network using the extracted attributes and other information in the Wikipedia text.

The infobox in a person article summarizes the person's main attributes in tabular form, which makes it an important resource for person attribute extraction. However, fewer than forty percent of Wikipedia articles contain an infobox, and some attributes are missing. How to automatically generate infoboxes and fill in the missing attributes is therefore one of the topics of this study.

Wikipedia articles use different kinds of infobox templates, and different templates may contain different kinds of attributes, so the infobox template type must be determined before attribute extraction. We treat this as a typical text classification task in which the category labels are the infobox template types. For feature selection, we propose a method based on hyperlink words, text categories, and entity words. Experiments show that the proposed method has certain advantages in classification performance over using all words as features.

For attribute extraction, we use "person-attribute-value" triples extracted from existing infoboxes. For a given attribute, our system marks the person name and the attribute value in the corresponding sentences in the free text of Wikipedia, and thereby automatically acquires a labeled data set. Patterns for each attribute are then generated automatically by machine learning algorithms. More attributes can be acquired by pattern matching, and these attributes can in turn be used to generate infoboxes or fill in missing attributes. We conduct experiments on five commonly used attributes, and the results show that our method extracts person attributes effectively.

Afterwards, we mine a similarity network using the extracted person attributes and other information in the Wikipedia text. First, the information about a person is divided into different properties and a Person Model is proposed. Then, different similarity calculation methods are used for different dimensions. Finally, for the total similarity of the person model, we treat the person entity as a system so that a systematic similarity measure can be employed. Moreover, we define four types of relations. For two given persons, the method outputs not only their similarity but also their relation and their common values. Through experiments on real person data from Wikipedia, we analyze the distribution characteristics of the social network and show that the method is feasible.
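
The pattern acquisition step described in the abstract (marking the person name and attribute value in free-text sentences, then matching the resulting patterns against new sentences) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `learn_patterns`, `pattern_to_regex`, and `extract`, the slot markers `<PERSON>` and `<VALUE>`, and the use of whole-sentence templates with simple counting in place of the machine learning algorithms the thesis refers to, whose details the abstract does not give.

```python
import re
from collections import Counter

PERSON_SLOT = "<PERSON>"  # hypothetical slot markers, not from the thesis
VALUE_SLOT = "<VALUE>"


def learn_patterns(triples, sentences, min_count=1):
    """Derive sentence templates per attribute from seed triples."""
    counts = {}
    for person, attribute, value in triples:
        for sentence in sentences:
            if person not in sentence or value not in sentence:
                continue
            # Mark the first mention of each argument with a slot.
            template = sentence.replace(person, PERSON_SLOT, 1)
            template = template.replace(value, VALUE_SLOT, 1)
            # Skip cases where one argument overlapped the other.
            if PERSON_SLOT in template and VALUE_SLOT in template:
                counts.setdefault(attribute, Counter())[template] += 1
    return {
        attribute: [t for t, n in counter.items() if n >= min_count]
        for attribute, counter in counts.items()
    }


def pattern_to_regex(template):
    """Compile a template into a regex with named capture groups."""
    escaped = re.escape(template)
    escaped = escaped.replace(re.escape(PERSON_SLOT), r"(?P<person>.+?)", 1)
    escaped = escaped.replace(re.escape(VALUE_SLOT), r"(?P<value>.+?)", 1)
    return re.compile("^" + escaped + "$")


def extract(patterns, sentences):
    """Apply learned templates to new sentences and return new triples."""
    found = []
    for attribute, templates in patterns.items():
        for template in templates:
            regex = pattern_to_regex(template)
            for sentence in sentences:
                match = regex.match(sentence)
                if match:
                    found.append(
                        (match.group("person"), attribute, match.group("value"))
                    )
    return found


# Seed triple as it might be read from an existing infobox (illustrative data).
seed_triples = [("Ada Lovelace", "birth_place", "London")]
seed_sentences = [
    "Ada Lovelace was born in London.",
    "She wrote notes on the Analytical Engine.",
]
learned = learn_patterns(seed_triples, seed_sentences)
print(extract(learned, ["Alan Turing was born in Maida Vale."]))
```

Whole-sentence templates only match sentences with identical wording, so a practical system would need to generalize them (for example, by keeping only the words between the two slots); the sketch shows the data flow from infobox triples to patterns to newly extracted attributes, not the thesis's actual learner.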

Keywords/Search Tags:

Wikipedia, automatic pattern acquisition, person attribute extraction, person model, social network

Related items

1	Mining The Quality Of The Content In Wikipedia
2	Research On Wikipedia-based Social Network Analysis Technique
3	Research On Automatic Terminology List Construction Of Documents
4	Research Of Named Entity Recognition And Automatic Pattern Acquisition In Information Extraction
5	Research Towards Web Classification Based On Wikipedia Category Network And URL Pattern Tree
6	Social Network Gray Hat User Detection Based On Diffusion Pattern
7	A Trust Model For Wikipedia
8	Found The Blog Knowledge-based Information Extraction Technology
9	Research On Personal Relation Extraction Based On Wikipedia
10	The Research And Implementation Of Automatic Question Answering System Based On Wikipedia