Relation Extraction of Chinese Named Entities Based on Location and Semantic Features

Posted on: 2012-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: H G Li
GTID: 2178330335961579
Subject: Computer application technology
Abstract/Summary:
Named entity relations are a foundation of semantic networks, ontologies and the Semantic Web, and are widely used in information retrieval, machine translation and automatic question answering systems. In named entity relation extraction, the selection and the extraction of relational features are the two key issues. Location features are easy to compute and to operate on, while semantic features are more interpretable and closer to the actual meaning of the text. Currently, relation extraction of Chinese named entities mainly adopts the Vector Space Model, traditional semantic computing, or Support Vector Machines. Each of these three methods uses either the location features or the semantic features alone, which results in unsatisfactory extraction.

To improve the performance of relation extraction of Chinese named entities, this thesis proposes an extraction method named LaSE (Location and Semantic Extraction) that combines location and semantic features. The main contributions of this thesis are as follows:

(1) To ensure good performance with minimal human participation, this thesis replaces named entity tables with the more flexible part-of-speech (POS) information. On the one hand, this replacement greatly reduces human participation; on the other hand, POS is a domain-independent concept and does not introduce any domain knowledge.

(2) The location features provide the information gain of word positions, while the semantic features are used to calculate semantic similarity based on HowNet. The LaSE method combines both kinds of features during extraction to suit Chinese applications. The experiments show that this combination (F-score of 0.879) performs better than using the location features alone (F-score of 0.766) or the semantic features alone (F-score of 0.597).

(3) The LaSE method needs only a handful of relation seeds as input; starting from these seeds, the relations are extracted automatically. The method can be ported from one domain to another without any modification, since it does not require any domain knowledge. It can also be extended to handle massive data because of its linear time complexity and low space complexity. Therefore, the LaSE method is semi-supervised, domain independent and scalable.
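The abstract mentions information gain as the measure behind the location features but gives no formula. As a rough illustration only, the following Python sketch computes the standard information gain of a discrete feature with respect to a class label. The function names, the toy feature ("a candidate word lies between the two entities") and the toy labels are assumptions made for this example; they are not taken from the thesis and do not reproduce the LaSE method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Information gain of a discrete feature: H(labels) - H(labels | feature)."""
    total = len(labels)
    # Group the labels by the value the feature takes on each sample.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels inside each group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: does a candidate word lie between the two entities?
labels = ["relation", "relation", "none", "none"]
between_entities = [True, True, False, False]   # separates the classes perfectly
unrelated_feature = [True, False, True, False]  # carries no information

gain_informative = information_gain(between_entities, labels)
gain_uninformative = information_gain(unrelated_feature, labels)
```

In this toy setting the first feature separates the two classes completely, so its gain equals the full label entropy, while the second feature leaves the label distribution unchanged and has zero gain. How the thesis derives positional features from Chinese text and combines them with HowNet similarity is not specified in the abstract, so that part is left out of the sketch.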
Keywords/Search Tags: Named entity relation extraction, Semantic computing, Information gain, Semi-supervised learning, Domain independence, Scalability