
Spatial Relation Extraction Based On Multi-label Classification

Posted on: 2014-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: L A Xiang
Full Text: PDF
GTID: 2298330467464503
Subject: Computer application technology
Abstract/Summary:
With the development of Internet and geographic information technology, Geographic Information Systems (GIS) play an increasingly important role in public life. As the most common carrier of natural language, text is a major source of raw spatial data. Extracting spatial information from natural language text is therefore a critical problem.

Chinese spatial relation extraction is a special case of entity relation extraction. The main difference is that a spatial relation instance may belong to several predefined spatial relation labels simultaneously, which means that spatial relation extraction is not a traditional single-label classification problem but a multi-label one. This thesis presents exploratory and experimental research on spatial relation extraction. Its main contributions are summarized as follows:

1. We implement a baseline system for spatial relation recognition that combines the feature-vector method with the problem transformation method. Specifically, we transform the multi-label instances into a larger number of single-label instances and convert every instance into a feature vector. KNN and SVM are each used to learn a classifier for each label. For a new instance, the predicted label set is obtained by combining the results of all classifiers. In experiments on a data set of 2799 spatial instances, the precision reaches 78.68%.

2. We realize spatial relation recognition by combining the kernel-based method with the algorithm adaptation method. Concretely, the ML-KNN multi-label classification algorithm is applied directly to the multi-label instances, and several kernels (the extended subsequence kernel, the convolution tree kernel, the improved convolution tree kernel and a composite kernel) are used to measure the similarity between instances. Experimental results on the same data show that the new composite kernel, composed of the extended subsequence kernel and the improved convolution tree kernel, achieves the best result, with a precision of 79.16%, and outperforms the baseline system.

3. Finally, we integrate spatial relation detection and recognition. The task of spatial relation extraction is to find the instances in which a spatial relation holds between entities and to determine their specific label sets. It therefore consists of two sub-tasks: spatial relation detection and spatial relation recognition. To accomplish spatial relation extraction end to end, we explore two methods: step-by-step integration and combination integration. Experiments with both methods are run on the same data, which includes 8908 negative instances and 2799 positive instances. The first method, step-by-step integration, achieves the better result, with a precision of 63.48%.
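The baseline described in contribution 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy feature vectors, the label names `topology` and `direction`, the squared Euclidean distance, the value of `k`, and all function names. The abstract does not specify the exact transformation, so the sketch shows one common reading: copy each multi-label instance once per label, train one binary KNN classifier per label, and take the union of the labels whose classifier votes positive.

```python
from collections import Counter

# Hypothetical toy data: (feature vector, label set). The real features and
# label inventory of the thesis are not given in the abstract.
TOY = [
    ([1, 0, 0], {"topology"}),
    ([1, 0, 1], {"topology"}),
    ([0, 1, 0], {"direction"}),
    ([0, 1, 1], {"direction"}),
    ([1, 1, 0], {"topology", "direction"}),
    ([1, 1, 1], {"topology", "direction"}),
]


def to_single_label(instances):
    """Copy each multi-label instance once per label it carries."""
    return [(features, label)
            for features, labels in instances
            for label in sorted(labels)]


def binary_view(instances, label):
    """Per-label training set: target 1 if the instance has `label`, else 0."""
    return [(features, int(label in labels)) for features, labels in instances]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Majority vote of the k nearest training instances (binary targets)."""
    nearest = sorted(train, key=lambda pair: squared_distance(pair[0], query))[:k]
    votes = Counter(target for _, target in nearest)
    return votes[1] > votes[0]


def predict_label_set(instances, query, k=3):
    """Run one binary KNN classifier per label and combine the positive labels."""
    all_labels = sorted({label for _, labels in instances for label in labels})
    return {label for label in all_labels
            if knn_predict(binary_view(instances, label), query, k)}


if __name__ == "__main__":
    print(len(to_single_label(TOY)))              # number of single-label copies
    print(sorted(predict_label_set(TOY, [1, 0, 0])))
```

The predicted label set may contain zero, one, or several labels, which is what distinguishes this setup from ordinary single-label classification. An SVM could replace `knn_predict` without changing the surrounding per-label structure.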
Keywords/Search Tags: spatial relation extraction, multi-label classification, feature vector, ML-KNN, extended subsequence kernel, convolution tree kernel, composite kernel