
Entity Relation Extraction And Knowledge Link System Construction In Mining Web Tourism Culture

Posted on: 2017-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhong
Full Text: PDF
GTID: 2348330488451282
Subject: Computer Science and Technology
Abstract/Summary:
With growing work pressure and the development of society, tourism has become a way for people to release stress and enjoy life. As their cultural level rises, tourists now seek not only natural beauty but also a rich cultural experience. The Web contains a large amount of tourism culture information, including descriptions of natural landscapes, celebrity anecdotes, poetry about attractions, movies, and so on. Information extraction technology is therefore needed to dig the cultural information that interests tourists out of messy, unstructured text. Information extraction includes three subtasks: named entity recognition, disambiguation and relation extraction. From the perspective of tourism culture mining, this thesis extracts tourism entities and the relationships between them from Web tourism text, and then uses the extracted knowledge to build a network of relationships (Knowledge Atlas).

There are two approaches to named entity recognition: rule-based methods and machine learning methods. Although machine learning has a sound statistical basis, it requires a large manually tagged corpus, and the quality of the features directly affects performance. In principle, if appropriate rules can be extracted, the rule-based approach is preferable. The data set used in this thesis consists mainly of single sentences whose content is relatively tightly coupled. In addition, the five entity categories this thesis focuses on include composite entities and entities that refer to other entities. These entities nevertheless follow certain rules, for example: only modifiers or adverbial verbs can appear inside an entity, and the first and last words of an entity are mostly nouns. This thesis therefore puts forward a candidate entity extraction algorithm based on words, parts of speech and dependency syntactic structure, applied after clause splitting, word segmentation, POS tagging, dependency parsing and semantic role labeling of the text. The experimental results show that entity recall reaches 96%. The thesis then employs entity suffix word rules and a machine learning method to classify and prune the candidate entities. The experiments show that the F-value reaches up to 91%, which is sufficient to identify and classify the main entities.

By analyzing the characteristics of sentences in Web tourism text, this thesis proposes a verb feature based on the nearest syntactic dependency, and its effectiveness is verified by experiment. After tourism entity extraction and data processing, the thesis employs feature-vector-based machine learning methods to extract relationships between the above entities, using the word features, syntactic features and semantic features presented in previous studies. To explore and analyze the impact of different features on relation extraction, fourteen comparative experiment groups are conducted, and the features most suitable for Web tourism text are identified from their results. To enrich the relation extraction results, the author designs three time-filling principles, based on which time tuples are added to the relation extraction results.

Finally, using the text data, the entities, their relationships and downloaded resort information, the thesis develops a knowledge link system for tourism culture, which presents knowledge in textual, tabular and graphical form.
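The two entity rules quoted in the abstract (entities begin and end with nouns; only modifiers or adverbial verbs may appear inside them) and the suffix-word classification step can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which also uses dependency parsing and semantic role labeling. The tag names, the `candidate_entities` and `classify_by_suffix` helpers, and the suffix table with its category labels are all hypothetical, chosen only to show the idea on pre-tagged English tokens.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set below is an assumption

NOUN_TAGS = {"NOUN", "PROPN"}
# Tags allowed inside an entity: nouns plus modifiers / adverbial verbs (assumed mapping).
INNER_TAGS = NOUN_TAGS | {"ADJ", "VERB"}

# Hypothetical suffix-word table; the thesis's five categories are not listed in the abstract.
SUFFIX_TO_TYPE = {"Lake": "attraction", "Pagoda": "attraction", "Dynasty": "period"}


def candidate_entities(tokens: List[Token]) -> List[str]:
    """Return spans that start and end with a noun and contain only allowed tags."""
    candidates = []
    i, n = 0, len(tokens)
    while i < n:
        if tokens[i][1] not in NOUN_TAGS:
            i += 1
            continue
        j, last_noun = i, i
        # Extend the span over allowed tags, remembering the last noun seen.
        while j < n and tokens[j][1] in INNER_TAGS:
            if tokens[j][1] in NOUN_TAGS:
                last_noun = j
            j += 1
        # Trim the span so that it ends on a noun.
        candidates.append(" ".join(word for word, _ in tokens[i:last_noun + 1]))
        i = last_noun + 1
    return candidates


def classify_by_suffix(entity: str) -> Optional[str]:
    """Classify an entity by its last word; None means 'leave to the learned classifier'."""
    return SUFFIX_TO_TYPE.get(entity.split()[-1])


sentence = [
    ("We", "PRON"), ("visited", "VERB"), ("West", "PROPN"), ("Lake", "PROPN"),
    ("and", "CCONJ"), ("ancient", "ADJ"), ("Leifeng", "PROPN"), ("Pagoda", "PROPN"),
    ("yesterday", "ADV"),
]
entities = candidate_entities(sentence)
labels = [classify_by_suffix(e) for e in entities]
```

For the sample sentence, `entities` contains "West Lake" and "Leifeng Pagoda", and both are labeled "attraction" by the hypothetical suffix table. A generous span rule of this kind favors recall, which matches the abstract's design of extracting candidates first and pruning them in a later classification step.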
Keywords/Search Tags:Information Extraction, Named Entity Recognition, Relation Extraction, Tourism Culture, Knowledge Link System