Information Fusion Research Of Web Travel Information Integration

Posted on: 2014-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
GTID: 2268330398492122
Subject: Computer application technology
Abstract/Summary:
Since the arrival of the digital era, the traditional tourism industry has kept pace with the times, and many travel sites have sprung up across the Web. However, with so many sites, the information they offer often diverges and data inconsistencies arise easily. Moreover, users' needs vary, while the data on any single website is limited, so those needs cannot always be met. To satisfy the requirements of more users, and to provide a search platform that presents comprehensive tourist information and can be applied in existing travel sites or travel information terminals, this thesis integrated travel information from the Web, aiming to build an integrated system that provides more comprehensive travel information.
First, the thesis used a crawler to collect a variety of texts from Baidu Encyclopedia, China Travel, and other sources, and then cleaned the data, for example by removing tags and web formatting characters from the texts. Second, using text classification, scenic spots and their related texts were selected from Baidu Encyclopedia. Third, to resolve inconsistencies in the scenic spot data, entity recognition was used to handle two cases: different places that share the same name, and the same spot appearing under different names. This made the scenic spot data more complete and unified.
Finally, multiple texts often described the same attraction, and much of their content overlapped. To present users with a complete and readable text, a text similarity measure was used to delete highly similar paragraphs and to add low-similarity passages to the main text, producing a more informative introduction.
The thesis made the following contributions:
(1) Taking into account the characteristics of texts crawled from the Web, the thesis represented each text by the weights of its feature words and proposed a new text classification method based on those weights.
(2) For entity identification of scenic spots, the thesis placed attraction information from different data sources into separate collections and used cross validation to improve identification precision.
(3) The thesis proposed a text similarity measure, designed for text fusion and based on the nouns in each paragraph, which computes the similarity between paragraphs and between a paragraph and a text, and then merges the different texts.
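The fusion step described above can be illustrated with a minimal sketch. It is not the measure proposed in the thesis, whose exact formula the abstract does not give. It assumes cosine similarity over noun counts, a hypothetical threshold of 0.5, and noun lists that have already been produced by some external part-of-speech tagger. The names `cosine` and `fuse` are made up for illustration, and only paragraph-to-paragraph comparison is shown.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of nouns (an assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fuse(main, candidates, threshold=0.5):
    """Append candidate paragraphs whose nouns differ enough from the kept text.

    Each paragraph is a (text, nouns) pair. The nouns list is assumed to
    come from an external part-of-speech tagger; the threshold is hypothetical.
    """
    merged = list(main)
    for text, nouns in candidates:
        bag = Counter(nouns)
        # Compare against every paragraph kept so far, including new additions.
        if all(cosine(bag, Counter(kept)) < threshold for _, kept in merged):
            merged.append((text, nouns))
    return merged


# Illustrative data only; not taken from the thesis.
main = [("West Lake is a freshwater lake in Hangzhou.",
         ["West Lake", "lake", "Hangzhou"])]
candidates = [
    ("West Lake, a lake located in Hangzhou.",
     ["West Lake", "lake", "Hangzhou"]),       # same nouns: treated as a duplicate
    ("Boat tickets are sold at the pier.",
     ["boat", "ticket", "pier"]),              # no shared nouns: added to the text
]
result = fuse(main, candidates)
```

In this sketch a candidate is dropped as soon as it is too similar to any paragraph already kept, so the order of the candidates can affect which duplicate survives.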
Keywords/Search Tags: Web travel information integration, text classification, entity identification, text similarity measure