| As a nationally famous tourist city, Xiamen receives a large number of tourists every year. For this large population of visitors, it is important to obtain Xiamen travel information conveniently and efficiently. People usually search for information with traditional search engines, which have deficiencies; for example, they return only documents related to the keywords. Compared with traditional search engines, an automatic question answering system can offer brief and explicit answers to questions. Moreover, WeChat official accounts, which are becoming more and more popular, produce a large amount of real-time, region-specific information. In this paper, an automatic question answering system that meets the information needs of Xiamen tourists is built by mining article data from WeChat official accounts. It combines the three modules of a question answering system: question analysis, information retrieval, and answer extraction. The details are as follows: 1. Collection of data related to Xiamen tourism. First, 105 WeChat official accounts related to Xiamen tourism were collected manually and fed to a WeChat article collector to obtain article links. Then, based on these links, the page contents were fetched with a web crawler and saved in a database, which serves as the data source of the question answering system. Finally, question data sets related to Xiamen tourism were also acquired from the Internet with a crawler. 2. Analysis of Xiamen tourist questions. First, the question data sets were labeled according to a question classification system and then classified with a Support Vector Machine. Second, keywords extracted from the user's question were preprocessed and then expanded using the synonym thesaurus Tongyici Cilin. These keywords serve as the input of the information retrieval module. 3. Retrieval of WeChat articles. First, the open-source search engine toolkit Lucene was used to retrieve WeChat articles from the database based on the keywords. Second, the word segmentation tool in Lucene was replaced by the Chinese word segmentation system NLPIR to achieve better segmentation of Chinese text. Finally, Lucene returned the five articles most relevant to the keywords. 4. Answer extraction. First, the five articles above were split into sets of sentences, which serve as the candidate answer sets. Second, answer extraction rules were created for each question category and used to filter the candidate answer sets. Finally, sentence similarity was calculated to find the five sentences most relevant to the question, which are returned as the final answer. |
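
The answer extraction step (step 4) can be illustrated with a minimal sketch. The abstract does not say which similarity measure was used, so this sketch assumes cosine similarity over character bigrams, which works on Chinese text without a word segmenter. The function names (`split_sentences`, `bigram_vector`, `cosine`, `top_sentences`) are hypothetical, and the category-based filtering rules are omitted because the abstract does not describe them.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences on common Chinese and Western end marks."""
    parts = re.split(r"[。！？!?；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def bigram_vector(text):
    """Count character bigrams (an assumed representation, not the paper's)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return Counter(chars)
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_sentences(question, articles, k=5):
    """Return the k candidate sentences most similar to the question."""
    q = bigram_vector(question)
    candidates = [s for article in articles for s in split_sentences(article)]
    ranked = sorted(candidates,
                    key=lambda s: cosine(q, bigram_vector(s)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up example articles, for illustration only.
    articles = ["鼓浪屿门票免费。轮渡船票35元。", "厦门大学需要预约！"]
    for sentence in top_sentences("鼓浪屿门票多少钱", articles):
        print(sentence)
```

In the described system, `articles` would hold the five articles returned by the retrieval step, and the candidate sentences would first be filtered by the rules for the question's category before ranking.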