The Field Specific Automatic Question Answering System Based On LDA Model

Posted on: 2014-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: J L Liu
GTID: 2248330398479219
Subject: Computer software and theory
Abstract/Summary:
As the Internet has developed, the amount of information it holds has grown enormously, and people generally hope to find the information they want quickly. Search engines, however, still fall short in several respects, which limits how efficiently information can be accessed. An automatic question answering system can retrieve what the user is asking for more intelligently, quickly, and accurately. In recent years, it has become a hot research topic among scholars at home and abroad.

With the goal of building an automatic question answering system that offers solutions to common computer failures, this thesis discusses in depth the whole procedure of such a system, from question processing to answer retrieval. The study finds that word segmentation and semantic similarity calculation are the core of an automatic question answering system. Measured against current system requirements and the state of research, there is room for improvement in many respects. Two improvements are discussed, and each is shown to be effective by experiments at the end of its section. Finally, a prototype system that automatically gives solutions to user questions about computer failures is designed and implemented.

First, the methods commonly used for Chinese word segmentation are discussed. The two classic approaches, dictionary-based segmentation and statistics-based segmentation, are analyzed in depth, other methods are briefly introduced, and the effects of the different methods are compared. A segmentation method based on a field dictionary and word-string mutual information is then proposed. This method adds semantic information and takes the characteristics of the field's specialized vocabulary into account. Finally, word-string mutual information is used to resolve ambiguities. The experimental results show that these improvements enhance segmentation performance on field-specific text.

Second, the concept of semantic similarity and the principles of its calculation are briefly discussed. Similarity calculation methods based on edit distance, on dependency, and on semantic distance and ontology are studied. A new method that improves the classic similarity calculation is also proposed: it uses the LDA model, trained on a domain corpus, to obtain a field-related word-topic distribution. Because the relevance of words within the same topic is taken into account, the calculated semantic similarity is more reliable.

Finally, an automatic question answering system for common computer failures is designed. Good design gives the system framework high cohesion and low coupling, which can greatly reduce the cost of system upgrades and maintenance. A demo version of the system is also implemented on the Windows XP platform, based on the .NET Framework. In actual tests, the system runs well.
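
The idea of scoring similarity through a word-topic distribution can be illustrated with a small sketch. The abstract does not say how the thesis combines the LDA output with the classic similarity measures, so everything below is an assumption made for illustration: the toy `WORD_TOPICS` table stands in for a distribution learned from a domain corpus, cosine similarity is one possible way to compare topic vectors, and `question_similarity` uses a simple best-match average that is not taken from the thesis.

```python
import math

# Hypothetical word-topic weights, standing in for what an LDA model trained
# on a domain corpus might produce. Each vector holds made-up weights over
# three invented topics: (display, network, storage).
WORD_TOPICS = {
    "screen":  [0.90, 0.05, 0.05],
    "monitor": [0.85, 0.10, 0.05],
    "router":  [0.05, 0.90, 0.05],
    "wifi":    [0.10, 0.85, 0.05],
    "disk":    [0.05, 0.05, 0.90],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(w1, w2, topics=WORD_TOPICS):
    """Identical words score 1.0; unknown words score 0.0; otherwise
    compare the two words' topic vectors (an assumed design choice)."""
    if w1 == w2:
        return 1.0
    if w1 not in topics or w2 not in topics:
        return 0.0
    return cosine(topics[w1], topics[w2])


def question_similarity(q1, q2, topics=WORD_TOPICS):
    """Symmetric average of best-match word similarities between two
    already-segmented questions (lists of words)."""
    if not q1 or not q2:
        return 0.0

    def directed(a, b):
        return sum(max(word_similarity(x, y, topics) for y in b) for x in a) / len(a)

    return (directed(q1, q2) + directed(q2, q1)) / 2


# Usage: a user question compared against two hypothetical FAQ entries.
user_q = ["screen", "black"]
faq_display = ["monitor", "black"]
faq_network = ["router", "offline"]
print(question_similarity(user_q, faq_display))
print(question_similarity(user_q, faq_network))
```

With these toy weights, the display-related FAQ entry scores higher than the network-related one even though "screen" and "monitor" share no characters, which is the kind of effect a purely string-based measure such as edit distance would miss.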
Keywords/Search Tags: word segmentation, LDA model, FAQ, similarity calculation, semantic information