Sentence Similarity Computing Combining Multi-features Based On HowNet

Posted on: 2010-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Zhu
GTID: 2178360275481999
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, more and more information is available online, and finding the information we need quickly and accurately has become increasingly difficult. Although traditional search engines (such as Google) have achieved great success, they can only return websites that are relevant to a user's query. Users must then search those websites themselves for the relevant information. The queries are a series of keywords rather than natural language. In fact, users may be more accustomed to describing a problem in natural language, and in most cases they need the exact answer to the problem rather than a list of related websites. Question answering (QA) is a very active research direction in natural language processing, and it combines a wide variety of NLP technologies. In this paper, we investigate several technologies for Chinese QA systems. Semantic similarity plays an important role in information retrieval, so improving its accuracy has both theoretical and practical significance. Word segmentation is more difficult for computers in Chinese than in Western languages. It is the foundation and precondition of Chinese sentence similarity computing, and the accuracy of the result can be greatly improved by adopting a more efficient segmentation algorithm. Based on an analysis and comparison of common Chinese word segmentation algorithms, this paper puts forward an improved Chinese word segmentation method based on a double-array trie, together with a strategy for eliminating ambiguity.
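As a rough illustration of dictionary-based segmentation, the following minimal sketch performs forward maximum matching over a trie. It is an assumption-laden stand-in, not the method of the thesis: it uses a plain nested-dict trie instead of a double-array trie, the toy dictionary and the names `build_trie` and `segment` are hypothetical, and no disambiguation strategy is applied.

```python
END = "$end"  # marker key; cannot collide with single-character keys


def build_trie(words):
    """Build a nested-dict trie from an iterable of dictionary words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # a dictionary word ends at this node
    return root


def segment(text, trie):
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        node = trie
        longest = 0
        j = i
        # Walk the trie as far as the text allows, remembering the
        # longest prefix that is a complete dictionary word.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i
        step = longest if longest > 0 else 1
        tokens.append(text[i:i + step])
        i += step
    return tokens


# Toy dictionary (hypothetical), chosen to expose a segmentation ambiguity.
TOY_TRIE = build_trie(["研究", "研究生", "生命", "命", "的", "起源"])
print(segment("研究生命的起源", TOY_TRIE))
```

On this toy input, greedy matching picks "研究生" first and so never considers "研究" followed by "生命", which is the kind of ambiguity a disambiguation strategy has to resolve.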
We propose improvements to the dictionary mechanism, the segmentation steps, and the handling of ambiguity, which enhance the completeness and accuracy of word segmentation. In Chinese information processing, sentence similarity computing is widely used in information retrieval, machine translation, automatic question answering, text mining, and other areas. It is an essential and important problem that has long been both a research hotspot and a difficulty. In this paper, based on an analysis and comparison of existing sentence similarity computing methods, a new sentence similarity computing method is put forward that combines the TF-IDF method based on the vector space model (VSM) and a semantic method based on HowNet with the word segmentation algorithm described above. Chinese text word segmentation and similarity computing are implemented in a computer system and tested extensively. A question-answering retrieval system is used as a test case to validate the method. Experimental results show that the presented word segmentation algorithm greatly improves time and space efficiency, and that the proposed sentence similarity method performs well.
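The idea of combining a TF-IDF/VSM score with a word-level semantic score can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the two scores are combined, so the weighted sum with `alpha` is an assumption; the smoothed IDF formula is one common variant; and `toy_word_sim` with its small table is a hypothetical stand-in for a HowNet-based word similarity, which would require the HowNet resource itself.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """TF-IDF vectors (as dicts) for a list of tokenized sentences,
    using a smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append({
            w: (c / len(s)) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_similarity(s1, s2, word_sim):
    """Symmetric average of each word's best match in the other sentence."""
    if not s1 or not s2:
        return 0.0

    def directed(a, b):
        return sum(max(word_sim(x, y) for y in b) for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def sentence_similarity(s1, s2, word_sim, alpha=0.5):
    """Weighted sum of the VSM score and the semantic score (assumed form)."""
    v1, v2 = tfidf_vectors([s1, s2])
    return alpha * cosine(v1, v2) + (1 - alpha) * semantic_similarity(s1, s2, word_sim)


# Hypothetical word-similarity table standing in for HowNet.
TOY_TABLE = {("如何", "怎样"): 0.9, ("软件", "程序"): 0.8}


def toy_word_sim(x, y):
    if x == y:
        return 1.0
    return TOY_TABLE.get((x, y), TOY_TABLE.get((y, x), 0.0))


q1 = ["如何", "安装", "软件"]   # already segmented
q2 = ["怎样", "安装", "程序"]
print(sentence_similarity(q1, q2, toy_word_sim))
```

In this sketch the semantic term lets two questions score as similar even when they share few surface words, which pure TF-IDF cosine cannot capture; in practice `alpha` would need tuning on test data.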
Keywords/Search Tags: Natural language processing, Question answering, HowNet, Word segmentation, Semantic similarity