Sentence Similarity Computing Combining Multi-features Based On HowNet

Posted on:2010-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:H Q Zhu

Full Text:PDF

GTID:2178360275481999

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, more and more information is available online, and finding the information we need quickly and accurately has become increasingly difficult. Although traditional search engines (such as Google) have achieved great success, they can only return websites that are relevant to a user's query. Users must then search those websites for the relevant information themselves. Queries are a series of keywords rather than natural language. In fact, users may be more accustomed to describing a problem in natural language, and in most cases they need the exact answer to the problem rather than a list of related websites. Question answering (QA) is a very active research direction in natural language processing, and it combines a wide variety of NLP techniques. In this thesis, we investigate several techniques for Chinese QA systems. In information retrieval, semantic similarity plays an important role, so improving the accuracy of semantic similarity computation has both theoretical and practical significance. Word segmentation makes Chinese more difficult for computers to process than Western languages. It is the foundation and precondition of Chinese sentence similarity computing, and the accuracy of the result can be greatly improved by adopting a more efficient algorithm. Based on an analysis and comparison of common Chinese word segmentation algorithms, this thesis puts forward an improved Chinese word segmentation method based on a double-array trie, together with a strategy for eliminating ambiguity.
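As a rough illustration of dictionary-based segmentation, the sketch below applies forward maximum matching over a plain nested-dict trie. It is not the method of the thesis: the abstract does not describe its double-array trie layout or its disambiguation strategy, so the `Trie` class, the `segment` function, and the tiny dictionary are all hypothetical stand-ins chosen for brevity.

```python
class Trie:
    """Plain nested-dict trie (an assumption; NOT a double-array trie)."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node["$end"] = True  # marks the end of a dictionary word

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at `start` (0 if none)."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "$end" in node:
                best = i - start + 1
        return best


def segment(text, trie):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Fall back to a single character when no dictionary word matches.
        length = trie.longest_match(text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens


# Hypothetical toy dictionary, for illustration only.
trie = Trie(["研究", "研究生", "生命", "起源"])
tokens = segment("研究生命起源", trie)
```

With this toy dictionary the greedy match takes 研究生 first and leaves 命 stranded, although 研究 / 生命 / 起源 is the intended reading. This kind of overlap ambiguity is what a disambiguation strategy has to resolve.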
We propose improvements to the dictionary mechanism, the segmentation steps, and ambiguity handling, which enhance the completeness and accuracy of word segmentation. In Chinese information processing, sentence similarity computing is widely used in information retrieval, machine translation, automatic question answering, text mining, and other areas. It is an essential and important problem that has long been both a research hotspot and a difficulty. In this thesis, based on an analysis and comparison of existing sentence similarity computing methods, a new sentence similarity computing method is put forward that combines the TF-IDF method based on the vector space model (VSM) and a semantic method based on HowNet with the word segmentation algorithm mentioned above. Chinese text word segmentation and similarity computing are implemented in a computer system and tested extensively. A question-answering retrieval system is tested as an example to validate the method. Experimental results show that the presented word segmentation algorithm greatly improves time and space efficiency, and that the proposed sentence similarity method performs well.
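The TF-IDF/VSM half of such a method can be sketched as follows, assuming the sentences have already been segmented into tokens. The smoothed IDF formula, the function names, and the linear weighting in `combined_similarity` (with its `alpha` parameter) are assumptions made for illustration. The abstract does not state how the thesis weights or combines the TF-IDF score with the HowNet-based semantic score, and the HowNet score itself is not implemented here.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF for each term over a corpus of tokenized sentences."""
    n = len(corpus)
    df = Counter(term for sentence in corpus for term in set(sentence))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight) for one tokenized sentence."""
    tf = Counter(tokens)
    total = len(tokens)
    # Terms unseen in the corpus get weight 0.0 in this sketch.
    return {term: (count / total) * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(surface, semantic, alpha=0.5):
    """Hypothetical linear mix of a TF-IDF score and a semantic score."""
    return alpha * surface + (1.0 - alpha) * semantic


# Toy pre-segmented corpus, for illustration only.
corpus = [
    ["如何", "安装", "软件"],
    ["怎样", "安装", "软件"],
    ["今天", "天气", "很好"],
]
idf = idf_table(corpus)
a, b, c = (tfidf_vector(sentence, idf) for sentence in corpus)
sim_ab = cosine(a, b)  # two paraphrased questions sharing two terms
sim_ac = cosine(a, c)  # no shared terms
```

Here 如何 and 怎样 are near-synonyms, yet they contribute nothing to `sim_ab` because TF-IDF only sees exact term overlap. A word-level semantic resource such as HowNet is meant to cover that gap, which is the motivation for combining the two scores.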

Keywords/Search Tags:

Natural language processing, Question answering, HowNet, Word segmentation, Semantic similarity

Related items

1	An Algorithm For Optimizing Word Similarity In "Knowledge Network"
2	Research On Chinese Video Question Answering
3	Research On Chinese Frequent Question-Answering System Based On Semantic Comprehension
4	Design And Implementation Of A Natural Language Question-Answering System
5	The Research Of Semantic Similarity Computing Algorithm Based On HowNet
6	The Research Of HowNet Based Word Similarity Computation And Its Application
7	Chinese Auto Question-Answering System For Computer Domain
8	Research On Question Similarity In Question Answering System
9	Research On Course Answering System Based On Artificial Intelligence
10	Research On Visual Question Answering Method Based On Scene Word Analysis