
Research On Search Method Of Natural Language Understanding Based On Phrase Identification

Posted on: 2008-08-08   Degree: Master   Type: Thesis
Country: China   Candidate: B Qi
GTID: 2178360242471424   Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and the growth of network applications, it is becoming increasingly difficult for people to find information on the Internet accurately and quickly. At present, the recall and precision of search engines remain low. Google, for example, has an index of 3.3 billion items, yet it still reduces a user's query to keywords and compares them with each word in a document, without considering the semantic match between the query and the document. Other search engines, such as Baidu and Yahoo, work in a similar way: their search methods are all based on word-frequency analysis. They return a large amount of information, but much of it is so irrelevant that users have to filter it again themselves. To address these shortcomings of traditional search engines, this thesis studies a new generation of information retrieval system: the search engine based on natural language understanding. This is a research hotspot in natural language processing, and it also represents the future direction of search engines. Such search engines combine many technologies, including knowledge representation, information retrieval, and natural language processing. They let users enter questions in natural language instead of composing combinations of keywords, which makes them easier to use.
This thesis studies several natural language processing techniques relevant to search engines, as follows:
① Chinese word segmentation: the thesis reviews the development of word segmentation technology in China and abroad and describes several typical segmentation algorithms.
② Machine recognition of modern Chinese phrases: a "priority merger algorithm" is used to decompose a complex phrase into a hierarchy.
③ Syntactic analysis of sentences with verbal predicates: a method called "predicate links" decomposes a natural language sentence, processes each part separately, and finally builds a phrase structure tree.
④ Concept extraction and extended retrieval: concepts are extracted from the phrase structure tree and assigned different weights according to their relationships in the tree; extended retrieval is then performed using their corresponding English words.
⑤ Clustered browsing: instead of presenting search results as a single flat list, results are displayed in a new way organized into categories and a hierarchy.
The main contribution of this thesis is the implementation of the basic modules of a search engine based on natural language understanding. The recall and precision of the system have been validated, and the system has practical engineering value. The work and its results also offer some reference value and guidance for research on related theory and for the analysis and implementation of practical systems.
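Item ① mentions "typical word segmentation algorithms" without naming them. As an illustration only, here is a minimal sketch of forward maximum matching, a widely known dictionary-based method for Chinese segmentation; it is not claimed to be the algorithm used in this thesis. The function name `forward_max_match`, the toy dictionary, and the maximum word length of 4 are all hypothetical choices made for the example.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Hypothetical sketch: segment text left to right, taking the
    longest dictionary word that starts at each position."""
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it; a single
        # character is always accepted as a fallback for unknown text.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Toy dictionary (assumed): natural, language, natural language,
    # understanding, search, engine, search engine.
    toy_dict = {"自然", "语言", "自然语言", "理解", "搜索", "引擎", "搜索引擎"}
    print(forward_max_match("自然语言理解搜索引擎", toy_dict))
    # → ['自然语言', '理解', '搜索引擎']
```

Greedy longest-match is simple and fast but can mis-segment ambiguous strings; that limitation is one reason segmentation surveys usually also cover backward and bidirectional matching and statistical methods.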
Keywords/Search Tags: Natural Language Understanding, Search Engine, Phrase Identification, Clustering Browse