Research On Search Method Of Natural Language Understanding Based On Phrase Identification

Posted on:2008-08-08

Degree:Master

Type:Thesis

Country:China

Candidate:B Qi

GTID:2178360242471424

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet and the expansion of network applications, it is becoming increasingly difficult for people to find information on the Internet accurately and quickly. The recall and precision of current search engines remain low. Take Google as an example: its index size has reached 3.3 billion, yet it still reduces the user's query to keywords and compares them with the words in each document, without considering the semantic match between the query and the document. Other search engines, such as Baidu and Yahoo, work similarly. Their search methods are all based on word frequency analysis. They return a large amount of information, but much of it is so irrelevant that users have to filter it again.

To address the shortcomings of traditional search engines, this thesis studies a new generation of information retrieval system: the search engine based on natural language understanding. This is a research hotspot in the field of natural language processing, and it also represents the future direction of search engine development. Such search engines combine many technologies, such as knowledge representation, information retrieval, and natural language processing. They allow users to enter questions in natural language rather than as combinations of keywords, which makes them easier to use.

This thesis studies several natural language processing technologies relevant to search engines, as follows:

① Chinese word segmentation, which analyzes the development of word segmentation technology in China and abroad and enumerates some typical word segmentation algorithms.
② Machine recognition of modern Chinese phrases, which uses a "priority merger algorithm" to decompose a complex phrase into a hierarchy.
③ Syntactic analysis of verbal-predicate sentences, which defines a method called "predicate links" to decompose a natural language sentence, processes each part separately, and finally forms a phrase structure tree.
④ Concept extraction and extended retrieval, which extracts concepts from the phrase structure tree and assigns them different weights according to the relationships among the concepts in the tree. Extended retrieval is then performed on their corresponding English words.
⑤ Clustering browse, which presents search results not as a single flat list but as a hierarchical display organized into catalogs.

The main contribution of this thesis is the implementation of the basic modules of a search engine based on natural language understanding. The recall and precision of the system have been validated, and the system is practical from an engineering standpoint. The work and its results also offer reference value and guidance for related theoretical research and for the analysis and implementation of practical systems.
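
The abstract reports that the system was validated on recall and precision but does not describe the evaluation procedure. As a minimal sketch of the standard set-based definitions only, and not the thesis's actual evaluation code, the two measures for a single query could be computed as follows. The function name `precision_recall` and the document identifiers are hypothetical and chosen purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the search system
    relevant:  iterable of document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Documents that were both returned and judged relevant.
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of returned documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were returned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: four documents returned, three judged relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case two of the four returned documents are relevant, so precision is 0.50, and two of the three relevant documents were found, so recall is about 0.67. Keyword matching that returns many irrelevant results, as described above, would show up as low precision under this measure.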

Keywords/Search Tags:

Natural Language Understanding, Search Engine, Phrase Identification, Clustering Browse

Related items

1	Research On Search Engine Oriented Natural Language Processing Technology
2	With Natural Language Understanding And Information Mining Capabilities, Search Engine Development
3	Studies On The Usage Of Preposition And Conjunction In Phrase Structure Syntactic Parsing
4	Research On Question Answering System Based On Understanding Of Chinese Natural Language
5	The Design Of The Model Of Natural Language Processing And Intelligent Search Engine
6	Automatic Identification Of Chinese Prepositional Phrase Based On CRF
7	Research Of Chinese Natural Language Understanding In Search Engine
8	Research On Chinese Phrase Structure Ambiguities Based On Semantic Analysis And Its Implementation
9	The Search Engine Based On Chinese Natural Language Processing
10	Research On Natural Language Understanding Algorithm For Cloud-Based Service Robots