Research On Full-featured Text Search In Natural Language Understanding

Posted on: 2014-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: C P Huang
GTID: 2268330425961359
Subject: Computer application technology
Abstract/Summary:
With the development of network technology, the amount of information on the network keeps growing, and more and more people are concerned with how to find the information they need in this vast sea of data efficiently, quickly, and accurately. Traditional search engines rely only on keyword matching, and there is a growing trend toward combining natural language processing with search engine technology to build what is called an intelligent search engine. In this paper, we introduce and analyze full-text retrieval, a technology widely used in search engines. Full-text retrieval over unstructured text considers the entire content of a document: text processing first extracts plain text that can be indexed, Chinese word segmentation then splits that text into words, and an index is built over the segmented words to form the index library. When a user searches, the search engine processes the keywords typed into the search box in the same way, matches the processed terms against the index database, and returns the information that meets the user's requirements. Building on full-text retrieval, we add a natural language understanding processing layer, namely Chinese word segmentation.
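The pipeline described above (segment each document, build an index over the segmented words, then segment the query the same way and match it against the index) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the segmentation algorithm used in the thesis is not described here, so the sketch substitutes simple dictionary-based forward maximum matching, and the toy dictionary, sample documents, and function names (`fmm_segment`, `build_index`, `search`) are all hypothetical. Treating a multi-term query as an AND of its terms is also an assumption.

```python
from collections import defaultdict

# Hypothetical toy dictionary; a real system would load a large lexicon.
DICTIONARY = {"全文", "检索", "技术", "搜索", "引擎", "中文", "分词"}


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching (a stand-in segmenter): at each position take
    the longest dictionary word of up to max_len characters, else one character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def build_index(docs, dictionary):
    """Inverted index: each segmented term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in fmm_segment(text, dictionary):
            index[term].add(doc_id)
    return index


def search(index, query, dictionary):
    """Segment the query exactly like the documents, then return the ids of
    documents that contain every query term (assumed AND semantics)."""
    terms = fmm_segment(query, dictionary)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents.
DOCS = {
    1: "全文检索技术",
    2: "中文分词与搜索引擎",
    3: "搜索引擎的全文检索",
}
index = build_index(DOCS, DICTIONARY)
print(search(index, "全文检索", DICTIONARY))  # [1, 3]
```

Because the query passes through the same segmenter as the documents, 全文检索 becomes the terms 全文 and 检索, and only documents whose postings contain both are returned. A production system would also filter stop words such as 的 and rank results rather than just intersecting posting sets.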
The specific research content and results are as follows. First, we analyze the key principles of full-text retrieval and natural language understanding from the standpoint of basic theory and, using the SS (Struts+Spring) framework, build a full-text retrieval prototype system based on natural language understanding, namely Chinese omni-segmentation. The prototype targets the entire content of typical unstructured documents: it preprocesses the text, performs Chinese word segmentation, builds an index database, and retrieves information from that index database.

Second, when the document database holds relatively little information, the prototype system runs fairly efficiently. However, as can be expected, when the document database contains a very large amount of information, text preprocessing, Chinese word segmentation, and index construction incur considerable time and space costs. To address this shortcoming, we propose building the index from only part of each document's content. Based on the prototype system, we further study and compare the two document-processing mechanisms, and from the tests we conclude that indexing only part of each document's content is a direction worth researching in the field of information retrieval.
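The abstract does not say which part of a document goes into the partial ("local") index. The sketch below assumes, purely for illustration, that only the first `prefix_chars` characters of each document are indexed, and to stay self-contained it tokenizes with overlapping character bigrams instead of dictionary-based Chinese word segmentation. It compares a full index against a local one on index size (counted as postings) and on what each can retrieve. All names, the cut-off value, and the sample documents are hypothetical.

```python
from collections import defaultdict


def bigrams(text):
    """Overlapping character bigrams: a dictionary-free way to tokenize Chinese text."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(docs, prefix_chars=None):
    """Build an inverted index. With prefix_chars=None the whole document is
    indexed; otherwise only its first prefix_chars characters are indexed
    (a hypothetical 'local index' that assumes the leading text matters most)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        part = text if prefix_chars is None else text[:prefix_chars]
        for term in bigrams(part):
            index[term].add(doc_id)
    return index


def posting_count(index):
    """Total number of (term, document) postings, a rough proxy for index size."""
    return sum(len(ids) for ids in index.values())


def search(index, query):
    """Return ids of documents whose indexed text contains every bigram of the
    query. Bigrams need not be adjacent, so this can over-match phrases."""
    terms = bigrams(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


# Hypothetical sample documents.
DOCS = {
    1: "全文检索系统设计。本文介绍倒排索引的实现。",
    2: "中文分词方法研究。实验部分讨论全文检索的效率。",
}
full = build_index(DOCS)
local = build_index(DOCS, prefix_chars=8)  # hypothetical cut-off

print(posting_count(full), posting_count(local))
print(search(full, "全文检索"), search(local, "全文检索"))  # [1, 2] [1]
```

The local index is smaller and cheaper to build, but a term that appears only after the cut-off, such as 全文检索 in document 2, is no longer retrievable. Any comparison of the two mechanisms has to weigh this trade-off between cost and coverage.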
Keywords/Search Tags: Natural language understanding, Inverted index, Full-text retrieval, Chinese word segmentation, Local index