
A Research On Chinese Word Segmentation Based On The Combination Of Dictionary And Statistics And Full-Text Retrieval System Design

Posted on: 2018-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhou
GTID: 2348330518976230
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information age, more and more information is being created, and finding the information you want within a large collection is becoming increasingly important. Take the Wudang Mountain resource library within the Jingchu resource library as an example: its resources total several hundred gigabytes and comprise thousands of documents of various types, so finding specific information among them is very difficult. Information retrieval technology addresses this problem. Full-text search plays an increasingly important role in information retrieval, and many large search engines have adopted it. Chinese word segmentation is the first step in Chinese information processing: both natural language processing and full-text search depend on extracting information from Chinese text, and information extraction necessarily involves word segmentation. Because Chinese has no spaces between words to act as separators and its semantic context is comparatively complicated, Chinese word segmentation has always been difficult. A variety of methods have been proposed, including dictionary-based segmentation, statistical segmentation, and segmentation based on semantic understanding. This paper analyzes the principles of full-text retrieval technology and discusses the open-source full-text retrieval framework Lucene; full-text retrieval in turn requires text segmentation in order to extract information.
This paper discusses the principles of Chinese word segmentation, compares the advantages and disadvantages of various segmentation methods, and proposes a segmentation method that combines a dictionary with statistics. The work done in this paper is as follows:
1. It analyzes the research background and current status of full-text retrieval and Chinese word segmentation, and describes the more commonly used full-text search and Chinese word segmentation technologies.
2. It analyzes commonly used word segmentation techniques and, after comparing their advantages and disadvantages, proposes a segmentation method that combines a dictionary with statistics. The method draws on the strengths of dictionary-based segmentation and uses statistics to resolve ambiguities, with the aim of improving segmentation accuracy.
3. It designs a full-text retrieval system using the Lucene framework combined with a custom analyzer.
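
As a rough illustration of how a dictionary and statistics can be combined, the Python sketch below runs forward and backward maximum matching against a dictionary and, when the two results disagree (a sign of ambiguity), keeps the one with the higher statistical score. This is not the method of the thesis, whose keywords point to an HMM; here a smoothed unigram frequency score stands in for the statistical model, and the toy dictionary `TOY_FREQ`, its counts, and all function names are assumptions made for the example.

```python
import math

# Hypothetical toy dictionary with made-up word counts (illustration only).
TOY_FREQ = {"研究": 50, "研究生": 20, "生命": 40, "起源": 10, "命": 5}


def forward_max_match(text, vocab, max_len):
    """Greedy left-to-right matching; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Greedy right-to-left matching; unknown characters become single tokens."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def unigram_log_prob(tokens, freq, total):
    """Add-one smoothed unigram log-probability of a candidate segmentation."""
    denom = total + len(freq)
    return sum(math.log((freq.get(t, 0) + 1) / denom) for t in tokens)


def segment(text, freq):
    """Dictionary matching in both directions, statistics to break ties."""
    vocab = set(freq)
    max_len = max(map(len, vocab), default=1)
    total = sum(freq.values())
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if fwd == bwd:  # no ambiguity detected by the two dictionary passes
        return fwd
    return max((fwd, bwd), key=lambda toks: unigram_log_prob(toks, freq, total))


if __name__ == "__main__":
    print(segment("研究生命起源", TOY_FREQ))
```

With these toy counts, forward matching greedily takes "研究生" first, while backward matching yields "研究 / 生命 / 起源"; the frequency score decides between the two. A real system would replace the unigram score with a trained statistical model and a full dictionary.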
Keywords/Search Tags: full-text retrieval, Lucene, Chinese word segmentation, HMM, resource library