Research Of Chinese Full Text Retrieval Technology

Posted on: 2004-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: B Yu
Full Text: PDF
GTID: 2168360122460689
Subject: Computer application technology
Abstract/Summary:
Full-text retrieval (FTR) is a fundamental technology for information processing. This thesis studies full-text retrieval technology.
1. The thesis summarizes the development of web search technology in China and abroad. It covers not only ordinary document retrieval on the web but also the querying of conceptual information, hypertext information, and multimedia information, as well as data mining; these newer technologies are introduced briefly. The thesis lists the characteristics of full-text retrieval technology, notes its deficiencies, and outlines future trends.
2. The thesis describes the two index methods used in FTR. Retrieval based on a word list is simple to implement because it requires no word segmentation, and it is widely used. However, it needs considerable storage space and a large index database, and it gives high recall but low precision, so the thesis presents a new retrieval method based on a phrase list.
3. Chinese word segmentation ("divided syncopation") is the main difficulty of phrase-based retrieval. Several segmentation methods are examined: the mechanical matching method, the feature phrase library method, the restriction matrix method, the syntax analysis method, and the comprehension-based segmentation method. The MM method is easy to implement and is the foundation of the other methods, so it is introduced in detail.
4. The thesis proposes a hybrid model based on characters, words, and phrases for Chinese FTR using the MM method. To reduce ambiguous segmentation, an improved MM method is proposed.
5. A retrieval system that adopts this algorithm can search World Wide Web pages within the school.
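As an illustration of the matching idea behind dictionary-based segmentation, the following is a minimal sketch of forward maximum matching. It assumes that "MM" refers to maximum matching, which the abstract does not spell out, and it is not the improved method proposed in the thesis. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text left to right, taking the longest dictionary entry at each position.

    Assumption: a plain forward maximum matching scheme, shown only for illustration.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real system would load a large lexicon.
toy_dictionary = {"全文", "检索", "全文检索", "技术", "研究"}
segments = forward_max_match("全文检索技术研究", toy_dictionary)
print(segments)
```

Because the longest entry always wins, a scheme like this can pick the wrong boundary when dictionary entries overlap, which is the kind of segmentation ambiguity that point 4 says the improved MM method aims to reduce.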
The search engine is divided into a front search engine and a meta search engine: the meta engine fetches Web documents, segments their text into words, and builds and updates the index; the front engine reads the contents of the index library and provides the query service to users. The system uses a web spider to scan all HTML documents and find the useful pages. It then applies the idea of the Vector Space Model (VSM) to select the results.
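The indexing and ranking steps described above can be sketched roughly as follows. This is a minimal sketch, not the thesis's implementation: it assumes documents are already segmented into tokens, uses a simple TF-IDF weighting with cosine similarity as one common form of the Vector Space Model, and all names (`build_inverted_index`, `rank`, the sample pages) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}. docs maps doc_id to a token list."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index


def rank(query_tokens, index, n_docs):
    """Return (doc_id, cosine score) pairs, best first, using TF-IDF weights."""
    def idf(term):
        # Assumed smoothing; other IDF variants would work as well.
        return math.log(1 + n_docs / len(index[term]))

    # Squared length of every document vector, accumulated over all indexed terms.
    norms = defaultdict(float)
    for term, postings in index.items():
        weight = idf(term)
        for doc_id, tf in postings.items():
            norms[doc_id] += (tf * weight) ** 2

    scores = defaultdict(float)
    query_norm = 0.0
    for term, q_tf in Counter(query_tokens).items():
        if term not in index:
            continue  # Unknown terms are skipped; this does not change the ordering.
        weight = idf(term)
        query_norm += (q_tf * weight) ** 2
        for doc_id, tf in index[term].items():
            scores[doc_id] += (q_tf * weight) * (tf * weight)

    ranked = [
        (doc_id, dot / (math.sqrt(norms[doc_id]) * math.sqrt(query_norm)))
        for doc_id, dot in scores.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical pages, already segmented into words.
docs = {
    "page1": ["全文", "检索", "技术"],
    "page2": ["中文", "分词", "技术"],
    "page3": ["全文", "检索", "全文", "索引"],
}
index = build_inverted_index(docs)
results = rank(["全文", "检索"], index, n_docs=len(docs))
print(results)
```

Only documents that share at least one term with the query receive a score, so the inverted index keeps the ranking step from touching unrelated pages.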
Keywords/Search Tags: Full Text Retrieval, Inverted Files, Divided Syncopation, Search Engines, Vector Space Model