
Research And Implementation Of The Tibetan Full Text Retrieval System Based On LUCENE

Posted on: 2013-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: S J B Ba
GTID: 2248330362463353
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
In recent years, national special projects have driven great strides in Tibetan information research and development, from standards work to Tibetan-language software development and other key areas. These breakthroughs are a prerequisite for further research and development. However, Tibetan language information processing technology is still in its infancy, and gaps in Tibetan applications such as full-text retrieval have become prominent. Because full-text retrieval is an indispensable tool for accessing information in the information society, this thesis focuses on researching and implementing a Tibetan full-text retrieval system.

A Tibetan text retrieval system draws on traditional language knowledge (characters and words, sentences, paragraphs, and the grammar of a text) as well as on information retrieval principles from the field of information processing: word segmentation, query methods, document relevance ranking algorithms, and so on. It must also cope with the problems of Internet information: redundancy, widely varying quality, a range of formats, scattered locations, complex associations, and the difficulty users have in expressing their needs. LUCENE is an open-source full-text search toolkit. Following its framework and extending its functions makes it possible to build a targeted full-text search system, which offers a shortcut to resolving the problems above.

Based on theoretical research into full-text search and the LUCENE full-text search system, this thesis obtains the following results:

First, it designs and implements Tibetan word segmentation based on LUCENE, which also supports binary segmentation for three languages: Tibetan, Chinese, and English.

Second, it incorporates the characteristics of Tibetan sentences, in which the main components of a sentence combine with auxiliary words to express semantic relations, and proposes optimization strategies on that basis. It also proposes splitting off auxiliary words during segmentation, together with the tightening method and the restoration of Tibetan words after splitting, to improve segmentation accuracy.

Third, using the Tibetan segmentation developed here, it designs and implements a Tibetan full-text retrieval system based on LUCENE that supports full-text search in three languages: Tibetan, Chinese, and English.
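The binary segmentation mentioned in the first result can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the LUCENE API. It assumes that Tibetan units are syllables delimited by the tsheg mark (U+0F0B), that clauses end at a shad (U+0F0D or U+0F0E), that Chinese units are single characters, and that English words are kept whole and lowercased. The names `bigrams` and `segment` are hypothetical.

```python
import re

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (assumed unit boundary)

def bigrams(units, joiner=""):
    """Return overlapping pairs of adjacent units; a lone unit is kept as-is."""
    if len(units) < 2:
        return list(units)
    return [units[i] + joiner + units[i + 1] for i in range(len(units) - 1)]

def segment(text):
    """Hypothetical binary (bigram) segmenter for Tibetan, Chinese, and English."""
    tokens = []
    # Split the input into runs of Tibetan, CJK ideographs, or ASCII alphanumerics.
    pattern = r"[\u0f00-\u0fff]+|[\u4e00-\u9fff]+|[A-Za-z0-9]+"
    for run in re.findall(pattern, text):
        first = run[0]
        if "\u0f00" <= first <= "\u0fff":
            # Break on shad marks, then pair neighbouring syllables.
            # Other Tibetan punctuation is not handled in this sketch.
            for clause in re.split("[\u0f0d\u0f0e]+", run):
                syllables = [s for s in clause.split(TSHEG) if s]
                tokens.extend(bigrams(syllables, TSHEG))
        elif "\u4e00" <= first <= "\u9fff":
            # Pair neighbouring Chinese characters.
            tokens.extend(bigrams(list(run)))
        else:
            # English words are indexed whole, lowercased.
            tokens.append(run.lower())
    return tokens

if __name__ == "__main__":
    print(segment("Lucene 全文检索 བོད་ཡིག་གི"))
```

Bigram tokens need no dictionary, which is why they are a common fallback for languages without spaces between words. The cost is a larger index and less precise matching than true word segmentation.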
Keywords/Search Tags: Tibetan full-text search, LUCENE, parser/word breaker