Design And Implementation Of Full-text Retrieval System Based On Lucene

Posted on: 2014-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: L X Zhang
GTID: 2268330422463503
Subject: Computer system architecture
Abstract/Summary:
The Cloud AV intelligent terminal is an integrated product of the telephone network, the TV network and the Internet that combines personal media content with the Internet, so that all media content can be watched on an HD flat-panel TV. The terminal holds vast amounts of audio and video resources, and users need an efficient search tool to find the information they care about quickly and precisely. Although mature full-text search engines such as Baidu exist, the intelligent terminal is a separate system and cannot use them directly. An efficient full-text retrieval system therefore has to be designed for it.

Lucene is a mature and efficient open-source framework for full-text retrieval: a full-text search engine toolkit written in Java. Through the interfaces Lucene provides, secondary development can produce full-text retrieval systems for a variety of specific purposes. In this system, text analysis mainly targets Chinese, so Chinese word segmentation is a key problem. Although Lucene supports Chinese word segmentation, its method is too simple and mechanical. Lucene also has a default sorting algorithm, but the order it produces does not always match actual relevance. Used directly, Lucene cannot meet the actual requirements; it can be put to use only after it has been extended and improved.

We design and implement a Lucene-based full-text retrieval system for searching the audio and video data in the intelligent terminal. We propose the GMM algorithm, which applies the principle of global maximum matching: it looks for the longest word over the global range in order to improve word segmentation accuracy.
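The abstract describes GMM in a single sentence. The Java sketch below shows one plausible reading of "the longest word in the global range": find the longest dictionary word anywhere in the remaining text (the leftmost one on ties), cut it out, and recurse on the pieces to its left and right; characters that match no multi-character word become single-character tokens. The class name `GlobalMaxMatch`, the toy dictionary and the tie-breaking rule are assumptions for illustration, not the thesis's implementation. Forward maximum matching (FMM) is included only as a point of comparison, and the code assumes every character lies in the Basic Multilingual Plane so that `String.substring` never splits a character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of global maximum matching (GMM) segmentation.
class GlobalMaxMatch {

    private final Set<String> dict;
    private final int maxWordLen;

    GlobalMaxMatch(Set<String> dict) {
        this.dict = dict;
        int max = 1;
        for (String w : dict) max = Math.max(max, w.length());
        this.maxWordLen = max;
    }

    /** Assumed reading of GMM: cut out the globally longest word, then recurse on both sides. */
    public List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        segmentInto(text, out);
        return out;
    }

    private void segmentInto(String text, List<String> out) {
        if (text.isEmpty()) return;
        int bestStart = -1, bestLen = 0;
        // Try the longest candidate length first; the first hit at a length is the leftmost one.
        for (int len = Math.min(maxWordLen, text.length()); len >= 2 && bestStart < 0; len--) {
            for (int i = 0; i + len <= text.length(); i++) {
                if (dict.contains(text.substring(i, i + len))) {
                    bestStart = i;
                    bestLen = len;
                    break;
                }
            }
        }
        if (bestStart < 0) {
            // No multi-character word remains: emit single characters.
            for (int i = 0; i < text.length(); i++) out.add(text.substring(i, i + 1));
            return;
        }
        segmentInto(text.substring(0, bestStart), out);
        out.add(text.substring(bestStart, bestStart + bestLen));
        segmentInto(text.substring(bestStart + bestLen), out);
    }

    /** Forward maximum matching, shown only for comparison. */
    public List<String> forwardMaxMatch(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxWordLen, text.length() - i);
            // Shrink the window until it is a dictionary word or a single character.
            while (len > 1 && !dict.contains(text.substring(i, i + len))) len--;
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    public static void main(String[] args) {
        GlobalMaxMatch seg = new GlobalMaxMatch(Set.of("研究", "研究生", "生物", "化学", "生物化学"));
        String s = "研究生物化学";
        System.out.println("GMM: " + seg.segment(s));
        System.out.println("FMM: " + seg.forwardMaxMatch(s));
    }
}
```

On 研究生物化学 ("study biochemistry") the sketch returns [研究, 生物化学], while FMM greedily takes 研究生 ("graduate student") first and returns [研究生, 物, 化学]. A production segmenter would use a much larger dictionary and a faster lookup structure than repeated `substring` calls.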
Meanwhile, the sorting algorithm for returned results is improved by establishing a new formula that takes into account factors such as how many times and at which positions the keyword appears.

The results show that the improved GMM algorithm achieves better word segmentation and that the ranking of returned results better matches users' requirements. At the same time, the recall and precision of the full-text retrieval system remain at a high level, and the time spent waiting for query results is within acceptable limits. Together, these results show that the system meets the actual demand.
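The abstract does not give the new ranking formula, only that it considers how many times and where the keyword appears. The Java sketch below shows one hypothetical formula with both factors: every occurrence adds to the score, a hit in the title is weighted more than a hit in the description, and an occurrence near the start of a field counts more than one near the end, i.e. score(d) = Σ over fields f of w_f · Σ over hit offsets p in f of (1 − 0.5 · p / |f|). The `Media` record, its two fields, the weights 3.0 and 1.0, and the 0.5 position factor are all assumptions for illustration, not the thesis's formula.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical frequency-and-position ranking sketch (requires Java 16+ for records).
class RankingSketch {

    // Assumed media record; the abstract does not list the indexed fields.
    record Media(String title, String description) {}

    // Assumed weights: a title hit counts three times as much as a description hit.
    static final double TITLE_WEIGHT = 3.0;
    static final double DESC_WEIGHT = 1.0;

    /**
     * Each occurrence adds weight * (1 - 0.5 * offset / fieldLength), so the
     * position factor lies in (0.5, 1] and more occurrences give a higher score.
     */
    static double fieldScore(String field, String keyword, double weight) {
        if (keyword.isEmpty() || field.isEmpty()) return 0.0;
        double score = 0.0;
        int pos = field.indexOf(keyword);
        while (pos >= 0) {
            score += weight * (1.0 - 0.5 * pos / field.length());
            pos = field.indexOf(keyword, pos + keyword.length()); // non-overlapping hits
        }
        return score;
    }

    static double score(Media m, String keyword) {
        return fieldScore(m.title(), keyword, TITLE_WEIGHT)
             + fieldScore(m.description(), keyword, DESC_WEIGHT);
    }

    /** Keeps only items that contain the keyword, sorted by descending score. */
    static List<Media> rank(List<Media> items, String keyword) {
        return items.stream()
                .filter(m -> score(m, keyword) > 0)
                .sorted(Comparator.comparingDouble((Media m) -> score(m, keyword)).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Media> items = List.of(
                new Media("late night show", "jazz and more jazz"),
                new Media("jazz night live", "a concert"),
                new Media("news", "weather"));
        for (Media m : rank(items, "jazz")) {
            System.out.printf("%.4f  %s%n", score(m, "jazz"), m.title());
        }
    }
}
```

With these assumed weights, the item with "jazz" at the start of its title ranks ahead of the item with two description hits, and the item without the keyword is dropped. Any real weighting would need tuning against user feedback.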
Keywords/Search Tags: Full-text Retrieval, Index, Intelligent Terminal, Word Segmentation, Sort