
The Research and Implementation of Synonym Expanding Retrieval Based on Lucene

Posted on: 2012-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2178330335975490
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of information technology and the widespread use of the Internet, search engines have become a daily necessity for work, study, and entertainment. Most current search engines are keyword-based: they find the matching records in the index and return them to the user. However, because people differ in life experience, knowledge, expertise, and regional language habits, they often describe the same concept with different words. These words are synonyms, so a purely keyword-based search may fail to return everything a user needs.

Synonym expanding retrieval emerged in response to this problem and improves on the traditional retrieval mechanism. When the system adds a term to the index, it also inserts the term's synonyms at the same place in the index and sets their position increment to zero, so the term and its synonyms share the same position. As a result, the search engine can hit a record directly during retrieval whether the query keyword is the original term or one of its synonyms. This broadens the search range and addresses the problems introduced by fuzzy retrieval.

Chinese word segmentation is the core of building the index structure: it affects the accuracy and precision of retrieval, and it is also the foundation on which synonym expanding retrieval is built. To meet the needs of synonym expanding retrieval, we design a modified forward maximum matching algorithm based on a triple hash dictionary. The three layers of the dictionary store, respectively, the hash value of the first character, the word length, and the hash value of the whole term, and an attached list stores all terms that share the same hash value. In addition, we add a bidirectional linked list to the term storage structure. It points to the next equivalent term or synonym, forming a circular linked structure, and assigns each link its own association-degree value. Because ordinary Chinese words and their synonyms are stored in a single dictionary, storage space is saved.
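The dictionary layout and segmentation step described above can be sketched as follows. This is a minimal Python model under stated assumptions, not the thesis's implementation: the names `TripleHashDict`, `Entry`, `link_synonym` and `forward_max_match` are hypothetical, Python's built-in `hash()` stands in for whatever hash functions the author used, each entry carries a single association-degree value (a simplification), and, because the abstract does not say what the modification to forward maximum matching is, we assume it means trying only the word lengths actually recorded in layer 2 for the current first character, longest first.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch of a triple hash dictionary with circular synonym rings.


class Entry:
    """A dictionary term; prev/next link it into a circular ring of synonyms."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.degree = 1.0  # hypothetical association degree within its ring
        self.prev: "Entry" = self  # a term with no synonyms is a ring of one
        self.next: "Entry" = self


class TripleHashDict:
    def __init__(self) -> None:
        # layer 1: hash(first char) -> layer 2: word length -> layer 3: hash(term) -> bucket
        self.layers: Dict[int, Dict[int, Dict[int, List[Entry]]]] = {}

    def add(self, text: str) -> Entry:
        existing = self.lookup(text)
        if existing is not None:
            return existing
        entry = Entry(text)
        by_len = self.layers.setdefault(hash(text[0]), {})
        by_hash = by_len.setdefault(len(text), {})
        by_hash.setdefault(hash(text), []).append(entry)
        return entry

    def lookup(self, text: str) -> Optional[Entry]:
        if not text:
            return None
        bucket = self.layers.get(hash(text[0]), {}).get(len(text), {}).get(hash(text), [])
        for entry in bucket:  # entries in a bucket share a hash; compare the text itself
            if entry.text == text:
                return entry
        return None

    def _ring(self, entry: Entry) -> List[Entry]:
        """All other members of entry's synonym ring, walking forward."""
        members, node = [], entry.next
        while node is not entry:
            members.append(node)
            node = node.next
        return members

    def link_synonym(self, term: str, synonym: str, degree: float) -> None:
        a, b = self.add(term), self.add(synonym)
        b.degree = degree
        if b is a or b in self._ring(a):
            return  # already in the same ring
        # Splice the two circular doubly linked rings into one.
        a_next, b_prev = a.next, b.prev
        a.next, b.prev = b, a
        b_prev.next, a_next.prev = a_next, b_prev

    def synonyms(self, text: str) -> List[Tuple[str, float]]:
        entry = self.lookup(text)
        return [] if entry is None else [(n.text, n.degree) for n in self._ring(entry)]


def forward_max_match(text: str, d: TripleHashDict) -> List[str]:
    """Forward maximum matching that only tries lengths stored under the first char."""
    tokens, i = [], 0
    while i < len(text):
        lengths = sorted(d.layers.get(hash(text[i]), {}), reverse=True)
        match = text[i]  # fall back to a single character
        for n in lengths:
            if i + n <= len(text) and d.lookup(text[i:i + n]) is not None:
                match = text[i:i + n]
                break
        tokens.append(match)
        i += len(match)
    return tokens


d = TripleHashDict()
for word in ["搜索引擎", "搜索", "引擎", "同义词", "检索", "查询"]:
    d.add(word)
d.link_synonym("检索", "查询", 0.8)
d.link_synonym("检索", "搜索", 0.9)
print(forward_max_match("搜索引擎同义词检索", d))  # ['搜索引擎', '同义词', '检索']
print(d.synonyms("检索"))  # [('搜索', 0.9), ('查询', 0.8)]
```

Walking a ring from any member reaches all of that term's synonyms, which is how a single dictionary can serve both segmentation and synonym expansion in this sketch.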
Because hashing greatly reduces the time complexity of term lookup, it also shortens search time. Taking into account the requirements of news retrieval, we design and implement a synonym expanding retrieval mechanism on top of the open-source Lucene 2.0. We test the system on a large collection of news material; the experimental results show that synonym expanding retrieval substantially improves recall and precision without increasing search time, making search more convenient for users.
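The indexing step described in the abstract (synonyms inserted at the same position with a position increment of zero) can be illustrated without Lucene itself. The sketch below is a pure-Python model, not Lucene's API: `Token`, `synonym_filter`, `add_document` and `phrase_search` are hypothetical names, the synonym table is an invented example, and the input words are assumed to be already segmented.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set

# Hypothetical model of synonym injection with position increment 0 (not Lucene's API).


class Token(NamedTuple):
    text: str
    position_increment: int  # 1 = advance one position, 0 = stack on the previous token


def synonym_filter(words: Iterable[str], synonyms: Dict[str, List[str]]) -> Iterator[Token]:
    """Emit each word, then its synonyms at the same position (increment 0)."""
    for word in words:
        yield Token(word, 1)
        for syn in synonyms.get(word, []):
            yield Token(syn, 0)


# term -> doc id -> set of positions
Index = Dict[str, Dict[int, Set[int]]]


def add_document(index: Index, doc_id: int, tokens: Iterable[Token]) -> None:
    position = -1  # the first token with increment 1 lands at position 0
    for tok in tokens:
        position += tok.position_increment
        index.setdefault(tok.text, {}).setdefault(doc_id, set()).add(position)


def phrase_search(index: Index, phrase: List[str]) -> Set[int]:
    """Return ids of documents containing the phrase terms at consecutive positions."""
    hits: Set[int] = set()
    if not phrase:
        return hits
    for doc_id, starts in index.get(phrase[0], {}).items():
        if any(all(start + k in index.get(term, {}).get(doc_id, set())
                   for k, term in enumerate(phrase))
               for start in starts):
            hits.add(doc_id)
    return hits


synonyms = {"检索": ["搜索", "查询"]}
index: Index = {}
add_document(index, 1, synonym_filter(["新闻", "检索", "系统"], synonyms))
add_document(index, 2, synonym_filter(["新闻", "发布", "系统"], synonyms))
print(phrase_search(index, ["新闻", "搜索", "系统"]))  # {1}
print(phrase_search(index, ["检索", "系统"]))  # {1}
```

Because the injected synonyms do not advance the position counter, "系统" still sits at position 2 in document 1, so a phrase query using either the original wording or a synonym matches. Had the synonyms been given an increment of 1, "系统" would move to position 4 and both phrase queries would fail.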
Keywords/Search Tags: Triple hash dictionary, Forward maximum matching algorithm, Synonym expanding retrieval