Research on Database Full-Text Retrieval Based on Related-Word Recognition

Posted on: 2015-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: P Z Gao
GTID: 2268330431454545 | Subject: Computer software and theory
Abstract/Summary:
With the spread of computers, the informatization of industries is accelerating and all kinds of information systems are emerging. As these systems are used, the data they store keeps growing, and the databases that serve as their data sources become very large. The need to find required information quickly has driven research on database full-text retrieval. In response, this thesis designs and implements a database full-text retrieval framework. After comparing the advantages and disadvantages of two full-text retrieval approaches, we choose Lucene indexing: the framework uses this open-source search engine library to build and search indexes over the database, and maintains the index incrementally.
Lucene's index search is exact, keyword-based matching. Because things in everyday life can rarely be described by a single word, keyword matching alone can return results that are inaccurate or incomplete. The retrieval system should therefore recognize the user's intent and return all the results the user is looking for, which places stricter requirements on Chinese synonym recognition. To address this, the thesis studies a large number of synonym recognition algorithms, adapts them to extract related words from the large amount of data already in the database, and builds a related-words thesaurus for the system. On the basis of this thesaurus, related-word recognition is added to the database full-text retrieval framework. Because adding related words raises the question of how results should be ordered, Lucene's built-in ranking algorithm is improved so that it distinguishes how much each related word matters to the results. Experimental results show that the method extends the search results and improves the recall of the retrieval system.
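The abstract states that the index is maintained incrementally but does not describe the mechanism. A minimal sketch, assuming each indexed row is identified by its database primary key and that changed rows arrive as hypothetical `(op, row_id, text)` change records: only the affected rows are re-indexed instead of rebuilding the whole index. The class and function names are illustrative, and whitespace tokenization stands in for the Chinese word segmentation a real system would need. In Lucene itself, the analogous operations are `IndexWriter.updateDocument` and `IndexWriter.deleteDocuments`, keyed on a primary-key term.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index whose documents are keyed by a database primary key."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of rows containing it
        self.doc_terms = {}               # row id -> terms indexed for that row

    @staticmethod
    def tokenize(text):
        # Assumption: whitespace tokenization; Chinese text needs a word segmenter.
        return set(text.lower().split())

    def upsert(self, row_id, text):
        # Re-index one changed row: remove its old postings, then add new ones.
        self.delete(row_id)
        terms = self.tokenize(text)
        self.doc_terms[row_id] = terms
        for term in terms:
            self.postings[term].add(row_id)

    def delete(self, row_id):
        # Remove a row's postings; drop terms that no longer occur anywhere.
        for term in self.doc_terms.pop(row_id, set()):
            self.postings[term].discard(row_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def apply_changes(index, changes):
    """Apply hypothetical (op, row_id, text) change records in order."""
    for op, row_id, text in changes:
        if op == "delete":
            index.delete(row_id)
        elif op in ("insert", "update"):
            index.upsert(row_id, text)
        else:
            raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    idx = IncrementalIndex()
    apply_changes(idx, [("insert", 1, "database full-text retrieval"),
                        ("insert", 2, "lucene index")])
    apply_changes(idx, [("update", 2, "database index"), ("delete", 1, None)])
    print(idx.search("database"), idx.search("lucene"))  # [2] []
```

Only rows named in the change batch are touched, so the cost of keeping the index current scales with the number of changed rows rather than the size of the database.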
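Adding related-word recognition to keyword search amounts to query expansion. The sketch below is an illustration of that idea, not the thesis's implementation. It assumes the related-words thesaurus is stored as a hypothetical mapping from a word to `(related_word, similarity)` pairs; typed query terms keep weight 1.0 and each related word receives `related_weight * similarity`, with `related_weight = 0.5` chosen arbitrarily. The result is rendered in Lucene's classic query-parser syntax, where `term^boost` boosts a term.

```python
def expand_query(terms, thesaurus, related_weight=0.5):
    """Expand query terms with related words from a thesaurus.

    thesaurus: hypothetical dict, word -> list of (related_word, similarity).
    Returns {term: weight}: original terms get 1.0, a related word gets
    related_weight * similarity (the largest value if several terms suggest it).
    """
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for related, similarity in thesaurus.get(term, []):
            if related in terms:
                continue  # never down-weight an original query term
            weights[related] = max(weights.get(related, 0.0),
                                   related_weight * similarity)
    return weights


def to_lucene_query(weights):
    """Render weights in Lucene classic query-parser syntax (term^boost)."""
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    parts = [t if w == 1.0 else f"{t}^{w:g}" for t, w in ordered]
    return " OR ".join(parts)


if __name__ == "__main__":
    thesaurus = {"computer": [("computing", 0.8), ("software", 0.6)]}
    weights = expand_query(["computer"], thesaurus)
    print(to_lucene_query(weights))  # computer OR computing^0.4 OR software^0.3
```

Keeping related words below weight 1.0 lets them widen the result set while documents matching the typed terms still score higher.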
Domestic research on Chinese synonym recognition has only just begun, and its application in search engines is not yet satisfactory. Using a university information management system as the application background, and with the goal of identifying subject-related words, this thesis draws on the ideas behind synonym recognition algorithms and combines them with Lucene full-text search to build a database retrieval model that supports related-word identification. Synonym identification methods are adapted to construct the related-words thesaurus from two directions: on one hand, a semantic similarity algorithm computes the similarity between concepts in a discipline concept tree, and related words are determined by the size of that similarity; on the other hand, related words are extracted from existing papers using statistical methods. Through the resulting thesaurus, related-word identification is applied to full-text search. Finally, on the basis of the vector space model, concept relevance is used to describe how strongly each related word affects the results, and a result-ranking algorithm is designed accordingly.
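The abstract mentions computing semantic similarity between concepts in a discipline concept tree without giving the formula. As a stand-in, the sketch uses the Wu-Palmer measure, sim(a, b) = 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest shared ancestor. This is a common tree-based measure swapped in for illustration, not necessarily the one used in the thesis; the tree fragment, the `threshold` value, and the function names are hypothetical.

```python
def ancestors(node, parent):
    """Return [node, parent, ..., root] for a tree given as child -> parent."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def depth(node, parent):
    return len(ancestors(node, parent))  # the root has depth 1


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    b_chain = set(ancestors(b, parent))
    # The first ancestor of a that is also an ancestor of b is the lowest common subsumer.
    lcs = next((n for n in ancestors(a, parent) if n in b_chain), None)
    if lcs is None:
        return 0.0  # concepts in different trees
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


def related_concepts(concepts, parent, threshold=0.6):
    """Pairs of concepts whose similarity reaches a hypothetical threshold."""
    pairs = []
    for i, a in enumerate(concepts):
        for b in concepts[i + 1:]:
            score = wu_palmer(a, b, parent)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical fragment of a discipline tree (child -> parent).
    parent = {
        "Computer Science": "Engineering",
        "Software Engineering": "Computer Science",
        "Computer Software and Theory": "Computer Science",
        "Mechanical Engineering": "Engineering",
    }
    concepts = ["Software Engineering", "Computer Software and Theory",
                "Mechanical Engineering"]
    print(related_concepts(concepts, parent))
```

In this tree, the two children of "Computer Science" score 2 * 2 / (3 + 3), about 0.667, and count as related, while "Software Engineering" and "Mechanical Engineering" share only the root and score 2 * 1 / (3 + 2) = 0.4.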
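The second source of related words is statistical extraction from existing papers. The abstract does not name the statistic, so the sketch assumes one common choice: keyword co-occurrence. Two keywords that are frequently listed on the same paper are treated as related and scored with the Jaccard coefficient n(a,b) / (n(a) + n(b) - n(a,b)). The per-paper keyword lists, `min_cooc`, and `threshold` are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_related(keyword_lists, min_cooc=2, threshold=0.3):
    """Extract related keyword pairs from per-paper keyword lists.

    Two keywords are related when they appear together on at least `min_cooc`
    papers and their Jaccard coefficient n(a,b) / (n(a) + n(b) - n(a,b))
    reaches `threshold`. Both thresholds are illustrative, not from the thesis.
    Returns word -> list of (related_word, score), best first.
    """
    doc_freq = Counter()   # n(a): papers listing keyword a
    pair_freq = Counter()  # n(a,b): papers listing both a and b
    for keywords in keyword_lists:
        unique = sorted(set(keywords))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    related = {}
    for (a, b), n_ab in pair_freq.items():
        if n_ab < min_cooc:
            continue
        score = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        if score >= threshold:
            related.setdefault(a, []).append((b, score))
            related.setdefault(b, []).append((a, score))
    for pairs in related.values():
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return related


if __name__ == "__main__":
    papers = [
        ["lucene", "full-text retrieval", "inverted index"],
        ["lucene", "inverted index", "ranking"],
        ["full-text retrieval", "inverted index"],
        ["synonym", "ontology"],
    ]
    for word, pairs in sorted(cooccurrence_related(papers).items()):
        print(word, pairs)
```

The output has the same word-to-`(related_word, score)` shape as a thesaurus entry, so statistically extracted pairs can be merged with pairs found through concept-tree similarity.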
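The ranking step builds on the vector space model and uses concept relevance to set how much a related word contributes, but the abstract gives no formula. A minimal sketch under that reading: TF-IDF cosine similarity in which each query term carries a weight, 1.0 for terms the user typed and a smaller value for related words. This is plain cosine scoring for illustration, not Lucene's built-in similarity and not the thesis's improved algorithm; the function names and the smoothed idf are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {doc_id: tokens}. Returns TF-IDF vectors and the idf table."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Smoothed idf (log(N/df) + 1) so a term in every document keeps some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }
    return vectors, idf


def rank(query_weights, docs):
    """Cosine ranking with per-term query weights.

    query_weights: {term: weight}, e.g. 1.0 for typed terms and a smaller
    weight for related words, so related-word matches are found but rank lower.
    """
    vectors, idf = tfidf_vectors(docs)
    query = {t: w * idf[t] for t, w in query_weights.items() if t in idf}
    q_norm = math.sqrt(sum(v * v for v in query.values()))
    results = []
    for doc_id, vec in vectors.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in query.items())
        if dot <= 0.0:
            continue  # document matches no query term
        d_norm = math.sqrt(sum(v * v for v in vec.values()))
        results.append((doc_id, dot / (q_norm * d_norm)))
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


if __name__ == "__main__":
    docs = {
        "d1": ["lucene", "index", "search"],
        "d2": ["fulltext", "search", "engine"],
        "d3": ["database", "backup"],
    }
    # "fulltext" plays the role of a related word found in the thesaurus.
    for doc_id, score in rank({"lucene": 1.0, "fulltext": 0.4}, docs):
        print(doc_id, round(score, 3))
```

With these inputs, d1 (matching the typed term) ranks above d2 (matching only the related word), and d3 is not returned: the related word widens recall without pushing exact matches down.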
Keywords/Search Tags: recognition of related words, database full-text retrieval, Lucene, related-words search