
Research on Document Retrieval Based on Semantic Analysis

Posted on: 2019-12-02; Degree: Master; Type: Thesis
Country: China; Candidate: R S Zhang; Full Text: PDF
GTID: 2428330548959290; Subject: Engineering
Abstract/Summary:
In recent years, the volume of documentary information resources has grown exponentially and is updated continuously, so how to find and use these resources accurately has become an active research topic. Search systems have evolved from the earliest manual information retrieval to today's computer-based retrieval. The major foreign literature retrieval tools are SCI (Science Citation Index), EI (Engineering Index), and ISTP (Index to Scientific & Technical Proceedings); the domestic ones include Wanfang, CNKI, and China Journal. At present, most search systems do not logically match the query against the documents and cannot accurately return the documents that users really need. Because they index only the surface text rather than its meaning, their retrieval rate and efficiency cannot meet users' real needs. This thesis studies these problems.

Keywords play an important role in the accuracy of literature search, so this thesis optimizes the keyword extraction technique. The KEA algorithm, proposed by Eibe Frank et al., extracts keywords based on multiple features using a naive Bayes machine learning method, but it was designed for English documents. Fang Jun, Guo Lei, and others adapted the method to keyword extraction from Chinese literature. This thesis builds on that improved KEA method to make keyword extraction more accurate.

Current keyword extraction methods fall into two main categories: those based on word frequency and those based on semantics. Semantic methods analyze the words in a document and capture the deeper relationships between them, which improves extraction accuracy. This thesis applies semantic analysis more extensively within the improved KEA algorithm. In feature selection, the original TF-IDF feature is replaced with TF-IWF, which reduces the influence of other documents from the same field on keyword extraction, and the First Occurrence feature is replaced with TextRank, which makes the extracted keywords more reliable. Word segmentation and candidate-word merging are also improved to reduce redundancy among candidate keywords and improve the accuracy of the results.

To verify the feasibility and practicality of the improved KEA algorithm, it is applied to a document retrieval and ranking example. In the ranked results, the documents the user needs appear near the top, which demonstrates the practicality of the method. The improved algorithm is also compared with existing semantic analysis methods on precision, recall, and their harmonic mean (the F-measure). It gives more accurate query results, which the author attributes to feature selection in the naive Bayes method mattering more than the semantic analysis method itself.
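The abstract does not give the formula used for the TF-IWF feature or for the evaluation, so the following is only a minimal sketch under stated assumptions. It takes TF as a word's share of the document and IWF as the logarithm of the total number of word occurrences in the corpus divided by the occurrences of that word, which is one common definition and may differ from the thesis. The function names `tf_iwf` and `precision_recall_f1`, the toy corpus, and the assumption that the corpus contains the scored document are all hypothetical.

```python
import math
from collections import Counter


def tf_iwf(doc_tokens, corpus):
    """Score each word of one document by TF * IWF (assumed definition).

    doc_tokens: list of words in the document being scored.
    corpus: list of tokenized documents, assumed to include doc_tokens.
    """
    doc_counts = Counter(doc_tokens)
    corpus_counts = Counter(word for doc in corpus for word in doc)
    corpus_total = sum(corpus_counts.values())
    doc_total = len(doc_tokens)

    scores = {}
    for word, count in doc_counts.items():
        tf = count / doc_total
        # Counter returns 0 for unseen words; clamp to 1 to avoid dividing by zero.
        iwf = math.log(corpus_total / max(corpus_counts[word], 1))
        scores[word] = tf * iwf
    return scores


def precision_recall_f1(extracted, reference):
    """Compare extracted keywords with reference keywords.

    Returns precision, recall, and their harmonic mean (F1).
    """
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical toy corpus of already-segmented documents.
corpus = [
    ["semantic", "retrieval", "keyword"],
    ["keyword", "extraction", "keyword"],
    ["document", "retrieval"],
]
scores = tf_iwf(corpus[1], corpus)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "keyword" is frequent in the scored document but also common across the corpus, so its IWF weight is lower than that of the rarer "extraction". In a KEA-style pipeline such a score would be only one input feature to the naive Bayes classifier, alongside others such as a TextRank score.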
Keywords/Search Tags: semantic analysis, keyword extraction, document retrieval, machine learning