Research Of Document Retrieval Based On Semantic Analysis

Posted on:2019-12-02

Degree:Master

Type:Thesis

Country:China

Candidate:R S Zhang

GTID:2428330548959290

Subject:Engineering

Abstract/Summary:

In recent years, the number of documentary information resources has grown exponentially and these resources are updated constantly, so accurately obtaining and using them has become a hot topic in current technology research. Over time, search systems have evolved from early manual information retrieval to today's computer-based information retrieval. The major foreign literature retrieval tools are SCI (Science Citation Index), EI (Engineering Index) and ISTP (Index to Scientific & Technical Proceedings); domestic ones include Wanfang, CNKI, China Journal and others. At present, most search systems do not logically match the query against documents, so they cannot accurately extract the documents that users really need. Because these systems index only the surface text rather than its true meaning, their recall and efficiency cannot meet users' actual needs. This thesis studies these issues.

Keyword search plays an important role in the accuracy of literature search, so this thesis optimizes keyword extraction. The KEA algorithm proposed by Eibe Frank et al. extracts keywords based on multiple features using a naive Bayes machine learning method, but it was designed for English documents. Fang Jun, Guo Lei and others adapted the method for keyword extraction from Chinese literature, and this thesis further improves that adapted KEA method to make keyword extraction more accurate. Current keyword extraction methods fall into two main categories: those based on word frequency and those based on semantics. Semantic-based methods analyze the words in a document semantically and capture the deeper relationships between them, which improves the accuracy of keyword extraction. This thesis therefore makes greater use of semantic analysis within the improved KEA algorithm.

For feature selection, the original TF-IDF feature is replaced with TF-IWF, which reduces the influence of other documents from the same field on keyword extraction, and the First Occurrence feature is replaced with TextRank, making keyword extraction more reliable. The thesis also improves word segmentation and candidate-word merging to reduce redundancy among candidate keywords, which greatly improves the accuracy of the results. To verify the feasibility and practicality of the improved KEA algorithm, it is applied to document extraction and ranking; inspecting the ranked results shows that the documents the user needs appear near the top, which demonstrates the practicality of the method. Finally, the improved algorithm is compared with existing semantic analysis methods in precision, recall and their harmonic mean. The improved algorithm yields more accurate query results, which the author attributes to the naive Bayes feature selection being more effective than the semantic analysis methods.
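
The abstract does not give the TF-IWF formula. As a minimal sketch, assuming one commonly used definition (term frequency in the document multiplied by the logarithm of the total number of word occurrences in the corpus divided by the occurrences of that word), the feature could be computed as follows. The function name `tf_iwf` and the toy corpus are illustrative assumptions, not taken from the thesis.

```python
import math
from collections import Counter


def tf_iwf(doc_tokens, corpus):
    """Return a TF-IWF score for every distinct word in doc_tokens.

    Assumed formulation (one common variant, not necessarily the thesis's):
        TF(w, d) = occurrences of w in d / number of tokens in d
        IWF(w)   = log(total tokens in corpus / occurrences of w in corpus)
    The corpus is assumed to contain doc_tokens, so every count is >= 1.
    """
    corpus_counts = Counter(tok for doc in corpus for tok in doc)
    total_tokens = sum(corpus_counts.values())
    doc_counts = Counter(doc_tokens)
    scores = {}
    for word, count in doc_counts.items():
        tf = count / len(doc_tokens)
        iwf = math.log(total_tokens / corpus_counts[word])
        scores[word] = tf * iwf
    return scores


if __name__ == "__main__":
    # Toy tokenized corpus; real input would come from Chinese word segmentation.
    corpus = [
        ["retrieval", "semantic", "retrieval"],
        ["retrieval", "keyword"],
        ["semantic", "keyword", "bayes"],
    ]
    print(tf_iwf(corpus[0], corpus))
```

Unlike IDF, which depends on how many documents contain a word, IWF in this variant depends on the word's total number of occurrences across the corpus.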
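
TextRank, which replaces First Occurrence as a feature, ranks words by running a PageRank-style iteration over a word co-occurrence graph. The sketch below assumes a window of 2 (only adjacent tokens are linked), a damping factor of 0.85 and a fixed number of iterations; these are common choices rather than settings confirmed by the thesis, and `textrank` is a hypothetical helper name.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with TextRank over an undirected co-occurrence graph.

    Two distinct words are linked when they appear within `window` tokens of
    each other (window=2 links adjacent tokens only). Scores are refined with
    the PageRank update
        S(w) = (1 - d) + d * sum(S(v) / degree(v) for v adjacent to w).
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start every word at 1.0; each pass builds new scores from the previous ones.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            for word in neighbors
        }
    return scores


if __name__ == "__main__":
    tokens = ["semantic", "retrieval", "keyword", "retrieval", "bayes"]
    ranked = sorted(textrank(tokens).items(), key=lambda kv: kv[1], reverse=True)
    print(ranked)
```

Here "retrieval" is linked to every other word, so it receives the highest score; in a KEA-style pipeline that score would serve as one feature per candidate.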
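
KEA-style extraction discretizes each candidate's feature values and combines them with naive Bayes to estimate how likely the candidate is to be a keyword. The sketch below assumes two already-binned features (for example TF-IWF and TextRank scores), hand-picked cut points and add-one smoothing; the class `NaiveBayesKeyphrase` and the helper `discretize` are hypothetical and do not reproduce the thesis's implementation.

```python
import math
from collections import Counter


def discretize(value, cut_points):
    """Map a continuous feature value to a bin index (0..len(cut_points))."""
    return sum(value > cut for cut in cut_points)


class NaiveBayesKeyphrase:
    """Naive Bayes over discretized features, in the spirit of KEA."""

    def fit(self, feature_bins, labels):
        # feature_bins: one tuple of bin indices per candidate phrase
        # labels: True if the candidate is a reference (author) keyword
        self.class_counts = Counter(labels)
        num_features = len(feature_bins[0])
        self.bin_counts = Counter()  # (label, feature index, bin) -> count
        self.bins_seen = [set() for _ in range(num_features)]
        for bins, label in zip(feature_bins, labels):
            for k, b in enumerate(bins):
                self.bin_counts[(label, k, b)] += 1
                self.bins_seen[k].add(b)
        return self

    def prob_key(self, bins):
        """Return P(keyword | feature bins) using add-one smoothing."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (True, False):
            n = self.class_counts[label]
            score = math.log((n + 1) / (total + 2))  # smoothed class prior
            for k, b in enumerate(bins):
                vocab = len(self.bins_seen[k])
                score += math.log((self.bin_counts[(label, k, b)] + 1) / (n + vocab))
            log_scores[label] = score
        # Normalize the two joint scores; subtract the max for numerical stability.
        top = max(log_scores.values())
        yes = math.exp(log_scores[True] - top)
        no = math.exp(log_scores[False] - top)
        return yes / (yes + no)


if __name__ == "__main__":
    cuts = [0.1, 0.5]  # hypothetical cut points shared by both features
    raw = [(0.7, 0.9), (0.6, 0.3), (0.0, 0.05), (0.02, 0.2), (0.3, 0.0), (0.05, 0.01)]
    bins = [tuple(discretize(v, cuts) for v in row) for row in raw]
    labels = [True, True, False, False, False, False]
    model = NaiveBayesKeyphrase().fit(bins, labels)
    print(model.prob_key((2, 2)), model.prob_key((0, 0)))
```

Candidates would then be ranked by `prob_key`, highest first, and the top entries returned as keywords.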
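
The evaluation compares methods by precision P, recall R and their harmonic mean F = 2PR / (P + R). For one document, with a set of extracted keywords and a set of reference keywords, these can be computed as in the following sketch (the function name is illustrative).

```python
def precision_recall_f1(extracted, reference):
    """Return (precision, recall, F1) for extracted vs. reference keywords."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # keywords that are both extracted and correct
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    extracted = ["semantic analysis", "keyword", "retrieval", "corpus"]
    reference = ["semantic analysis", "keyword", "naive bayes"]
    print(precision_recall_f1(extracted, reference))
```

With two of four extracted keywords correct and two of three reference keywords found, P = 0.5, R = 2/3 and F = 4/7.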

Keywords/Search Tags:

Semantic analysis, keyword extraction, document retrieval, machine learning

Related items

1	The Analysis And Design Of Financial Infomation Intelligent Search System For IPO Document
2	Research On Keyword Extraction And Structured List Data Extraction
3	Research On Chinese Semantic Keyword Extraction Method Based On Multiple Features
4	Research On Semantic Based Document Keyword Extraction Technology
5	The Research Of Keyword Extraction Technology In Multi-Document
6	The Research On The Extraction And Retrieval Of Keyword In Network Video Subtitles
7	Keyword Extraction From News Web Pages
8	Research On Keyword Extraction And Sentiment Analysis For Chinese Text
9	Content Based Image Retrieval And Image Semantics Analysis
10	Research On Vision-Based Contextual Advertisement Keyword Retrieval Algorithm