Design And Implementation Of Domain-based Document Retrieval System

Posted on:2019-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:S Q Liu

Full Text:PDF

GTID:2428330572953842

Subject:Software engineering

Abstract/Summary:

With the development of information retrieval technology on the Internet, people are eager to obtain content that closely matches their own needs and interests from massive amounts of semi-structured and unstructured data. How to classify these texts effectively and find valuable information is a topic that researchers in various fields continue to explore. This thesis surveys the retrieval methods of recent years and summarizes their advantages and disadvantages. We propose a new method that constructs a domain knowledge base by computing multiple feature values together, and that uses deep semantic word vectors to represent text and measure text similarity.

The specific research content is as follows. This thesis studies a semantics-based training model, feature extraction algorithms, and the bag-of-words model, and chooses the best retrieval method and model. By selecting data, optimizing model parameters, and training repeatedly, a high-quality semantic representation model of words is obtained. A multi-feature extraction algorithm is used to calculate multiple feature attributes of each text, and the domain knowledge base is built on this basis. The related text similarity algorithms are studied and implemented. The system uses the continuous bag-of-words (CBOW) model to calculate text similarity for retrieval. On this basis, a domain-based document retrieval system is constructed and put into practical use. Comparison of the experimental data shows that semantics-based text analysis reflects more comprehensive text information, and that combining it with the multi-feature extraction algorithm greatly improves the system's retrieval results.

Based on existing research results and the needs of researchers working with domain texts, a document retrieval system is designed in which domain information workers can manage and query data, and which can create independent databases for different users. The system effectively isolates the databases, uses a privilege management system, and provides support for managing, updating, and searching the domain knowledge base. The system is developed with the open-source SSM framework. Users can manage personal databases and update them in real time. The system currently provides uploading, downloading, viewing, multi-feature extraction, document retrieval, and other functions.

The innovation of the system lies in using multi-feature calculation instead of single-feature calculation, which makes the text representation more comprehensive and accurate. To a certain extent, this approach solves the problem of inaccurate calculation caused by incomplete information or overly complex text. By using a deep semantic training model, the erroneous judgments caused by language ambiguity and word order changes are basically eliminated. In the text similarity stage, the traditional text representation is replaced and text is computed from a semantic perspective, which improves the accuracy and efficiency of text similarity calculation to a certain extent.
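
As an illustration of measuring text similarity with word vectors, the following is a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how word vectors are combined into a text representation, so averaging the vectors and comparing them with cosine similarity is an assumption made here. The embedding table `TOY_VECTORS` and the function names are hypothetical. A real system would load vectors trained on a domain corpus.

```python
import math

# Hypothetical toy embeddings, for illustration only. A real system would
# load vectors trained on a domain corpus (for example with a CBOW model).
TOY_VECTORS = {
    "document": [0.9, 0.1, 0.0],
    "retrieval": [0.8, 0.2, 0.1],
    "search": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def text_vector(tokens, vectors):
    """Average the vectors of the known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(tokens_a, tokens_b, vectors=TOY_VECTORS):
    """Similarity of two tokenized texts; 0.0 if either has no known token."""
    va = text_vector(tokens_a, vectors)
    vb = text_vector(tokens_b, vectors)
    if va is None or vb is None:
        return 0.0
    return cosine_similarity(va, vb)


sim_close = text_similarity(["document", "retrieval"], ["document", "search"])
sim_far = text_similarity(["document", "retrieval"], ["banana"])
print(sim_close > sim_far)
```

Averaging discards word order, so a sketch like this cannot by itself show the handling of word order changes that the abstract reports. It only shows how semantically related words can produce a high similarity score even when the two texts share few exact terms.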

Keywords/Search Tags:

Document retrieval, Multi-eigenvalue computation, Domain knowledge base, Word meaning transformation, Text Similarity Discrimination

Related items

1	Research On Semantic Similarity Computation And Applications
2	Multi-document Retrieval System Design And Development
3	Research Of Word Semantic Similarity Based On Domain Knowledge
4	Automatic Construction Method For Domain Concepts Based On Wikipedia Semantic Knowledge Base
5	The Research Of Knowledge Acquisition Algorithm And Semantics Computation For Chinese Vocabulary And Its Applications
6	Design And Implementation Of Enterprise Knowledge Document Retrieval Management System
7	Research On Word Similarity Computation Method Based On Non-IID Learning
8	Construction Of Knowledge Base Disambiguation Knowledge Base Based On Multi - Knowledge Source
9	The Research Of Enterprise Document Retrieval Model Based On Ontology
10	The Design And Implementation Of Knowledge Search Engine In Technology Base On Lucene