Design And Implementation Of Domain-based Document Retrieval System

Posted on: 2019-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Liu
Full Text: PDF
GTID: 2428330572953842
Subject: Software engineering
Abstract/Summary:
With the development of information retrieval technology on the Internet, people want to obtain content that closely matches their own needs and interests from massive amounts of semi-structured and unstructured data. How to classify these texts effectively and find valuable information is a topic that researchers in many fields continue to explore. This thesis surveys retrieval methods proposed in recent years and summarizes their advantages and disadvantages. We propose a new method that constructs a domain knowledge base by combining multiple feature values, and we use deep semantic word vectors to represent texts and measure text similarity.

The specific research content is as follows. The thesis studies a semantics-based training model, feature extraction algorithms, and the bag-of-words model, and selects the best-performing retrieval method and model. By selecting data, tuning model parameters, and training repeatedly, a high-quality semantic representation model of words is obtained. A multi-feature extraction algorithm is then used to calculate multiple feature attributes of each text, and the domain knowledge base is built on this basis. The related text similarity algorithms are studied and implemented, and the system uses the continuous bag-of-words (CBOW) model to calculate text similarity for retrieval. On this foundation, a domain-based document retrieval system is constructed and put to practical use. A comparison of the experimental data shows that semantics-based text analysis captures more comprehensive text information, and that combining it with the multi-feature extraction algorithm greatly improves the retrieval results of the system.

Based on existing research results and the needs of researchers who work with text, we design a document retrieval system in which domain information workers can manage and query data, and which creates an independent database for each user. The system isolates these databases from one another, applies the functions of a privilege management system, and supports the management, updating, and retrieval of the domain knowledge base. It is developed with the open-source SSM framework. Users can manage their personal databases and update them in real time. At present the system provides uploading, downloading, viewing, multi-feature extraction, document retrieval, and other functions.

The innovation of the system lies in using multi-feature calculation instead of single-feature calculation, which makes the text representation more comprehensive and accurate. To a certain extent, this approach solves the problem of inaccurate calculation caused by incomplete information or overly complex text. By using a deep semantic training model, erroneous judgments caused by language ambiguity and changes in word order are largely eliminated. In the text similarity stage, the traditional text representation is replaced and similarity is computed from a semantic perspective, which improves the accuracy and efficiency of text similarity calculation to a certain extent.
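The similarity-based retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that each text is represented by the average of its word vectors and that texts are compared by cosine similarity; the abstract does not state the exact representation or similarity measure, so both are assumptions. The vectors in `TOY_VECTORS` are made-up values standing in for vectors that a trained CBOW model would supply, and the names `embed`, `cosine`, and `rank` are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
# In practice these would come from a trained CBOW word-embedding model.
TOY_VECTORS = {
    "document": [0.9, 0.1, 0.0],
    "retrieval": [0.8, 0.2, 0.1],
    "search": [0.7, 0.3, 0.1],
    "protein": [0.0, 0.1, 0.9],
    "folding": [0.1, 0.0, 0.8],
}


def embed(text, vectors=TOY_VECTORS):
    """Average the vectors of the known words; return None if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = embed(query)
    scored = []
    for doc in documents:
        d = embed(doc)
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["protein folding", "document retrieval"]
ranking = rank("search document", docs)
```

Averaging word vectors ignores word order, which is one simple way to make the comparison insensitive to reordered words; a weighted average (for example, by per-word feature scores) would be a natural extension, but the weighting used in the thesis is not given in the abstract.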
Keywords/Search Tags: Document retrieval, Multi-feature-value computation, Domain knowledge base, Word meaning transformation, Text similarity discrimination