Research And Application Of Intelligent Retrieval Technology For Data Asset Management

Posted on: 2022-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
Full Text: PDF
GTID: 2518306731498594
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the big data industry in recent years, big data has become a matter of national concern, and the focus of project construction has shifted from building application systems to "data asset management." Discovering the potential value of data and realizing it, so that data value creates new business momentum, has become a goal pursued across all industries. "Data asset management" refers to the process of cleaning, processing, and aggregating data, mining its potential value, and making full use of that value in business scenarios. Information retrieval is an important part of such projects, and it faces several urgent problems: inaccurate retrieval results, inefficient full-text retrieval, poor interpretability of results, and no support for relevance feedback. This thesis constructs an intelligent information retrieval subsystem based on a vertical distributed search engine and Learning to Rank algorithms, which improves retrieval efficiency while ensuring retrieval reliability. The main work is as follows:

1) Construct a data import module based on a vertical search engine. Driven by the requirements of multi-source data storage and retrieval, and starting from the relationship between metadata and its table entities, the module uses the key technologies of a vertical distributed search engine to provide automatic index construction, automatic data synchronization, and scheduled updates.

2) Construct a signal amplification and filtering module. Because users submit retrieval requests in different formats, word segmentation based on semantic understanding is used to parse each request accurately, which supports the conversion of the retrieval signal. The module also provides word segmentation schemes at several granularities for data storage and parsing. In traditional retrieval over multi-field, linked-table queries, results tend to deviate strongly because key fields receive only moderate weight. To address this, the module assigns high weights to key fields, based on their relevance, while parsing the retrieval signal, which improves retrieval recall.

3) Construct a text classification model based on a Learning to Rank algorithm. A Learning to Rank algorithm suited to the retrieval needs of the actual project is selected for modeling. Its retrieval efficiency is compared against traditional retrieval, and the performance of different models is compared with one another. A text classification module is built that improves retrieval accuracy and, on that basis, provides functions such as keyword highlighting.

4) Propose a set of information retrieval metrics for data asset management. Traditionally, retrieval results are judged manually. This thesis proposes a set of information retrieval metrics for data asset management, using BM25 and other methods.
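The abstract names BM25 and the weighting of key fields but gives no formulas. As an illustration only, the following is a minimal sketch of how per-field Okapi BM25 scores might be combined with field weights. The function names, the field names (table_name, description), the weights, and the parameters k1 and b are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def weighted_field_scores(query_terms, records, field_weights):
    """Sum per-field BM25 scores, scaled by assumed (hypothetical) field weights."""
    totals = [0.0] * len(records)
    for field, weight in field_weights.items():
        field_docs = [r.get(field, []) for r in records]
        for i, s in enumerate(bm25_scores(query_terms, field_docs)):
            totals[i] += weight * s
    return totals


# Toy metadata records; each field is already tokenized.
records = [
    {"table_name": ["customer", "orders"],
     "description": ["monthly", "sales", "records"]},
    {"table_name": ["sales", "summary"],
     "description": ["customer", "orders", "by", "region"]},
]
equal = weighted_field_scores(["sales"], records,
                              {"table_name": 1.0, "description": 1.0})
boosted = weighted_field_scores(["sales"], records,
                                {"table_name": 3.0, "description": 1.0})
print("equal weights:", equal)
print("table_name boosted:", boosted)
```

With these toy records, equal weights rank the first record slightly higher, because its match sits in a shorter-than-average description. Boosting table_name moves the second record, whose table name contains the query term, to the top. This shows the general effect of field weighting and is not the weighting scheme used in the thesis.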
Keywords/Search Tags: Data Asset Management, Information Retrieval, Learning to Rank, Relevance Search