
Research On Distributed Retrieval For Science And Technology Cross Media Data

Posted on: 2022-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: L C Hao
GTID: 2518306341952199
Subject: Computer technology
Abstract/Summary:
As scientific and technological information on the Internet becomes increasingly abundant, the integration of different disciplines has given rise to new research directions. Researchers share their results by publishing papers, and they also search for papers published by others. Keeping up with disciplinary news and hot research topics helps researchers explore different research topics. Science and technology resources include not only text, such as papers, but also data in other media, such as images, and each kind of data has its own attributes. Keywords, disciplines and research topics are also related to one another. However, traditional methods handle searches over this kind of information poorly. Using neural networks, clustering algorithms, a distributed search engine and other technologies, this thesis extracts text and image features from the cross-media information of science and technology resources. Based on the mapping among keywords, disciplines and research topics, it reveals interdisciplinary relationships, predicts trends in disciplines and research topics, and compares the predictions with the main research topics. The influence of disciplines is then used as a factor in ranking query results. The work of this thesis comprises the following four parts.
(1) A feature extraction algorithm based on the discipline characteristics of science and technology resources is proposed, tailored to the characteristics of each attribute in their cross-media information. First, a cross-media dataset of science and technology resources is collected with the Scrapy crawler; the total amount of data exceeds 150,000. Data from the different sources is then preprocessed. For text, feature extraction is done by vectorizing the text with a BERT model. For images, features are extracted by a network of 13 convolutional layers. The disciplines and research topics in these resources are then obtained with a clustering algorithm.
(2) A method for relationship discovery and evolution analysis over the cross-media information of science and technology resources is proposed. Based on the characteristics extracted from the resources, this thesis studies the relationships between disciplines and research topics and builds an overall interdisciplinary system for science and technology resources. Exploiting the time-series characteristics of the resources, it then proposes an algorithm that predicts the evolution of science and technology resources and their research topics. The algorithm uses convolutional layers for training and learning, and ultimately predicts the development trend of research topics within each discipline. These predictions are an important factor in the retrieval and query stage and affect the final results displayed by the system.
(3) A retrieval algorithm based on discipline relationships and influence is proposed, which incorporates the cross-media features of science and technology resources. An influence index for research topics is defined to quantify their influence. The results of the feature extraction and relationship discovery stages clarify how disciplines and research topics relate to each other during retrieval. When ranking retrieval results, the influence index and trend predictions are added as factors alongside interdisciplinary relationships, so that the final results effectively reflect the development of the disciplines covered by the resources. The Elasticsearch distributed search engine and Redis caching are used to improve the system's query response speed. In addition, the thesis considers user needs in actual application scenarios and uses visualization to refine the system logic, provide good interaction and present clear results to users.
(4) A distributed retrieval system for science and technology cross-media data is designed and implemented, and its functions are described and illustrated with diagrams. The system consists of three functional modules: discipline relationship discovery, cross-media information retrieval and query, and discipline and research topic evolution analysis, all operating on science and technology resources. Testing verifies the system's performance and the correctness of each module's functions.
In summary, this thesis covers the acquisition and preprocessing of cross-media information, cross-media feature extraction, the discovery and evolution analysis of discipline relationships, and cross-media retrieval, query and visualization, and culminates in a distributed retrieval system for the cross-media information of science and technology resources that supports the retrieval of interdisciplinary resources. The system offers good interaction and has practical value.
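Point (1) obtains disciplines and research topics by clustering the extracted features, but the abstract does not say which clustering algorithm is used. The following is a minimal sketch assuming plain k-means over feature vectors that have already been computed. The function names `kmeans`, `_dist` and `_mean`, the first-k centroid initialization and the toy 2-D vectors (standing in for BERT or convolutional features) are illustrative assumptions, not the thesis's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(vectors: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical k-means: returns one cluster label per vector and the final centroids.

    Centroids are seeded from the first k vectors so the result is deterministic
    (an assumption made for this sketch only).
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D "embeddings": two documents near (0, 0) and two near (5, 5).
    docs = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.1, 4.9]]
    labels, _ = kmeans(docs, k=2)
    print(labels)  # prints [0, 1, 0, 1]
```

Seeding from the first k vectors keeps the example reproducible; with real embeddings, k-means++ seeding or several random restarts usually yields more stable topic clusters.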
Keywords/Search Tags: cross media, interdisciplinary, deep learning, visualization, distributed retrieval
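Point (3) of the abstract ranks results using interdisciplinary relationships, a research-topic influence index and trend predictions, but it does not give the ranking formula. The sketch below assumes, purely for illustration, a weighted linear combination of a min-max-normalized text relevance score (for example, the per-hit score a search engine such as Elasticsearch returns) with the three factors, each already scaled to [0, 1]. The `Hit` fields, `DEFAULT_WEIGHTS`, `rerank` and the weight values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One search result; every field except doc_id is a hypothetical, pre-computed score."""
    doc_id: str
    relevance: float       # raw text-match score, e.g. as returned by the search engine
    discipline_rel: float  # assumed strength of the query/document discipline link, in [0, 1]
    influence: float       # research-topic influence index, assumed scaled to [0, 1]
    trend: float           # predicted topic trend, assumed scaled to [0, 1]


# Hypothetical weights; they sum to 1 so the combined score stays in [0, 1].
DEFAULT_WEIGHTS: Dict[str, float] = {
    "relevance": 0.5, "discipline_rel": 0.2, "influence": 0.2, "trend": 0.1,
}


def rerank(hits: List[Hit], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> List[str]:
    """Return doc_ids sorted by a weighted sum of normalized relevance and the extra factors."""
    if not hits:
        return []
    lo = min(h.relevance for h in hits)
    hi = max(h.relevance for h in hits)
    span = hi - lo

    def score(h: Hit) -> float:
        # Min-max normalize relevance so it is comparable with the [0, 1] factors.
        rel = (h.relevance - lo) / span if span > 0 else 1.0
        return (weights["relevance"] * rel
                + weights["discipline_rel"] * h.discipline_rel
                + weights["influence"] * h.influence
                + weights["trend"] * h.trend)

    # sorted() is stable even with reverse=True, so ties keep the engine's original order.
    return [h.doc_id for h in sorted(hits, key=score, reverse=True)]


if __name__ == "__main__":
    hits = [
        Hit("a", relevance=12.0, discipline_rel=0.1, influence=0.1, trend=0.1),
        Hit("b", relevance=10.0, discipline_rel=0.9, influence=0.9, trend=0.9),
        Hit("c", relevance=2.0, discipline_rel=0.0, influence=0.0, trend=0.0),
    ]
    print(rerank(hits))  # prints ['b', 'a', 'c']
```

With these weights, a slightly less text-relevant document from an influential, rising topic ("b") can outrank the top text match ("a"), which is the kind of effect point (3) describes; real weights would need tuning against relevance judgments.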