Research On Mining And Retrieval Of Science And Technology Policy Resources In Multiple Fields And Disciplines

Posted on: 2023-06-03

Degree: Master

Type: Thesis

Country: China

Candidate: B W Yu

Full Text: PDF

GTID: 2568306914972799

Subject: Computer Science and Technology

Abstract/Summary:

In recent years, with the rapid growth of Internet data, the number and variety of scientific and technological resources have also expanded rapidly. However, this growth in the volume and categories of information also increases the cost of acquiring it. For science and technology enterprises and users, in addition to papers, patents, and similar content, policies related to science and technology or to the development of their industry should also be regarded as a kind of science and technology resource. The sources of such policy resources are complex and diverse, which makes them costly and difficult for enterprises and users to obtain. Extracting valuable science and technology policy resources from large volumes of mixed data and providing accurate and rapid retrieval will help reduce the cost of information acquisition, which has significant social value. The main work of this thesis includes the following aspects:

(1) To address the wide range of sources and the complex content and structure of policy data in multi-domain, multi-disciplinary scenarios, this thesis studies methods for acquiring policy resource data from multiple sources. It designs a general acquisition and information extraction method suitable for different data sources and, by integrating various features of policy page data, extracts text information regardless of page structure, solving the problem of obtaining and processing policy resource data across multiple fields and disciplines.

(2) For the multi-domain, multi-disciplinary science and technology policy resources that have been mined, this thesis implements retrieval, query, and ranking services. The deep language model BERT is introduced and injected with policy domain knowledge through domain pre-training. BERT's input length limitation is handled by computing relevance segment by segment and aggregating paragraph scores. Finally, the ranked retrieval results are produced by integrating statistical relevance and semantic relevance.
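The segment-wise relevance scoring and score fusion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the aggregation rule, the fusion formula, or any parameter values, so everything here is an assumption made for illustration: the function names are hypothetical, max-pooling is used as the aggregation, linear interpolation is used as the fusion, and a toy term-overlap scorer stands in for the BERT relevance model.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str], Sequence[str]], float]


def split_into_segments(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    segments = []
    start = 0
    while True:
        segments.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the document
        start += stride
    return segments


def toy_semantic_score(query_tokens: Sequence[str], segment_tokens: Sequence[str]) -> float:
    """Stand-in for a BERT relevance model (assumption): the fraction of
    query tokens that occur in the segment, in the range [0, 1]."""
    if not query_tokens:
        return 0.0
    present = set(segment_tokens)
    return sum(1 for t in query_tokens if t in present) / len(query_tokens)


def document_semantic_score(query_tokens: Sequence[str], doc_tokens: Sequence[str],
                            scorer: Scorer, max_len: int, stride: int) -> float:
    """Score every segment and aggregate with max-pooling (assumed rule)."""
    segments = split_into_segments(doc_tokens, max_len, stride)
    return max(scorer(query_tokens, seg) for seg in segments)


def fuse_scores(statistical: float, semantic: float, alpha: float = 0.5) -> float:
    """Linear interpolation of two relevance scores (assumed fusion rule).
    Both inputs are expected to be normalized to the same range beforehand."""
    return alpha * statistical + (1.0 - alpha) * semantic
```

In this sketch a document longer than `max_len` tokens is scored window by window, so no single model call exceeds the length limit, and the final ranking score would come from `fuse_scores` applied to a normalized statistical score (for example from a term-matching ranker) and the aggregated semantic score.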

Keywords/Search Tags:

policy data, content extraction, class feature BoW, text classification, deep learning

Related items

1	Research On Network Text Sentiment Classification Based On Deep Learning
2	Research On Text Classification Based On Deep Learning
3	Research On News Text Classification Based On Deep Learning
4	Research On Classification And New Class Recognition Of Complaint Text In Business
5	Study On Key Techniques Of Text Content Classification And Topic Tracking
6	Classification Of News Short Text Based On Deep Learning
7	Short Text Classification Based On Feature Extension
8	Research On Text Representation Model And Deep Learning Algorithm In Text Classification
9	Research On Key Techniques For Class Imbalanced Data Classification
10	Research On Chinese Text Representation And Classification Based On Deep Learning