
An Information Extraction System Used To Describe Scholar Portraits

Posted on: 2021-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
GTID: 2518306557994159
Subject: Software engineering
Abstract/Summary:
Industry-university-research cooperation has advanced both science and technology and industry. In promoting such cooperation, however, enterprises and regional governments find it difficult to obtain relevant information about scholars and therefore cannot engage in effective dialogue with them. The main reason is that information about scholars in a given field is scattered and hard to collect. Data describing scholars does exist on the Internet, and it can be aggregated and processed through data collection and information extraction to form concise scholar descriptions that assist enterprises and regional governments.

This thesis designs and implements an information extraction system for describing scholar portraits. Its purpose is to aggregate Internet data into information describing scholars, supply that data to a data platform, and present it to enterprises or governments so that they can gain a basic understanding of scholars. Unlike existing software systems that process scholar data, this system focuses on the information about scholars that matters in the context of industry-university-research cooperation. It compares and analyzes data from multiple sources, such as academician homepages, official university websites, and Baidu Baike, to obtain each scholar's basic information, administrative titles, honors, and other information. The main work of this thesis is as follows:

(1) A method for discovering and extracting scholar homepages is proposed. Homepages are found and identified through a search engine to obtain scholar homepage data, which improves the efficiency of data collection while preserving accuracy. In addition, a scholar portrait extraction method based on text segmentation and entity recognition is proposed, which improves the accuracy of homepage extraction.

(2) A method for extracting subject-field terms from academic literature data is proposed. C-Value and mutual information serve as the main extraction indexes and are computed for the words in the academic literature data. To capture new terms, the thesis also proposes an algorithm that automatically generates part-of-speech rules for phrases, which improves the extraction rate of terms.

(3) An information extraction system for scholar portraits is designed and implemented. The work analyzes the scholar portrait model in the context of industry-university-research cooperation, defines the scope of the extraction targets, designs the extraction strategies, and implements the system. The system effectively completes the homepage extraction and term extraction tasks. An analysis of the accuracy of the extraction results shows that they are in line with expectations, verifying the usability and effectiveness of the system.
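The abstract does not give the formulas behind item (2), so the following is only an illustrative sketch of how C-Value and mutual information are commonly computed for candidate terms; it is not the thesis's own implementation. The function names (`c_value`, `pmi`), the toy frequencies, and the use of `log2(|a| + 1)` as the length weight (a common variant that keeps single-word candidates from scoring zero) are all assumptions made for this example.

```python
import math


def is_subphrase(short, long):
    """Return True if the word tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Simplified C-Value for candidate phrases.

    `freq` maps a candidate phrase (tuple of words) to its corpus frequency.
    A candidate nested in longer candidates is penalised by the mean
    frequency of those longer candidates.
    """
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of longer candidates that contain `a`.
        containers = [f_b for b, f_b in freq.items()
                      if len(b) > len(a) and is_subphrase(a, b)]
        penalty = sum(containers) / len(containers) if containers else 0.0
        # Assumption: log2(|a| + 1) length weight instead of log2(|a|).
        scores[a] = math.log2(len(a) + 1) * (f_a - penalty)
    return scores


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information of adjacent words x and y, estimated
    from raw counts over a corpus of `total` tokens."""
    return math.log2((count_xy * total) / (count_x * count_y))


if __name__ == "__main__":
    # Hypothetical candidate frequencies, for illustration only.
    toy_freq = {
        ("information", "extraction"): 10,
        ("information", "extraction", "system"): 4,
        ("entity", "recognition"): 6,
    }
    for phrase, score in sorted(c_value(toy_freq).items(), key=lambda kv: -kv[1]):
        print(" ".join(phrase), round(score, 3))
    print("PMI:", pmi(count_xy=4, count_x=8, count_y=8, total=64))
```

In this sketch, a high C-Value favors longer phrases that are frequent on their own rather than only as part of a longer term, while a high mutual information value indicates that the words of a phrase co-occur more often than chance would predict. How the thesis combines the two indexes is not stated in the abstract.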
Keywords/Search Tags:Scholar Portrait, Information Extraction, Entity Recognition, C-Value