
Natural Language Processing Aiming To The Core Texts Of Scientific Literature

Posted on: 2015-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: J M Shao
GTID: 2298330431978645
Subject: Computer technology
Abstract/Summary:
Internet technology is developing rapidly and new information appears constantly, so how to extract the information one needs from this mass of data has become a widely discussed problem. Academic exchange among scholars is no longer limited to print: more and more academic information is disseminated through web pages, and specialized databases of academic research have emerged accordingly. Obtaining the information a user needs from such large databases is difficult, and deep processing of literature information and knowledge discovery still require manual work, which is a considerable inconvenience for researchers. This thesis studies natural language processing of scientific literature to address this problem.

Scientific literature reflects the level of science and technology and the research findings of its time, represents the state of knowledge in a given period, embodies developments in a professional field, and is an essential resource for researchers acquiring knowledge. Extracting information from scientific literature and deriving more meaningful data from it helps in studying scholars' academic work according to the different needs of researchers.

This thesis studies the key technologies of web page crawling and, based on an analysis of the pages of a foreign-language literature database, designs a set of crawling tools in C/S (client/server) mode. It also studies several web information extraction techniques, analyzes the structural characteristics of HTML web pages in detail, and designs a set of web information extraction templates that exploit the structural similarity of pages within the same online database to extract the key information of each article. The templates were ultimately shown to achieve relatively high accuracy.
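The template idea described above can be illustrated with a minimal sketch. It assumes that every article page in one database marks the same field with the same tag and class, so a single mapping from field name to (tag, class) can be reused across pages. The selectors, the sample page, and the names `TEMPLATE`, `TemplateExtractor`, and `extract` are hypothetical and are not taken from the thesis.

```python
from html.parser import HTMLParser

# Hypothetical template: field name -> (tag, class attribute value).
# These selectors are illustrative assumptions, not the thesis's templates.
TEMPLATE = {
    "title": ("h1", "article-title"),
    "authors": ("span", "author"),
    "abstract": ("div", "abstract"),
}


class TemplateExtractor(HTMLParser):
    """Collect the text of every element that matches a template rule."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.fields = {name: [] for name in template}
        self._stack = []  # one (tag, matched field or None) entry per open tag

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        match = None
        for name, (want_tag, want_class) in self.template.items():
            if tag == want_tag and want_class in classes:
                match = name
                self.fields[name].append("")  # start a new value for this field
                break
        self._stack.append((tag, match))

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching open tag,
        # which also discards unclosed inner tags such as <br>.
        if tag not in [open_tag for open_tag, _ in self._stack]:
            return
        while self._stack:
            open_tag, _ = self._stack.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Append text to each field whose element currently encloses it.
        for name in {n for _, n in self._stack if n is not None}:
            self.fields[name][-1] += data


def extract(html, template=TEMPLATE):
    """Apply the template to one page and return whitespace-normalized fields."""
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    return {
        name: [" ".join(value.split()) for value in values]
        for name, values in parser.fields.items()
    }


# Made-up sample page that follows the assumed structure.
PAGE = """
<html><body>
<h1 class="article-title">Clustering of  Scientific Texts</h1>
<span class="author">A. Author</span><span class="author">B. Author</span>
<div class="abstract"><p>We study <b>text</b> clustering.</p></div>
</body></html>
"""

record = extract(PAGE)
print(record)
```

Because the template is plain data, supporting a second database would, under the same assumption of uniform page structure, only require a different mapping rather than new parsing code.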
After the literature metadata has been acquired, the next step is statistical analysis of the data. The thesis first studies common text clustering algorithms in depth and then selects a clustering algorithm suited to the characteristics of the texts. After text preprocessing such as stemming, a vector space model of the texts is built for cluster analysis. The whole process is divided into three steps: keyword selection and weighting, similarity calculation, and selection and implementation of the clustering algorithm. Selecting and weighting the feature words is the core problem; drawing on earlier weighting methods, the thesis proposes a keyword weighting scheme based on the position of the feature words. Finally, by compiling statistics on the classified information after clustering, the thesis obtains reference data for scholars' academic research.

This work applies natural language processing to research literature. It provides valuable reference material for researchers' academic work and offers a new way to support them in writing literature and in following the latest developments in scientific research.
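The three clustering steps named in the abstract (keyword weighting, similarity calculation, clustering) can be sketched as follows. Everything specific here is an assumption made for illustration: the abstract does not give the position weights, the stemmer, the similarity measure, or the clustering algorithm, so the sketch uses made-up weights per field, a crude suffix stripper, cosine similarity, and single-pass threshold clustering as stand-ins.

```python
import math
import re
from collections import Counter

# Assumed position weights; the abstract does not state the actual values.
POSITION_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vectorize(doc):
    """Build a position-weighted term vector from a {field: text} record."""
    vec = Counter()
    for field, weight in POSITION_WEIGHTS.items():
        for token in re.findall(r"[a-z]+", doc.get(field, "").lower()):
            if token not in STOPWORDS:
                vec[stem(token)] += weight  # the same term counts more in the title
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def single_pass_cluster(vectors, threshold=0.3):
    """Put each vector in the first cluster whose seed is similar enough."""
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar seed found: start a new cluster
    return clusters


# Made-up records shaped like the extracted literature metadata.
docs = [
    {"title": "Clustering scientific texts",
     "abstract": "Text clustering with a vector space model"},
    {"title": "Text clustering methods",
     "abstract": "Clustering algorithms for texts"},
    {"title": "Web page crawling",
     "abstract": "Crawling web pages of a literature database"},
]

vectors = [vectorize(doc) for doc in docs]
clusters = single_pass_cluster(vectors)
print(clusters)
```

Single-pass clustering is used here only because it is short; its result depends on document order and on the threshold, so a different algorithm would be a reasonable choice for real data.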
Keywords/Search Tags: Natural language processing, Web crawl, Web information extraction, Clustering processing