Natural Language Processing Aiming To The Core Texts Of Scientific Literature

Posted on:2015-09-26

Degree:Master

Type:Thesis

Country:China

Candidate:J M Shao

Full Text:PDF

GTID:2298330431978645

Subject:Computer technology

Abstract/Summary:


Internet technology is developing rapidly and new information appears constantly, so extracting the information we actually need from this mass of data has become an issue of wide concern. Scholars no longer exchange academic work only in written form: more and more academic information is spread through web pages, and professional databases for academic research have emerged in response. However, it is very difficult for users to obtain the information they need from these huge databases, because deep processing of literature information and knowledge discovery still require manual work, which causes great inconvenience for researchers. To address this problem, this thesis carries out natural language processing research on scientific literature.

Scientific literature reflects the level of science and technology and of scientific research and discovery. It represents the state of human knowledge in a given period, embodies the developments in each professional field, and is an essential resource from which researchers acquire knowledge. Extracting information from research literature and deriving more meaningful data from it, according to the different needs of researchers, is of great help to scholars' academic research.

This thesis studies the key technologies of Web page crawling and, based on an analysis of the pages of a foreign-language literature database, designs a crawling tool in client/server (C/S) mode. It then studies several kinds of Web information extraction technology and analyzes the structural characteristics of the HTML of Web pages in detail. Making use of the structural similarity of pages within the same online database, it designs a set of Web information extraction templates to extract the key information of each document, and finally shows that the templates achieve comparatively high accuracy.
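The abstract does not publish its templates, so the following is only a minimal sketch of the template idea, assuming that every record page in one database marks each field with the same tag and exact `class` attribute. The `TEMPLATE` selectors, the field names, `TemplateExtractor`, `extract`, and the sample page are all hypothetical; the code relies only on Python's standard `html.parser` module, and the crawler itself is not shown.

```python
from html.parser import HTMLParser

# Hypothetical extraction template: field name -> (tag, exact class attribute)
# that is assumed to mark that field on every record page of one database.
TEMPLATE = {
    "title": ("h1", "article-title"),
    "authors": ("span", "author"),
    "abstract": ("div", "abstract"),
}


class TemplateExtractor(HTMLParser):
    """Collects the text inside elements whose (tag, class) matches the template."""

    def __init__(self, template):
        super().__init__()
        self.lookup = {selector: name for name, selector in template.items()}
        self.fields = {name: [] for name in template}
        self.stack = []  # (tag, matched field name or None) for each open element

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.stack.append((tag, self.lookup.get((tag, cls))))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this also discards void or unclosed
        # children such as <br> that never receive an end tag.
        for k in range(len(self.stack) - 1, -1, -1):
            if self.stack[k][0] == tag:
                del self.stack[k:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost open element that matched a field.
        for _, field in reversed(self.stack):
            if field is not None:
                self.fields[field].append(text)
                return


def extract(html, template=TEMPLATE):
    """Apply one template to one page; returns field name -> list of text pieces."""
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    return {name: list(parts) for name, parts in parser.fields.items()}


SAMPLE_PAGE = """
<html><body>
  <h1 class="article-title">Clustering Scientific Abstracts</h1>
  <span class="author">A. Author</span>, <span class="author">B. Author</span>
  <div class="abstract">We study <b>clustering</b> of abstracts.<br>Results follow.</div>
</body></html>
"""

record = extract(SAMPLE_PAGE)
print(record["title"], record["authors"], " ".join(record["abstract"]))
```

Because one template is reused for every record page of the same database, adding a field only means adding one selector; pages from a different database would need their own template.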
After the metadata of the literature has been acquired, the next stage is statistical analysis of the data. The thesis first studies the common text clustering algorithms in depth and then chooses its clustering algorithm according to the characteristics of these texts. After text preprocessing such as stemming, it builds a vector space model of the texts for cluster analysis. The process has three steps: selecting and weighting the keywords, calculating similarity, and selecting and implementing the clustering algorithm. Selecting and weighting the feature words is the core problem; drawing on previous work on weighting methods, this thesis proposes a keyword weighting scheme based on the position in which a feature word appears. Finally, by counting the classified information after clustering, the thesis obtains reference data for scholars' academic research.

This work applies natural language processing to research literature. It offers valuable reference material for researchers' academic work and provides a new aid for researchers writing scientific literature and following the latest developments in scientific research.
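The abstract names the three steps (keyword selection and weighting, similarity calculation, clustering) but not the concrete position weights or the clustering algorithm chosen. The sketch below is one possible reading under stated assumptions: position-weighted term frequency multiplied by a smoothed IDF, cosine similarity on L2-normalized sparse vectors, and single-pass clustering, which is swapped in here for illustration and is not necessarily the thesis's algorithm. The values in `POSITION_WEIGHTS`, the crude `stem` suffix stripper (a stand-in for a real stemmer such as Porter's), the threshold, and the sample records are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical position weights (an assumption, not the thesis's values): a term
# in the title counts three times, in the keyword list twice, in the abstract once.
POSITION_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "with"}

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "process" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Lower-case, split into letter runs, drop stopwords, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]

def weighted_tf(doc):
    """Term frequency in which each occurrence is scaled by its position weight."""
    tf = Counter()
    for position, text in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for term in terms(text):
            tf[term] += weight
    return tf

def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

def tfidf_vectors(docs):
    """Position-weighted TF times smoothed IDF, one unit-length vector per doc."""
    tfs = [weighted_tf(d) for d in docs]
    n = len(docs)
    df = Counter(term for tf in tfs for term in tf)  # documents containing each term
    vectors = []
    for tf in tfs:
        weights = {t: w * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, w in tf.items()}
        vectors.append(normalize(weights))
    return vectors

def cosine(u, v):
    """Cosine similarity; for unit-length vectors this is just the dot product."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def centroid(vectors):
    """Unit-length mean direction of a group of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return normalize(total)

def single_pass(vectors, threshold):
    """Add each document to its most similar cluster if similarity >= threshold,
    otherwise open a new cluster. Returns lists of document indices."""
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid([vectors[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def summarize(vectors, clusters):
    """Count papers per cluster and label each by its heaviest centroid term."""
    result = []
    for members in clusters:
        c = centroid([vectors[j] for j in members])
        result.append((len(members), max(c, key=c.get)))
    return result

papers = [  # hypothetical metadata records, as produced by the extraction step
    {"title": "Web crawler design", "keywords": "crawler, web pages",
     "abstract": "A crawler downloads web pages."},
    {"title": "Crawling web pages", "keywords": "web crawler",
     "abstract": "Crawling pages from a web database."},
    {"title": "Text clustering", "keywords": "clustering, similarity",
     "abstract": "Clustering groups similar text."},
    {"title": "Clustering abstracts", "keywords": "text clustering",
     "abstract": "Similar abstracts form clusters."},
]
vectors = tfidf_vectors(papers)
clusters = single_pass(vectors, threshold=0.3)
print(clusters, summarize(vectors, clusters))
```

Single-pass clustering is used here only because it needs no preset number of clusters and reads each record once; its result depends on input order and on the threshold, so k-means or hierarchical clustering may suit other collections better.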

Keywords/Search Tags:

Natural language processing, Web crawl, Web information extraction, Clustering processing


Related items

1	Research On Machine Learning For Natural Language Processing And Transmission
2	Design And Implementation Of The Information Processing System Of Safety Accidents Based On Understanding Natural Language
3	Design And Implementation Of Knowledge Extraction Algorithm Based On Natural Language Processing
4	The Application Of Natural Language Processing In Mining The Characteristics Of Concept Convey
5	The Design And Implementation Of Legal Service System Based On Natural Language Processing
6	Research On Intelligent Retrieval Of Patent Infringement Based On Natural Language Processing
7	Narrative Information Extraction with Non-Linear Natural Language Processing Pipeline
8	Research On High Risk Information Processing Module Of Internet Public Opinion Based On Natural Language Processing
9	Research And Application On Chinese Topic Event Extraction
10	Research On Search Engine Oriented Natural Language Processing Technology