Research On Automatic Terminology List Construction Of Documents

Posted on: 2019-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: G X Chen  Full Text: PDF
GTID: 2428330590475369  Subject: Computer technology
Abstract/Summary:
Automatic term extraction (ATE) is a method that automatically obtains domain terms from domain-related texts. Term extraction has long been an important research subject in natural language processing. With the rapid development of big data across industries, automatic term extraction has become a great help to many research fields, such as machine reading, information retrieval, and computer-aided translation. Traditional automatic term extraction methods are usually based on word frequency, on context, or on other features, but existing methods usually ignore the "keyword" characteristic of terms, treating term extraction and keyword extraction as two independent areas. In addition, existing ATE methods do not make effective use of the large amount of domain-related text that is available. To address these problems, this thesis puts forward a term extraction method with two phases, one based on keyword extraction and one based on distant supervision, and implements an automatic term extraction system. The main work of this thesis is as follows:

(1) A term extraction method based on keyword extraction is proposed. This method builds a text graph from the text and analyzes the semantic relationships between words or phrases. Through an iterative algorithm, the keywords in the text are selected as the candidate term set. The keyword method then yields a weighted set of candidate terms.

(2) A term extraction method based on distant supervision is proposed. This method uses Wikipedia as the corpus, analyzes the Wikipedia link structure and anchor text, and builds a term set from Wikipedia. Finally, this method is integrated with the keyword extraction method to improve the precision and recall of term extraction.

(3) An automatic term extraction system based on the above methods is designed and implemented. The system supports local or user-uploaded text corpora and presents the results to the user after automatically extracting the terms in the text.

In this thesis, the validity of the term extraction method is verified by experiments. The method and the system implemented in this thesis are of significance to the research and practice of term extraction.
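
The graph-based keyword step in (1) can be illustrated with a minimal sketch. The abstract does not name the iterative algorithm, so the code below assumes a TextRank-style ranking over a word co-occurrence graph; the function names, the window size, the damping factor, and the iteration count are all hypothetical choices made for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions.

    Assumption: co-occurrence stands in for the "semantic relationship"
    the abstract mentions; the thesis may build its text graph differently.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Run a fixed number of PageRank-style updates on an undirected graph."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word in graph:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[n] / len(graph[n]) for n in graph[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated
    return scores


def candidate_terms(tokens, top_k=5, window=2):
    """Return the top_k (word, weight) pairs as a weighted candidate set."""
    scores = rank_words(build_cooccurrence_graph(tokens, window))
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Calling `candidate_terms(tokens)` on a tokenized document returns word and weight pairs, which roughly corresponds to the weighted candidate term set described in (1). A real system would also need tokenization, stop-word filtering, and phrase merging, none of which are shown here.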
Keywords/Search Tags: Automatic Term Extraction, Keyword Extraction, Distant Supervision, Wikipedia