
Research On Terminology Extraction Of Academic Paper Based On Multi-Strategy Method

Posted on: 2017-03-16 | Degree: Master | Type: Thesis
Country: China | Candidate: D Jiang | Full Text: PDF
GTID: 2348330503489803 | Subject: Computer Science and Technology
Abstract/Summary:
Extracting terms quickly and accurately is an important problem in natural language processing. Term extraction from academic papers can efficiently promote the development of science and the spread of scientific achievements. In academic papers, terms appearing in different locations, such as the title, keywords and abstract, have different characteristics. Traditional term extraction methods, however, ignore this information. A new method is therefore needed that takes term location into account to make up for this shortcoming.

This thesis proposes a multi-strategy term extraction method for academic papers, called TEM. In accordance with the different features of keywords, titles and abstracts, the method applies different strategies: a strategy based on keywords, a strategy based on boundary markers, and a strategy based on Chinese term formation rules. The error types in candidate term extraction are then analyzed, and a dictionary of counter-example rules is introduced to improve the extraction results. A K-Near frequency substring merge algorithm is used to filter candidate terms. Finally, a comprehensive scoring method based on the candidate terms' locations is proposed: it uses the Analytic Hierarchy Process (AHP) to determine the weights of keywords, titles and abstracts, and the terms are ranked by their final scores. In addition, class probability is introduced into the traditional TF-IDF algorithm, which improves the precision of single-word term extraction.

In the experiments, the influence of the K value in substring merging was tested, and the optimal value was found by considering both the reduction rate and the false-positive rate. The thesis compares the results of TF-IDF, TF-IDF-CF and TEM-SW for single-word terms, and of SCP-CV and TEM-MW for multi-word terms. The results show that, compared with the traditional methods, TEM-SW improved precision and recall by 7.85% and 11.54%, respectively, and TEM-MW improved precision and recall by 11.62% and 9.71%, respectively. TEM therefore performs better than the traditional methods.
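The location-based comprehensive scoring described above can be illustrated with a minimal sketch. The abstract does not give the comparison matrix or the scoring formula, so everything here is an assumption made for illustration: the pairwise comparison values, the row geometric-mean approximation of the AHP priority vector, the weighted-sum score, and the names `ahp_weights` and `score_terms` are all hypothetical and are not taken from the thesis.

```python
import math

# Locations considered by the method, as named in the abstract.
LOCATIONS = ["keywords", "title", "abstract"]

# Hypothetical pairwise comparison matrix (NOT from the thesis):
# PAIRWISE[i][j] states how much more important location i is than location j.
PAIRWISE = [
    [1.0, 2.0, 4.0],    # keywords vs. keywords, title, abstract
    [0.5, 1.0, 2.0],    # title
    [0.25, 0.5, 1.0],   # abstract
]


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def score_terms(candidates, weights):
    """Rank candidate terms by a weighted sum of per-location scores.

    candidates: dict mapping term -> dict mapping location -> score in [0, 1]
    weights:    dict mapping location -> AHP weight
    The weighted sum is an assumed scoring rule, used only for illustration.
    """
    scores = {
        term: sum(weights[loc] * by_loc.get(loc, 0.0) for loc in weights)
        for term, by_loc in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


weights = dict(zip(LOCATIONS, ahp_weights(PAIRWISE)))

# Toy candidates: one term found in all three locations, one only in the abstract.
candidates = {
    "term extraction": {"keywords": 1.0, "title": 1.0, "abstract": 1.0},
    "substring": {"abstract": 1.0},
}
ranked = score_terms(candidates, weights)
```

With this assumed matrix, keywords receive the largest weight and the abstract the smallest, so a term that also appears among the keywords or in the title ranks above one found only in the abstract. A full AHP treatment would also check the consistency ratio of the comparison matrix, which this sketch omits.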
Keywords/Search Tags:Multi-Strategy, Term extraction, Substring reduction, Unithood, Termhood, AHP