
Abstract Sentence Classification And Frequent Pattern Mining For Scientific Papers Oriented To An English Writing Assistant System

Posted on: 2014-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
GTID: 2268330422950627
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology and the increasing internationalization of academia, writing and publishing scientific papers in English has become an essential skill for researchers and high-tech professionals. For non-native English speakers, writing a high-quality scientific paper in English requires both strong English writing ability and extensive knowledge of the relevant field, and not all of them have both. A writing assistant system for English scientific papers can help compensate for this gap, and the corpus behind it is the key factor supporting such a system. The purpose of this thesis is to develop and improve the corpus for an English writing assistant system, especially for writing scientific papers. The object of study is the abstract sentences of scientific papers. The main work is as follows.

First, we download a large number of scientific papers from the web, extract their abstract sentences, and store them together to construct a sentence-level corpus. We study the structure and organization of abstracts in English scientific papers. By annotating a sample of abstract sentences and computing statistics over them, we obtain a basic understanding of the characteristics of abstract sentences.

Second, we classify these abstract sentences into four categories: "background", "goal", "method", and "conclusion". In the experiments, we choose the supervised machine learning method Support Vector Machine (SVM) as the classification model. We also carry out a series of feature selection experiments for abstract sentences to overcome the problem of sparse features, which improves the classification accuracy.

Third, we mine frequent patterns separately from the classified abstract sentences of each class, in order to develop and improve the knowledge base of the English writing assistant system for abstract writing. In the experiments, we use the classic FP-growth algorithm to mine the frequent patterns in the set of abstract sentences for each class. By refining the mining strategy step by step, we obtain better frequent patterns in each class.
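The frequent pattern mining step can be illustrated with a minimal FP-growth sketch in plain Python. It is not the thesis's implementation: it assumes each sentence is treated as an unordered set of lowercased, whitespace-separated tokens, and the names `mine_sentences`, `fp_growth`, `build_tree`, and the `min_support` threshold are hypothetical choices made for this example. The refined mining strategy mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns a header table (item -> list of tree nodes) and the
    support of every item that meets min_support.
    """
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties by name).
        kept = sorted((i for i in set(items) if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Recursively mine frequent itemsets; returns {frozenset: support}."""
    header, freq = build_tree(transactions, min_support)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        # Conditional pattern base: the prefix path of every node for this item.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


def mine_sentences(sentences, min_support):
    """Treat each sentence as a bag of lowercased tokens (an assumption)."""
    transactions = [(s.lower().split(), 1) for s in sentences]
    return fp_growth(transactions, min_support)
```

Calling `mine_sentences` once per class (for example, on all sentences labeled "method") would give one set of frequent word combinations per category. Because this sketch ignores word order, a system that needs ordered phrase templates would have to adapt the transaction representation.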
Keywords/Search Tags: English writing assistant, abstracts of scientific papers, corpus construction, abstract sentence classification, frequent pattern mining