A Comparative Analysis Of Approaches To Automatic Collocation Extraction

Posted on:2012-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhu

Full Text:PDF

GTID:2155330335959527

Subject:Foreign Linguistics and Applied Linguistics

Abstract/Summary:

Collocations are an important resource for second language learning and for many natural language processing tasks, but automatically extracting collocations from a corpus has long been a well-known problem. Corpus-based extraction is typically carried out with a statistical association measure that indicates whether two words occur together more often than would be expected by chance. When extracting collocations, however, linguists commonly choose one of these measures arbitrarily, without considering the size or the category of the corpus, and this leads to deficiencies in the extracted collocations. This thesis evaluates the extraction efficiency of four algorithms: mutual information, the chi-square test, the t-test, and the log-likelihood ratio. Specifically, it addresses two questions:

(1) For corpora of the same size but different categories, do the four algorithms differ in extraction efficiency?
(2) For corpora of the same category but different sizes, do the four algorithms differ in extraction efficiency?

The results show that:

(1) For corpora of the same size (two million words), mutual information achieved the best overall results on the academic and press corpora, while the log-likelihood ratio performed best on the fiction corpus.
(2) For corpora of the same category (press), the log-likelihood ratio achieved the best overall results when the corpus was smaller than one million words, whereas mutual information achieved the best overall results when the corpus was larger than one million words.
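All four measures named above can be computed from the same 2x2 contingency table of bigram counts, which makes them easy to compare on one corpus. The sketch below is illustrative only and is not the thesis's own implementation, which the abstract does not describe. It assumes adjacent word pairs (a window of one), the common textbook formulas (the t-score uses the usual approximation that the sample variance equals the observed bigram probability), and natural-log G-squared for the log-likelihood ratio. The names `Table`, `contingency_table` and `association_scores` are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class Table(NamedTuple):
    """Hypothetical 2x2 contingency table for the adjacent bigram (w1, w2)."""
    o11: int  # w1 followed by w2
    o12: int  # w1 followed by some other word
    o21: int  # some other word followed by w2
    o22: int  # bigrams with neither w1 first nor w2 second


def contingency_table(tokens: Sequence[str], w1: str, w2: str) -> Table:
    """Count adjacent bigrams only (assumed window of one word)."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    o11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    c1 = sum(1 for a, _ in bigrams if a == w1)  # w1 in first position
    c2 = sum(1 for _, b in bigrams if b == w2)  # w2 in second position
    return Table(o11, c1 - o11, c2 - o11, n - c1 - c2 + o11)


def association_scores(t: Table) -> dict:
    """Textbook MI, t-score, chi-square and log-likelihood ratio (G^2)."""
    if t.o11 == 0:
        raise ValueError("the bigram never occurs; MI and t are undefined")
    n = sum(t)
    r1, r2 = t.o11 + t.o12, t.o21 + t.o22  # row totals (w1 / not w1)
    k1, k2 = t.o11 + t.o21, t.o12 + t.o22  # column totals (w2 / not w2)
    observed = (t.o11, t.o12, t.o21, t.o22)
    # Expected counts under independence: row total * column total / N.
    expected = (r1 * k1 / n, r1 * k2 / n, r2 * k1 / n, r2 * k2 / n)
    e11 = expected[0]
    return {
        # Pointwise mutual information: log2(observed / expected) for w1 w2.
        "mi": math.log2(t.o11 / e11),
        # t-score, approximating the sample variance by the observed mean.
        "t": (t.o11 - e11) / math.sqrt(t.o11),
        # Pearson chi-square over the four cells (empty cells are skipped).
        "chi2": sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0),
        # G^2 = 2 * sum O * ln(O / E), with 0 * ln 0 taken as 0.
        "llr": 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0),
    }


if __name__ == "__main__":
    tokens = "strong tea is strong tea".split()
    scores = association_scores(contingency_table(tokens, "strong", "tea"))
    print({k: round(v, 3) for k, v in scores.items()})
```

On this toy input the script prints `{'mi': 1.0, 't': 0.707, 'chi2': 4.0, 'llr': 5.545}`. In an extraction setting, each candidate pair in a corpus would be scored this way and the pairs ranked by one measure at a time. That ranking is the step where the choice of measure matters.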

Keywords/Search Tags:

collocation extraction, algorithms, corpus

Related items

1	A Corpus-based Study Of Collocation Patterns And Motivations In English-Chinese Translated Movie Scripts
2	A Corpus-based Study Of Verb/Noun Collocation Behaviors Of Chinese College Students
3	Collocation Extraction And The Collocational Features Of Verbs In China English
4	A Corpus-based Study On Chinese EFL Learners' Verb Collocation
5	An Empirical Study On Corpus-Based Collocation Teaching In Senior High Schools
6	A Corpus-Based Study On Adjective-Noun Collocation Errors In Chinese English Majors' Writing
7	A Corpus-based Study On The Collocation Behavior Of Have By Chinese English-majors
8	A Corpus-based Study Of The Collocation Use Of The English Verb--get
9	A Corpus-driven Study On Collocation Errors
10	Research On Skeleton Extraction Algorithms Of Calligraphic Characters Based On Deep Learning