Sense identification, the task of differentiating the distinct meanings of a lemma for its dictionary definitions, constitutes an important part of dictionary compilation. Its difficulty stems primarily from two sources: the elusive boundaries between distinct meanings, and the demanding task of examining large amounts of language data to construct a comprehensive and accurate meaning profile. This dissertation experiments with a new methodology for sense identification assisted by contextual embeddings from Large Language Models (LLMs) to address these challenges while exploring the interrelationships between context and meaning.

Four highly frequent and polysemous words (WAY, RUN, OVER, and THAT) were selected as the targets of our case studies. The pre-trained BERT model and a fine-tuned variant created specifically for this study, the BERT-based Example Classifier for Sense Identification in Contexts (BECSICon), were employed to extract the contextual embeddings of the concordances. The outcomes of automatic concordance classification based on these embeddings are complemented by Extended Unit of Meaning (EUM) analysis, which serves as a framework for both interpreting and evaluating the classification results.

The dissertation addresses the following question: to what extent do contextual word embeddings facilitate word sense identification in learners' dictionary compilation? This central question is approached via two sub-questions concerning the viability of the method when applied to different classes of words and to the identification of different types of contextual features. Our case studies indicate that LLMs generally classify concordances of a target word effectively, clustering instances of similar meanings together and partitioning instances denoting different meanings into separate clusters. This automatic preprocessing of corpus data can significantly reduce the workload of lexicographers. After EUM analysis, the identified senses for each target word align comparably with those documented in major learners' dictionaries.

Regarding variation in the models' performance, no significant correlation is discerned between the concreteness of word meanings and the quality of the classification results. Instead, the models demonstrate varying degrees of competence in handling meanings associated with distinct types of contextual features. In light of these findings, it is postulated that contextual features manifested as colligations and semantic preferences entail the categorization of collocations based on grammatical and ontological knowledge, respectively. Semantic prosodies, by contrast, are usually not explicitly manifested in collocations; they are instead instantiated in what this dissertation terms situational preferences, whose identification relies more heavily on world knowledge and interpersonal understanding. Consequently, the more abstract a feature is in relation to collocations, the greater the challenge for an LLM to detect it during concordance processing. This challenge arises because LLMs lack access to external knowledge sources, relying solely on collocational patterns for conceptualizing and identifying contextual features.

The contextual embedding approach can readily be generalized to sense identification for other words and other languages, so that its validity can be evaluated comprehensively. More broadly, the integration of LLMs into linguistic studies holds promise, offering fresh perspectives on linguistic evidence that may advance our understanding of language and meaning.
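The core computational step described above, grouping concordances of a target word by the similarity of their contextual embeddings, can be illustrated with a minimal sketch. This is not the dissertation's actual pipeline: real embeddings would be extracted from BERT (or BECSICon), whereas here hand-made toy vectors stand in for them, and a simple greedy cosine-similarity grouping stands in for a full clustering algorithm.

```python
# Sketch of sense clustering over contextual embeddings.
# Assumption: embeddings have already been extracted per concordance;
# the toy 3-d vectors below are hypothetical stand-ins for BERT output.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_concordances(embeddings, threshold=0.9):
    """Greedily assign each concordance embedding to the first cluster
    whose centroid it resembles; otherwise start a new cluster."""
    clusters = []  # each cluster is a list of indices into `embeddings`
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            # Centroid = dimension-wise mean of the cluster's members.
            centroid = [sum(dims) / len(cluster)
                        for dims in zip(*(embeddings[j] for j in cluster))]
            if cosine(emb, centroid) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy embeddings: two "motion" uses of RUN point in a similar
# direction; one "manage" use points elsewhere.
toy = [
    [0.90, 0.10, 0.00],  # "run down the road"
    [0.85, 0.15, 0.05],  # "run to the store"
    [0.00, 0.20, 0.95],  # "run a company"
]
print(cluster_concordances(toy))  # → [[0, 1], [2]]
```

With real BERT embeddings one would typically use a dedicated clustering algorithm rather than this greedy pass, but the principle is the same: instances of one sense land in one cluster, instances of a different sense in another.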