Sense identification, the task of differentiating the distinct meanings of a lemma for its dictionary definitions, constitutes an important part of dictionary compilation. Its difficulty stems primarily from two sources: the elusive boundaries between distinct meanings, and the demanding task of examining large amounts of language data to construct a comprehensive and accurate meaning profile. This dissertation experiments with a new methodology for sense identification assisted by contextual embeddings from Large Language Models (LLMs) to address these challenges while exploring the interrelationships between context and meaning.

Four highly frequent and polysemous words (WAY, RUN, OVER, and THAT) were selected as the targets of our case studies. The pre-trained BERT model and a fine-tuned variant created specifically for this study, the BERT-based Example Classifier for Sense Identification in Contexts (BECSICon), were employed to extract the contextual embeddings of the concordances. The outcomes of automatic concordance classification based on these embeddings are complemented by Extended Unit of Meaning (EUM) analysis, which serves as a framework for both interpreting and evaluating the classification results.

The dissertation addresses the following question: to what extent do contextual word embeddings facilitate word sense identification in learners' dictionary compilation? This central question is approached via two sub-questions concerning the viability of the method when applied to different classes of words and to the identification of different types of contextual features. Our case studies indicate that LLMs generally classify concordances of a target word effectively, clustering instances of similar meanings together and partitioning instances denoting different meanings into separate clusters. This automatic preprocessing of corpus data can significantly reduce the workload of lexicographers. After EUM analysis, the identified senses for each target word align comparably with those documented in major learners' dictionaries.

Regarding variation in the models' performance, no significant correlation is discerned between the concreteness of word meanings and the quality of the classification results. Instead, the models demonstrate varying degrees of competence in handling meanings associated with distinct types of contextual features. In light of these findings, it is postulated that contextual features manifested as colligations and semantic preferences entail the categorization of collocations based on grammatical and ontological knowledge, respectively. Semantic prosodies, by contrast, are usually not explicitly manifested in collocations; they are instead instantiated in what this dissertation terms situational preferences, whose identification relies more heavily on world knowledge and interpersonal understanding. Consequently, the more abstract a feature is in relation to collocations, the greater the challenge for an LLM to detect it during concordance processing. This challenge arises because LLMs lack access to external knowledge sources, relying solely on collocational patterns for conceptualizing and identifying contextual features.

The contextual embedding approach can readily be generalized to sense identification for other words and other languages, so that its validity can be evaluated comprehensively. More broadly, the integration of LLMs into linguistic studies holds promise, offering fresh perspectives on linguistic evidence that may advance our understanding of language and meaning.
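The core computational step described above, grouping concordances of a target word by the similarity of their contextual embeddings, can be illustrated with a minimal sketch. This is not the dissertation's actual pipeline: real embeddings would be extracted from BERT (or BECSICon), whereas here hand-made toy vectors stand in for them, and a simple greedy cosine-similarity grouping stands in for a full clustering algorithm.

```python
# Sketch of sense clustering over contextual embeddings.
# Assumption: embeddings have already been extracted per concordance;
# the toy 3-d vectors below are hypothetical stand-ins for BERT output.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_concordances(embeddings, threshold=0.9):
    """Greedily assign each concordance embedding to the first cluster
    whose centroid it resembles; otherwise start a new cluster."""
    clusters = []  # each cluster is a list of indices into `embeddings`
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            # Centroid = dimension-wise mean of the cluster's members.
            centroid = [sum(dims) / len(cluster)
                        for dims in zip(*(embeddings[j] for j in cluster))]
            if cosine(emb, centroid) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy embeddings: two "motion" uses of RUN point in a similar
# direction; one "manage" use points elsewhere.
toy = [
    [0.90, 0.10, 0.00],  # "run down the road"
    [0.85, 0.15, 0.05],  # "run to the store"
    [0.00, 0.20, 0.95],  # "run a company"
]
print(cluster_concordances(toy))  # → [[0, 1], [2]]
```

With real BERT embeddings one would typically use a dedicated clustering algorithm rather than this greedy pass, but the principle is the same: instances of one sense land in one cluster, instances of a different sense in another.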