Research On Representation Learning And Semantic Understanding Of Chinese Idioms Based On Deep Learning

Posted on: 2022-11-04 | Degree: Master | Type: Thesis
Country: China | Candidate: H Tan
GTID: 2518306782952049 | Subject: Automation Technology
Abstract/Summary:
Chinese idioms originate from ancient fables, historical stories, myths and legends, or oral tradition, and they are common in everyday language because they are vivid and concise. An idiom typically consists of four characters arranged according to the grammar of ancient Chinese. As a result, a four-character idiom often carries richer semantics than a passage of dozens or hundreds of characters, and its meaning cannot be read directly from modern Chinese. Research on Chinese idioms in natural language processing, for example idiom machine reading comprehension and idiom recommendation, is still at an early stage, and research on the semantic understanding of idioms has developed slowly owing to the lack of a large-scale, high-quality Chinese idiom semantic corpus. Taking sentences that mix modern Chinese and idioms as its research object, this thesis studies representation learning and semantic understanding of Chinese idioms: how a machine can analyze the semantic and syntactic information in such sentences, and how it can represent and learn idioms. The main contributions of this thesis are as follows:

(1) We construct a large-scale, high-quality Chinese idiom semantic corpus, named CHISC, and present the method and procedure used to build it.

(2) We propose four training strategies for the representation learning of Chinese idioms and evaluate them experimentally on CHISC. The strategies are compared using multiple similarity distances and two correlation coefficients, which verifies their feasibility for idiom representation learning.

(3) To capture the deep semantic information of Chinese idioms, we propose two deep learning based idiom recommendation models: IdmRep-CW, which is based on word interaction, and Deep-IdmRep, which is based on deep idiom representation. In IdmRep-CW, a word interaction module fuses the character-level and word-level embeddings of an idiom to capture the grammatical information of its internal structure. In Deep-IdmRep, the co-occurrence of idioms and paragraphs is used for joint contextual interaction, and the semantic information of the paragraph is integrated into the surface semantics of the idiom to obtain a deep semantic representation. Deep-IdmRep also incorporates a dynamic augmentation strategy that uses random noise to draw negative-sample idioms and expand the set of idiom candidates, which improves the performance and generalization ability of the model.

To demonstrate the effectiveness of the proposed idiom recommendation models for idiom semantic understanding, we conduct multiple sets of experiments on the large-scale Chinese idiom dataset ChID. The results show that the proposed models outperform the pure idiom matching model. Compared with the LM model, the AR model, and the BiLSTM-based SAR model, our method improves performance by nearly 5% on the validation set and the standard test set. We also compare static and dynamic extension of the candidate set. The experiments show that dynamic extension offers performance comparable to static extension at minimal computational cost. Ablation experiments and visual analysis verify the effectiveness and importance of each component of the idiom recommendation model.
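The abstract does not spell out how the dynamic extension of idiom candidates is implemented. The following is only a minimal sketch of the general idea, assuming that negatives are drawn uniformly at random from an idiom vocabulary at every training step; the function name `extend_candidates`, the toy vocabulary, and the choice of `k` are all hypothetical and are not taken from the thesis.

```python
import random


def extend_candidates(candidates, vocabulary, k, rng):
    """Return the original candidates followed by up to k random negatives.

    Assumption: negatives are sampled uniformly from idioms that are not
    already candidates. The thesis may use a different noise distribution.
    """
    present = set(candidates)
    pool = [idiom for idiom in vocabulary if idiom not in present]
    extra = rng.sample(pool, min(k, len(pool)))
    return list(candidates) + extra


# Toy idiom vocabulary (illustrative only).
vocabulary = [
    "画蛇添足", "守株待兔", "亡羊补牢", "对牛弹琴",
    "井底之蛙", "掩耳盗铃", "刻舟求剑",
]
# Fixed candidates for one blank; the first entry plays the gold answer.
candidates = ["画蛇添足", "守株待兔", "亡羊补牢"]

rng = random.Random(0)
for step in range(3):
    # A fresh draw each step means the model sees different negatives over time.
    extended = extend_candidates(candidates, vocabulary, k=2, rng=rng)
    print(step, extended)
```

Because the extra negatives are resampled at each step rather than precomputed for the whole dataset, the extended candidate lists never have to be stored, which is one plausible reading of why dynamic extension is described as cheap.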
Keywords/Search Tags:Chinese idioms, Representation Learning, Semantic Understanding, Idiom Recommendation, Dynamic Extension Candidates