Research On Multi-modal Word Segmentation Method Integrating Speech Features

Posted on: 2023-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: S X Liu
GTID: 2558306845990989
Subject: Computer technology
Abstract/Summary:
Chinese word segmentation is the process of dividing Chinese character strings into word sequences. It is a basic and important task in natural language processing, and improving it benefits downstream tasks. Current word segmentation methods focus on text and ignore the audio and video forms in which language is also expressed in real life. Because Chinese is ambiguous, it is often difficult to infer the correct segmentation from text alone. Some researchers have therefore proposed multi-modal word segmentation. Where earlier methods consider only text, multi-modal word segmentation also integrates speech, which can provide accurate segmentation clues. This helps resolve ambiguity in Chinese word segmentation and further improves performance.

However, current multi-modal word segmentation methods still have the following problems. First, they consider only the loss after modal fusion and do not make full use of the information in each single modality, so when the fusion is biased, the model cannot correct the resulting impact on segmentation performance. Second, during modal fusion the features of the two modalities are directly concatenated, and an attention mechanism then extracts the final fused features. This does not consider the mutual adaptation between the two different modalities' features, that is, cross-modal interaction.

To address these problems, this thesis carries out two lines of research. On the one hand, it studies how adding single-modality Chinese word segmentation losses affects the performance of a multi-modal word segmentation model. On the other hand, it improves the modal fusion mechanism and studies how concatenating the features of the different modalities after cross-modal interaction affects Chinese word segmentation.

The main contributions of this thesis are as follows:

(1) An improved scheme based on joint loss is proposed. Before modal fusion, a conditional random field is added after the speech features and after the text features to predict segmentation results for each modality. The final loss of the model adds the Chinese word segmentation losses of the two single modalities. To study how the proportion of each loss affects segmentation, this thesis also assigns weights to the different losses. Unlike existing multi-modal word segmentation methods, this method considers the segmentation loss both before and after modal fusion. Experiments show that the joint-loss scheme outperforms the baseline model in recall and F1 score.

(2) Building on the joint-loss scheme, an improved modal fusion method based on a cross-modal attention mechanism is proposed. The features of all character steps in each modality are concatenated into a single sequence feature, which serves as the input to cross-modal attention. Cross-modal attention then produces the features after text-to-speech adaptation and after speech-to-text adaptation. Finally, the two sets of features are concatenated and fed into a conditional random field as the fused features to predict the Chinese word segmentation result. This method takes cross-modal interaction into account during modal fusion. Experiments show that it achieves higher accuracy and F1 score than the joint-loss weight configuration that obtained the highest F1 score.
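The joint loss in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form below (fused loss plus weighted text and speech losses) is an assumption, as are the names `crf_nll`, `joint_loss`, `w_text`, and `w_speech`, the two-tag scheme, and all scores. Each loss is taken to be the negative log-likelihood of a linear-chain conditional random field, computed with the forward algorithm.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of `tags` under a linear-chain CRF.

    emissions[t][k]   score of tag k at character position t
    transitions[i][j] score of moving from tag i to tag j
    (No start/stop transition scores are used; that is a simplification.)
    """
    n_tags = len(transitions)

    # Score of the gold tag path.
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]

    # Forward algorithm: log of the summed exponentiated scores of all paths.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return log_sum_exp(alpha) - gold


def joint_loss(fused_loss, text_loss, speech_loss, w_text=0.5, w_speech=0.5):
    """Assumed form: fused loss plus weighted single-modality losses."""
    return fused_loss + w_text * text_loss + w_speech * speech_loss


# Hypothetical scores for a 3-character sentence with 2 tags (0 and 1).
transitions = [[0.1, -0.2], [0.3, 0.0]]
gold_tags = [0, 1, 1]
text_loss = crf_nll([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]], transitions, gold_tags)
speech_loss = crf_nll([[0.6, 0.4], [0.5, 0.5], [0.1, 0.9]], transitions, gold_tags)
fused_loss = crf_nll([[1.2, 0.0], [0.1, 1.1], [0.0, 1.3]], transitions, gold_tags)
print(joint_loss(fused_loss, text_loss, speech_loss, w_text=0.3, w_speech=0.3))
```

In a real model each branch would have its own emission scores produced by a neural encoder and possibly its own transition matrix; sharing one `transitions` matrix here is only to keep the sketch short.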
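The cross-modal fusion in contribution (2) can be sketched in the same spirit. The abstract does not specify the attention variant, so this example assumes plain scaled dot-product attention without learned projections, character-aligned sequences, and equal feature dimensions in both modalities. The function names and the direction labels (`text_adapted` uses text queries over speech keys and values) are illustrative choices, not the thesis's definitions.

```python
import math


def softmax(row):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query row attends over all rows of the other modality.

    Returns one weighted average of `keys_values` rows per query row.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, keys_values))
                for j in range(len(keys_values[0]))
            ]
        )
    return out


def fuse(text_feats, speech_feats):
    """Concatenate, per character, the two cross-modally adapted features."""
    text_adapted = cross_attention(text_feats, speech_feats)
    speech_adapted = cross_attention(speech_feats, text_feats)
    return [t + s for t, s in zip(text_adapted, speech_adapted)]


# Hypothetical 2-dimensional features for a 3-character sentence.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
speech_feats = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
for row in fuse(text_feats, speech_feats):
    print(row)
```

Each fused row has twice the feature dimension of a single modality. In the setting the abstract describes, these rows would serve as the input to the final conditional random field.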
Keywords/Search Tags:Chinese word segmentation, Natural language processing, Multi-modal, Joint losses, Cross modal attention mechanism