| With the rapid development of the Internet, massive amounts of data and services have become accessible. While this brings great convenience, information redundancy and information overload have become increasingly serious, and text summarization technology has emerged in response. Automatic text summarization is now applied in many fields. Among them, meeting summarization has high research value because it saves people time and improves work efficiency. Meeting summarization is a form of dialogue summarization. Because meetings are private, little training data is available, too little to support large-scale training, so research on meeting summarization often relies on language models pre-trained on other domains. However, fine-tuning such models directly does not achieve the desired effect, because meetings feature very long transcripts, interaction among roles, and topic jumps, so their data distribution often differs from that of the source domain. In addition, current research focuses mainly on generic text summarization, but a meeting often covers multiple topics, participants in different roles have different responsibilities and usually need to attend to different parts of the meeting, and people's requirements for a meeting summary vary with their time budget. Generic summarization lacks control mechanisms that let users customize summaries according to their specific needs and preferences. To address these problems, this paper makes the following contributions: 1. To address the shortage of meeting training data, a language model pre-trained on news is adopted, and, given the large gap between news text and dialogue text, adversarial domain adaptation training is used to compensate for the performance loss caused by out-of-domain pre-training followed by in-domain fine-tuning. Specifically, we used the text summarization task and an
adversarial domain discriminator to conduct a second stage of pre-training on news and dialogue datasets, narrowing the distance between the source domain (news) and the target domain (dialogue), and then fine-tuned on the meeting summarization dataset QMSum. Experiments show that this method performs well. 2. To meet users' need to customize summaries according to their interests, query-based meeting summarization is proposed to help users extract the key information on a specific topic in a meeting, together with a new mechanism that combines query relevance with the summarization model: a relevance model first computes the degree of relevance between each speaker turn in the meeting and the input query, and this relevance score is then taken into account while the summarization model decodes. The proposed mechanism achieves competitive performance. 3. To meet users' need to customize how detailed a summary is, meeting summarization with a controllable level of detail is proposed, building on existing research on length-controllable summarization. This paper proposes the CARL mechanism for length control: at each decoding time step, the cross-attention probability matrix between encoder and decoder is adjusted according to the expected remaining length. Besides controlling summary length, CARL also improves the model's ability to select information and generates high-quality summaries with a controllable level of detail. |
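
The abstract does not specify how the query relevance score enters decoding. As a purely illustrative sketch, one possible reading is that the cross-attention distribution over source tokens is rescaled by the relevance of the speaker turn each token belongs to, then renormalized. The function name `reweight_attention`, the `turn_ids` mapping, and the exponent `alpha` are hypothetical and are not taken from the paper.

```python
from typing import List


def reweight_attention(
    attention: List[float],
    turn_ids: List[int],
    relevance: List[float],
    alpha: float = 1.0,
) -> List[float]:
    """Scale each source token's attention by its turn's relevance, then renormalize.

    attention: cross-attention probabilities over source tokens for one decoding step.
    turn_ids:  index of the speaker turn that each source token belongs to.
    relevance: query relevance score per turn (assumed non-negative).
    alpha:     hypothetical strength parameter; 0 leaves the attention unchanged.
    """
    if len(attention) != len(turn_ids):
        raise ValueError("attention and turn_ids must have the same length")

    # Multiply each token's attention by the relevance of its turn.
    scaled = [a * (relevance[t] ** alpha) for a, t in zip(attention, turn_ids)]

    total = sum(scaled)
    if total == 0.0:
        # Every turn scored zero: fall back to the original distribution.
        return list(attention)

    # Renormalize so the result is again a probability distribution.
    return [s / total for s in scaled]


# Four source tokens, two turns; the first turn is far more relevant to the query.
weights = reweight_attention([0.25, 0.25, 0.25, 0.25], [0, 0, 1, 1], [0.9, 0.1])
```

In this sketch, attention mass shifts toward tokens from the more relevant turn while the weights still sum to one. A real model would apply such an adjustment per attention head and per decoding step.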