Borisov E., Mikhaylovskiy N., 2023. Automated Minuting on DumSum Dataset. In Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2023”, Moscow.

Meeting minutes are short texts summarizing the most important outcomes of a meeting. The goal of this work is to develop a module that automatically generates meeting minutes from a meeting transcript produced by an Automatic Speech Recognition (ASR) system. We treat minuting as a supervised machine learning task on pairs of texts: the transcript of a meeting and its minutes. No Russian minuting dataset was previously available. To fill this gap we present DumSum, a dataset of meeting transcripts of the Russian State Duma and City Dumas, paired with their minutes. We use a two-stage minuting pipeline and introduce semantic segmentation, which improves the ROUGE and BERTScore metrics of minutes on City Dumas meetings by 1%-10% compared to naive segmentation.
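
The difference between naive and semantic segmentation can be illustrated with a minimal sketch. It is not the method used in this work: the abstract does not specify the segmentation algorithm, so the function names, the bag-of-words cosine similarity, and the `threshold` value below are all illustrative assumptions. Naive segmentation cuts the transcript into fixed-size chunks, while the semantic variant starts a new segment wherever adjacent sentences stop sharing vocabulary.

```python
import math
import re
from collections import Counter


def naive_segments(sentences, size):
    """Cut the transcript into consecutive chunks of `size` sentences."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def _bow(sentence):
    # Bag-of-words vector; a stand-in (assumption) for a real sentence encoder.
    return Counter(re.findall(r"\w+", sentence.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_segments(sentences, threshold=0.1):
    """Start a new segment when adjacent sentences are dissimilar.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if _cosine(_bow(prev), _bow(cur)) < threshold:
            segments.append([cur])      # topic boundary: open a new segment
        else:
            segments[-1].append(cur)    # same topic: extend current segment
    return segments


if __name__ == "__main__":
    transcript = [
        "the budget was approved",
        "the budget vote passed",
        "road repairs start in may",
        "road repairs need funding",
    ]
    print(naive_segments(transcript, 3))
    print(semantic_segments(transcript))
```

On this toy transcript the naive split with `size=3` places a road-repair sentence in the budget chunk, whereas the semantic split keeps the two topics apart. In a two-stage pipeline, each segment would then be summarized separately and the partial summaries combined.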