Bedyakin, R., & Mikhaylovskiy, N. (2021, June). Language ID Prediction from Speech Using Self-Attentive Pooling. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP (pp. 130-135).

This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automatic Speech Recognition (ASR) pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, so language ID systems need to be invariant to both domain and speaker. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer gives promising results on the language identification task.
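The abstract does not give the equations of the pooling layer. The sketch below shows one common way to formulate self-attentive pooling, offered as an assumption and not as the authors' exact layer. Each frame-level vector x_t produced by the CNN is scored as mu · tanh(W x_t + b). The scores are normalized with a softmax over time, and the utterance embedding is the attention-weighted mean of the frames. The names `W`, `b`, and `mu` and all helper functions are hypothetical. In a real model these parameters would be learned jointly with the CNN. The sketch uses pure Python for clarity, not speed.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix W (A rows x D cols) by a length-D vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pool(
    frames: Sequence[Sequence[float]],  # T x D frame-level features (hypothetical CNN output)
    W: Matrix,                          # A x D projection (assumed, learned in practice)
    b: Sequence[float],                 # length-A bias (assumed)
    mu: Sequence[float],                # length-A context vector (assumed)
) -> Tuple[Vector, Vector]:
    """Return (utterance-level D-dim embedding, per-frame attention weights)."""
    scores = []
    for x_t in frames:
        # Hidden representation h_t = tanh(W x_t + b)
        h_t = [math.tanh(v + b_i) for v, b_i in zip(matvec(W, x_t), b)]
        # Relevance score of frame t: dot product with the context vector mu
        scores.append(sum(h * u for h, u in zip(h_t, mu)))
    weights = softmax(scores)  # attention weights over time, sum to 1
    dim = len(frames[0])
    # Weighted mean of the frames: a fixed-size vector for any number of frames T
    pooled = [sum(w * x_t[d] for w, x_t in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    pooled, weights = self_attentive_pool(frames, identity, [0.0, 0.0], [5.0, 5.0])
    print(pooled, weights)
```

Because the output size does not depend on the number of frames, a layer of this kind maps variable-length recordings to fixed-size vectors that a language classifier can consume. With `mu` set to zero, every frame receives equal weight and the layer reduces to plain mean pooling. A learned `mu` can instead down-weight frames that carry little language-specific information, such as silence.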