Bolshakov V., Kolobov R., Borisov E., Mikhaylovskiy N., Mukhtarova G., 2023. Scaled Down Lean BERT-like Language Models for Anaphora Resolution and Beyond. In Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2023”, Moscow.

We study the performance of BERT-like distributional semantic language models on anaphora resolution and related tasks, with the purpose of selecting a model for on-device inference. We have found that lean (narrow and deep) language models provide the best balance of speed and quality for word-level tasks, and we open-source the RuLUKE-tiny and RuLUKE-slim models we have trained. Both are significantly (over 27%) faster than models with comparable accuracy. We hypothesise that model depth may play a critical role in performance since, according to recent findings, each layer behaves as a gradient descent step in an autoregressive setting.
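The narrow-and-deep versus wide-and-shallow trade-off can be illustrated with a back-of-the-envelope parameter count for a BERT-like encoder. The configurations and vocabulary size below are hypothetical illustrations, not the actual RuLUKE-tiny or RuLUKE-slim settings; the formula ignores biases, LayerNorm, and positional embeddings.

```python
# Rough per-layer parameter count for a BERT-like encoder layer:
# self-attention projections (4*d^2) plus a feed-forward block with
# 4x expansion (8*d^2); biases and LayerNorm weights are ignored.
def layer_params(d: int) -> int:
    return 12 * d * d

def model_params(d: int, n_layers: int, vocab: int = 30000) -> int:
    # token embeddings + stacked encoder layers
    return vocab * d + n_layers * layer_params(d)

# Hypothetical configs at a similar parameter budget:
lean = model_params(d=312, n_layers=12)  # narrow and deep
wide = model_params(d=544, n_layers=4)   # wide and shallow
print(f"lean: {lean:,} params, wide: {wide:,} params")
```

At a roughly matched budget, the lean configuration spends more of its parameters on depth, which the abstract's hypothesis suggests acts like extra gradient descent steps per forward pass.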