Faculty of Mathematics, Physics
and Informatics
Comenius University in Bratislava

Doctoral Colloquium of KAI - Jozef Kubík (21.10.2024)

Monday, 21.10.2024, at 13:10 in room I/9


16. 10. 2024, 16:13
From: Damas Gruska

Speaker: Jozef Kubík

Title: Active learning for BERT models in Slovak language

Date: 21.10.2024, 13:10, room I/9


Abstract:
Large language models pre-trained on general language data are often used for a variety of specific text classification tasks. Success in these tasks depends not only on pre-training on a large amount of raw data, but also on fine-tuning on a smaller amount of (usually hand-annotated) task-relevant data. Widely used languages such as English or French rarely face problems with data availability. Low-resource languages such as Slovak, on the other hand, suffer from a general lack of training data, and the shortage is even more pronounced for fine-tuning. Manual annotation is not only expensive and time-consuming, but also depends on how many people can actually annotate data in a given language.

In our work, we experiment with the fine-tuning process of models pre-trained on a low-resource language (Slovak). The goal is to achieve similar or better results while reducing the amount of annotated data needed for fine-tuning. To this end, we connect the fine-tuning process to different functions from the active learning paradigm, in which the model itself indicates which data should be annotated first, thus reducing the amount and cost of manual annotation.
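
To make the idea of a model indicating which data to annotate more concrete, here is a minimal sketch of one well-known active learning function, entropy-based uncertainty sampling. The abstract does not say which functions the talk covers, so this choice is an assumption made purely for illustration. The names `entropy`, `select_for_annotation`, `pool_probs` and `budget`, as well as the probability values, are hypothetical; in practice the probabilities would come from the fine-tuned classifier applied to unlabeled texts.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(pool_probs, budget):
    """Return indices of the `budget` pool examples with the highest entropy.

    pool_probs: one list of class probabilities per unlabeled example.
    budget: how many examples can be sent to human annotators this round.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # most uncertain predictions first
    )
    return ranked[:budget]


# Hypothetical classifier outputs for four unlabeled texts (two classes).
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # fairly confident
    [0.50, 0.50],  # maximally uncertain
]

to_annotate = select_for_annotation(pool_probs, budget=2)
print(to_annotate)
```

In a full loop, the selected examples would be annotated by hand, added to the labeled set, and the model fine-tuned again before the next selection round.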

Seminar page