Unified benchmark for zero-shot Turkish text classification
Date
2023
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Elsevier Sci Ltd
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Effective learning schemes such as fine-tuning, zero-shot, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training examples. In this paper, we present a unified benchmark for zero-shot text classification in Turkish. To this end, we evaluate three methods, namely Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model based on Masked Language Modeling and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We use pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results show that ConvBERT with the NLI method yields the best performance at 79% and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by applying transformer models not previously attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
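As a rough illustration of the NLI approach described in the abstract, the sketch below shows how NLI-based zero-shot classification is typically run with the Hugging Face transformers pipeline: each candidate label is turned into a hypothesis sentence and the input text is scored for entailment against it. The checkpoint path, example sentence, labels, and hypothesis template are all illustrative assumptions; the paper does not name a public NLI-fine-tuned Turkish checkpoint, so substitute your own.

    from transformers import pipeline

    # Placeholder checkpoint (assumption): an NLI-fine-tuned Turkish model
    # is required for this pipeline to produce meaningful scores.
    classifier = pipeline(
        "zero-shot-classification",
        model="path/to/turkish-nli-model",
    )

    # Illustrative input: "Galatasaray won last night's match 2-0."
    text = "Galatasaray dün akşamki maçı 2-0 kazandı."
    candidate_labels = ["spor", "ekonomi", "siyaset"]  # sport, economy, politics

    # Each label is inserted into the hypothesis template
    # ("This text is about {}.") and scored for entailment.
    result = classifier(
        text,
        candidate_labels=candidate_labels,
        hypothesis_template="Bu metin {} ile ilgilidir.",
    )
    print(result["labels"][0], result["scores"][0])  # top label and its score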
Description
Keywords
Text Classification, Zero-Shot Learning, Next Sentence Prediction, Natural Language Inference, Masked Language Modeling, Dataset
Source
Information Processing & Management
WoS Quartile
N/A
Scopus Quartile
Q1
Volume
60
Issue
3