Unified benchmark for zero-shot Turkish text classification
Date
2023
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Elsevier Sci Ltd
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Effective learning schemes such as fine-tuning, zero-shot, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training examples. In this paper, we present a unified benchmark for zero-shot text classification in Turkish. To this end, we evaluate three methods, namely Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model based on Masked Language Modeling and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We use pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results show that ConvBERT with the NLI method yields the best performance at 79% and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by applying transformer models not previously attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
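As a rough illustration of the NLI approach described in the abstract, the sketch below shows how NLI-based zero-shot classification is typically run with the Hugging Face transformers pipeline: each candidate label is turned into a hypothesis sentence and the input text is scored for entailment against it. The checkpoint path, example sentence, labels, and hypothesis template are all illustrative assumptions; the paper does not name a public NLI-fine-tuned Turkish checkpoint, so substitute your own.

    from transformers import pipeline

    # Placeholder checkpoint (assumption): an NLI-fine-tuned Turkish model
    # is required for this pipeline to produce meaningful scores.
    classifier = pipeline(
        "zero-shot-classification",
        model="path/to/turkish-nli-model",
    )

    # Illustrative input: "Galatasaray won last night's match 2-0."
    text = "Galatasaray dün akşamki maçı 2-0 kazandı."
    candidate_labels = ["spor", "ekonomi", "siyaset"]  # sport, economy, politics

    # Each label is inserted into the hypothesis template
    # ("This text is about {}.") and scored for entailment.
    result = classifier(
        text,
        candidate_labels=candidate_labels,
        hypothesis_template="Bu metin {} ile ilgilidir.",
    )
    print(result["labels"][0], result["scores"][0])  # top label and its score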
Description
Keywords
Text Classification, Zero-Shot Learning, Next Sentence Prediction, Natural Language Inference, Masked Language Modeling, Dataset
Source
Information Processing & Management
WoS Quartile
N/A
Scopus Quartile
Q1
Volume
60
Issue
3