Unified benchmark for zero-shot Turkish text classification

dc.contributor.author: Çelik, Emrecan
dc.contributor.author: Dalyan, Tugba
dc.date.accessioned: 2024-07-18T20:42:44Z
dc.date.available: 2024-07-18T20:42:44Z
dc.date.issued: 2023
dc.department: İstanbul Bilgi Üniversitesi [en_US]
dc.description.abstract: Effective learning schemes such as fine-tuning, zero-shot, and few-shot learning have been widely used to obtain considerable performance with only a small amount of annotated training data. In this paper, we presented a unified benchmark to facilitate research on zero-shot text classification in Turkish. For this purpose, we evaluated three methods, namely Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model based on Masked Language Modeling (MLM) and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We used pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results showed that ConvBERT with the NLI method yields the best results, at 79%, and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by using transformer models not previously attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models. [en_US]
dc.identifier.doi: 10.1016/j.ipm.2023.103298
dc.identifier.issn: 0306-4573
dc.identifier.issn: 1873-5371
dc.identifier.issue: 3 [en_US]
dc.identifier.scopus: 2-s2.0-85147606807 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.uri: https://doi.org/10.1016/j.ipm.2023.103298
dc.identifier.uri: https://hdl.handle.net/11411/7400
dc.identifier.volume: 60 [en_US]
dc.identifier.wos: WOS:000991728500001 [en_US]
dc.identifier.wosquality: N/A [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Elsevier Sci Ltd [en_US]
dc.relation.ispartof: Information Processing & Management [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Text Classification [en_US]
dc.subject: Zero-Shot Learning [en_US]
dc.subject: Next Sentence Prediction [en_US]
dc.subject: Natural Language Inference [en_US]
dc.subject: Masked Language Modeling [en_US]
dc.subject: Dataset [en_US]
dc.title: Unified benchmark for zero-shot Turkish text classification
dc.type: Article
