TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement

Bayram, M. Ali; Fincan, Ali Arda; Gumus, Ahmet Semih; Diri, Banu; Yildirim, Savas; Aytas, Oner

Date

2025

Publisher

IEEE

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.

Description

33rd Conference on Signal Processing and Communications Applications (SIU), June 25-28, 2025, Istanbul, Türkiye

Keywords

Large Language Models (LLM), Natural Language Processing (NLP), Artificial Intelligence, Turkish NLP

Source

2025 33rd Signal Processing and Communications Applications Conference (SIU)

WoS Q Value

N/A

Scopus Q Value

N/A

Link

https://doi.org/10.1109/SIU66497.2025.11112154
https://hdl.handle.net/11411/10589

Collection

Web of Science Indexed Publications
Scopus Indexed Publications
