TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement
| dc.contributor.author | Bayram, M. Ali | |
| dc.contributor.author | Fincan, Ali Arda | |
| dc.contributor.author | Gumus, Ahmet Semih | |
| dc.contributor.author | Diri, Banu | |
| dc.contributor.author | Yildirim, Savas | |
| dc.contributor.author | Aytas, Oner | |
| dc.date.accessioned | 2026-04-04T18:55:51Z | |
| dc.date.available | 2026-04-04T18:55:51Z | |
| dc.date.issued | 2025 | |
| dc.department | İstanbul Bilgi Üniversitesi | |
| dc.description | 33rd Conference on Signal Processing and Communications Applications-SIU-Annual -- JUN 25-28, 2025 -- Istanbul, TURKIYE | |
| dc.description.abstract | Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations. | |
| dc.description.sponsorship | Institute of Electrical and Electronics Engineers Inc | |
| dc.identifier.doi | 10.1109/SIU66497.2025.11112154 | |
| dc.identifier.isbn | 979-8-3315-6656-2 | |
| dc.identifier.isbn | 979-8-3315-6655-5 | |
| dc.identifier.issn | 2165-0608 | |
| dc.identifier.scopus | 2-s2.0-105015564217 | |
| dc.identifier.scopusquality | N/A | |
| dc.identifier.uri | https://doi.org/10.1109/SIU66497.2025.11112154 | |
| dc.identifier.uri | https://hdl.handle.net/11411/10589 | |
| dc.identifier.wos | WOS:001575462500215 | |
| dc.identifier.wosquality | N/A | |
| dc.indekslendigikaynak | Web of Science | |
| dc.indekslendigikaynak | Scopus | |
| dc.language.iso | tr | |
| dc.publisher | IEEE | |
| dc.relation.ispartof | 2025 33rd Signal Processing and Communications Applications Conference, SIU | |
| dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.snmz | KA_WoS_20260402 | |
| dc.snmz | KA_Scopus_20260402 | |
| dc.subject | Large Language Models (LLM) | |
| dc.subject | Natural Language Processing (NLP) | |
| dc.subject | Artificial Intelligence | |
| dc.subject | Turkish NLP | |
| dc.title | TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement | |
| dc.type | Conference Object |
