Arşiv logosu
  • Türkçe
  • English
  • Giriş
    Yeni kullanıcı mısınız? Kayıt için tıklayın. Şifrenizi mi unuttunuz?
Arşiv logosu
  • Koleksiyonlar
  • Sistem İçeriği
  • Analiz
  • Hakkında
  • Türkçe
  • English
  • Giriş
    Yeni kullanıcı mısınız? Kayıt için tıklayın. Şifrenizi mi unuttunuz?
  1. Ana Sayfa
  2. Yazara Göre Listele

Yazar "Gumus, Ahmet Semih" seçeneğine göre listele

Listeleniyor 1 - 2 / 2
Sayfa Başına Sonuç
Sıralama seçenekleri
  • Küçük Resim Yok
    Öğe
    Tokenization Standards and Evaluation in Natural Language Processing: A Comparative Analysis of Large Language Models on Turkish
    (Ieee, 2025) Bayram, M. Ali; Fincan, Ali Arda; Gumus, Ahmet Semih; Karakas, Sercan; Diri, Banu; Yildirim, Savas
    Tokenization is a fundamental preprocessing step in Natural Language Processing (NLP), significantly impacting the capability of large language models (LLMs) to capture linguistic and semantic nuances. This study introduces a novel evaluation framework addressing tokenization challenges specific to morphologically-rich and low-resource languages such as Turkish. Utilizing the Turkish MMLU (TR-MMLU) dataset, comprising 6,200 multiple-choice questions from the Turkish education system, we assessed tokenizers based on vocabulary size, token count, processing time, language-specific token percentages (%TR), and token purity (%Pure). These newly proposed metrics measure how effectively tokenizers preserve linguistic structures. Our analysis reveals that language-specific token percentages exhibit a stronger correlation with downstream performance (e.g., MMLU scores) than token purity. Furthermore, increasing model parameters alone does not necessarily enhance linguistic performance, underscoring the importance of tailored, language-specific tokenization methods. The proposed framework establishes robust and practical tokenization standards for morphologically complex languages.
  • Küçük Resim Yok
    Öğe
    TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement
    (Ieee, 2025) Bayram, M. Ali; Fincan, Ali Arda; Gumus, Ahmet Semih; Diri, Banu; Yildirim, Savas; Aytas, Oner
    Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.

| İstanbul Bilgi Üniversitesi | Kütüphane | Rehber | OAI-PMH |

Bu site Creative Commons Alıntı-Gayri Ticari-Türetilemez 4.0 Uluslararası Lisansı ile korunmaktadır.


Eski Silahtarağa Elektrik Santralı, Eyüpsultan, İstanbul, TÜRKİYE
İçerikte herhangi bir hata görürseniz lütfen bize bildirin

DSpace 7.6.1, Powered by İdeal DSpace

DSpace yazılımı telif hakkı © 2002-2026 LYRASIS

  • Çerez Ayarları
  • Hakkında
  • Son Kullanıcı Sözleşmesi
  • Geri Bildirim