Türkçe için karşılaştırmalı metin sınıflandırma analizi

Yıldız, Tuğba; Yıldırım, Savaş

doi:10.5505/pajes.2018.15931

dc.authorid	0000-0002-5868-5407	en_US
dc.authorid	0000-0002-7764-2891	en_US
dc.contributor.author	Yıldız, Tuğba
dc.contributor.author	Yıldırım, Savaş
dc.date.accessioned	2021-12-28T15:38:35Z
dc.date.available	2021-12-28T15:38:35Z
dc.date.issued	2018
dc.description.abstract	SUMMARY: Text classification has an important place in the field of Natural Language Processing (NLP). The recent growth of textual data and the need to label it automatically have increased the importance of the text classification problem. The bag-of-words method, prominent among traditional approaches, has been successful in text classification for years. Recently, neural network language models have been applied successfully to NLP problems and have achieved great success in some areas. The most important advantage of Artificial Neural Network (ANN) based architectures is that they produce more effective word and text representations. These representations have been found to be lower-dimensional and more effective than those of traditional methods. Successful applications have been made particularly in semantic and syntactic analysis. On the other hand, traditional bag-of-words methods, which use longer vector representations, still retain their strength as text representations. However, no comparison of these two approaches has been made for Turkish. In this study, the traditional bag-of-words approach and the new neural network based representation approaches are compared in terms of text classification. We observed that traditional methods with effective feature selection can still compete with the new generation word embedding approach. Finally, we report our experiments, varied across these two approaches, and discuss in detail a successful text classification architecture for Turkish.	en_US
dc.description.abstract	ABSTRACT: Text categorization plays an important role in the field of Natural Language Processing (NLP). Recently, the rapid growth in the amount of textual data and the need for automatic annotation have made the problem of text categorization more important. Prominent among the traditional methods, the bag-of-words approach has been applied successfully to the text categorization problem for years. Recently, Neural Network Language Models (NNLM) have achieved successful results on various NLP problems. The most important advantage of NNLMs is that they provide effective word and document representations. Those representations are lower dimensional and have been found to be more effective than those of traditional methods. They have been exploited successfully for semantic and syntactic analysis. On the other hand, the traditional bag-of-words approaches that use long one-hot vector representations are still considered powerful in terms of their accuracy in document classification. However, a comparison of these approaches for the Turkish language has not been attempted before. In this study, we compared them across a variety of analyses. We observed that the traditional bag-of-words representation, combined with effective feature selection and a machine learning algorithm suited to it, has performance comparable to that of new generation vector-based methods, namely word embeddings. In this study, we conducted various experiments comparing these approaches and identified an effective text categorization architecture for the Turkish language.	en_US
dc.fullTextLevel	Full Text	en_US
dc.identifier.doi	10.5505/pajes.2018.15931
dc.identifier.issn	2147-5881
dc.identifier.trdizinid	306827	en_US
dc.identifier.uri	https://hdl.handle.net/11411/4281
dc.identifier.uri	https://doi.org/10.5505/pajes.2018.15931
dc.identifier.uri	https://search.trdizin.gov.tr/yayin/detay/306827	en_US
dc.identifier.wos	WOS:000446742400012	en_US
dc.identifier.wosquality	N/A	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	TR-Dizin	en_US
dc.issue	5	en_US
dc.language.iso	tr	en_US
dc.national	National	en_US
dc.numberofauthors	2	en_US
dc.pages	879-886	en_US
dc.publisher	Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi	en_US
dc.relation.ispartof	Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi	en_US
dc.relation.publicationcategory	Article - National Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.snmz	20240718_Mükerrer
dc.subject	Metin sınıflandırma	en_US
dc.subject	Makine öğrenmesi	en_US
dc.subject	Yapay sinir ağları	en_US
dc.subject	Text classification	en_US
dc.subject	Machine learning	en_US
dc.subject	Artificial neural network	en_US
dc.title	Türkçe için karşılaştırmalı metin sınıflandırma analizi
dc.title.alternative	A comparative analysis of text classification for Turkish language
dc.type	Article
dc.volume	24	en_US

Files

Original bundle

Name: 2018YıldırımYıldız.pdf
Size: 596.83 KB
Format: Adobe Portable Document Format

Collections

Faculty of Engineering and Natural Sciences
TR Dizin Indexed Publications
Web of Science Indexed Publications