A comparison of different approaches to document representation in Turkish language

Yıldız, Tuğba; Yıldırım, Savaş

dc.contributor.author	Yıldız, Tuğba
dc.contributor.author	Yıldırım, Savaş
dc.date.accessioned	2021-12-28T14:23:51Z
dc.date.available	2021-12-28T14:23:51Z
dc.date.issued	2018
dc.description.abstract	ABSTRACT: Recently, deep learning methods have demonstrated state-of-the-art performance on numerous complex Natural Language Processing (NLP) problems. Easy access to high-performance computing resources and open-source libraries has made Artificial Intelligence (AI) approaches more applicable for researchers. This rapid growth in available techniques has shaped and raised standards in the field of NLP. The availability of many open-source libraries and a large body of research therefore gave us the opportunity to compare different approaches to document representation. We evaluate four paradigms for representing documents: traditional bag-of-words approaches, topic modeling, embedding-based approaches, and deep learning. The main contribution of this article is an evaluation of all these representation approaches, each paired with suitable machine learning algorithms, on the document categorization problem in the Turkish language. The supervised architecture uses a benchmark dataset prepared specifically for this language. Within the architecture, we evaluate the representation approaches with corresponding machine learning algorithms such as Support Vector Machine (SVM), multinomial Naive Bayes (m-NB), and others. We conduct a variety of experiments and present successful results for Turkish document categorization. We also observed that traditional approaches still achieve results comparable to neural network models in document classification.	en_US
dc.description.abstract	ÖZET (ABSTRACT): Recently, deep learning architectures have successfully solved many natural language processing problems. The widespread availability of open-source libraries has made artificial intelligence approaches more applicable. This sudden acceleration in technology has transformed and improved the standards of natural language processing. In this study, thanks to the easy accessibility of open-source code and research in the field, we were able to evaluate a significant portion of document representation approaches. We evaluated four different paradigms of document representation: the traditional bag-of-words approach, topic modeling, embedding representations, and deep learning. As the main contribution of the study, we addressed the document classification problem for Turkish using all of these document representations together with the corresponding machine learning algorithms. The resulting supervised learning architecture was tested on a dataset prepared specifically for Turkish. For each representation, compatible machine learning algorithms such as SVM and multinomial Naive Bayes (mNB) were tested. Based on a variety of experiments, this article discusses how to build a successful document classification architecture for Turkish and presents successful models. Finally, we observed that traditional methods such as bag-of-words are still successful and outperform some of the deep learning-based models.	en_US
dc.fullTextLevel	Full Text	en_US
dc.identifier.doi	10.19113/sdufbed.15893
dc.identifier.issn	1300-7688
dc.identifier.uri	https://hdl.handle.net/11411/4273
dc.identifier.uri	https://doi.org/10.19113/sdufbed.15893
dc.indekslendigikaynak	TR-Dizin	en_US
dc.issue	2	en_US
dc.language.iso	en	en_US
dc.national	International	en_US
dc.numberofauthors	2	en_US
dc.pages	569-576	en_US
dc.publisher	Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi	en_US
dc.relation.ispartof	Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Document representation	en_US
dc.subject	Deep learning	en_US
dc.subject	Natural language processing	en_US
dc.subject	Metin temsiliyeti (Document representation)	en_US
dc.subject	Derin öğrenme (Deep learning)	en_US
dc.subject	Doğal dil işleme (Natural language processing)	en_US
dc.title	A comparison of different approaches to document representation in Turkish language
dc.title.alternative	Metin temsil yöntemlerine yönelik farklı yaklaşımların karşılaştırılması (A comparison of different approaches to text representation methods)
dc.type	Article
dc.volume	22	en_US
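
The abstract names bag-of-words representations paired with multinomial Naive Bayes (m-NB) as one of the evaluated combinations. As a minimal, hypothetical sketch (not the authors' code, dataset, or preprocessing), the pure-Python example below turns documents into word counts and classifies them with add-alpha smoothed multinomial Naive Bayes. The function `tokenize`, the class `MultinomialNB`, the toy sentences, and the labels `spor` (sports) and `ekonomi` (economy) are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Bag-of-words tokenization: lowercase, then keep runs of word characters.
    # In Python 3, \w is Unicode-aware, so Turkish letters (ç, ğ, ı, ö, ş, ü)
    # stay inside tokens. Caveat: str.lower() is not Turkish-aware, so "I"
    # becomes "i" rather than the dotless "ı".
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        # Class prior P(c) estimated from document frequencies.
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        # Per-class word counts: the bag-of-words representation aggregated by class.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: str) -> float:
        # Smoothed P(word | c) = (count + alpha) / (total + alpha * |V|).
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc: str) -> str:
        # Words never seen in training are skipped rather than smoothed.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy corpus (invented for illustration; not the paper's benchmark).
train_docs = [
    "takım maçı golle kazandı",
    "futbol maçı çok heyecanlı geçti",
    "borsa bugün yükseldi",
    "faiz ve enflasyon oranı açıklandı",
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]

clf = MultinomialNB(alpha=1.0).fit(train_docs, train_labels)
print(clf.predict("maçı kim kazandı"))  # → spor
print(clf.predict("borsa ve faiz"))     # → ekonomi
```

Replacing the raw counts with TF-IDF weights, topic-model proportions, or averaged word embeddings, or the classifier with an SVM, would correspond to the other combinations the abstract lists; those variants are not sketched here.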

Files

Original bundle

Name: 2018YıldırımYıldız.pdf
Size: 133.55 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Faculty of Engineering and Natural Sciences
TR Dizin Indexed Publications