A Comparison of Different Approaches to Document Representation in Turkish Language
Date
2018
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Recently, deep learning methods have demonstrated state-of-the-art performance in numerous complex Natural Language Processing (NLP) problems. Easy access to high-performance computing resources and open-source libraries makes Artificial Intelligence (AI) approaches more practical for researchers to apply. This sudden growth of available techniques has shaped and improved standards in the field of NLP. Thus, owing to various open-source libraries and a large body of research, we find an opportunity to compare different approaches to document representation. We evaluate four paradigms for representing documents: traditional bag-of-words approaches, topic modeling, embedding-based approaches, and deep learning. As the main contribution of this article, we evaluate all of these representation approaches with suitable machine learning algorithms on the document categorization problem in the Turkish language. The supervised architecture uses a benchmark dataset specifically prepared for this language. Within the architecture, we evaluate the representation approaches with corresponding machine learning algorithms such as Support Vector Machine (SVM), multinomial Naive Bayes (m-NB), and others. We conduct a variety of experiments and present successful results for Turkish document categorization. We also observe that traditional approaches still achieve results comparable to those of neural network models for document classification.
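To make the simplest of the four paradigms concrete, the following is a minimal pure-Python sketch of a bag-of-words representation paired with a multinomial Naive Bayes classifier using add-one (Laplace) smoothing. It is an illustration under stated assumptions, not the authors' implementation: the function names `tokenize`, `train_multinomial_nb`, and `predict` are hypothetical, the whitespace tokenizer is deliberately naive, and the four-document corpus is a made-up toy example rather than the benchmark dataset used in the article.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    # Real Turkish text needs more care (dotted/dotless i casing,
    # agglutinative suffixes), which this sketch ignores.
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on bag-of-words counts (Laplace smoothing)."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> total count
    vocab = set()
    for doc, label in zip(docs, labels):
        bow = Counter(tokenize(doc))  # bag-of-words: word order is discarded
        word_counts[label].update(bow)
        vocab.update(bow)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior score."""
    log_prior, log_likelihood, vocab = model
    bow = Counter(tokenize(doc))
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w, n in bow.items():
            if w in vocab:  # words never seen in training are ignored
                score += n * log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus (not the article's benchmark dataset).
docs = ["maç gol takım", "takım lig gol", "borsa faiz dolar", "dolar enflasyon faiz"]
labels = ["sports", "sports", "economy", "economy"]
model = train_multinomial_nb(docs, labels)
print(predict(model, "gol attı takım"))  # → sports
```

Working in log space keeps the product of many small word probabilities from underflowing; the SVM, topic-model, embedding, and deep learning variants the abstract mentions would swap in a different representation or classifier on top of the same train/predict split.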
Source
Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi
Volume
22
Issue
2