A Comparison of Different Approaches to Document Representation in Turkish Language

Yıldırım, Savaş; Yıldız, Tuğba

Date

2018

Authors

Yıldırım, Savaş

Yıldız, Tuğba

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Recently, deep learning methods have demonstrated state-of-the-art performance in numerous complex Natural Language Processing (NLP) problems. Easy access to high-performance computing resources and open-source libraries makes Artificial Intelligence (AI) approaches more applicable for researchers. This sudden growth of available techniques has shaped and improved standards in the field of NLP. Thus, owing to the various open-source libraries and the large amount of research, we find an opportunity to compare different approaches to document representation. We evaluate four different paradigms for representing documents: traditional bag-of-words approaches, topic modeling, embedding-based approaches, and deep learning. As the main contribution of this article, we aim to evaluate all these representation approaches with suitable machine learning algorithms for the document categorization problem in the Turkish language. The supervised architecture uses a benchmark dataset specifically prepared for this language. Within the architecture, we evaluate the representation approaches with corresponding machine learning algorithms such as Support Vector Machine (SVM), multinomial Naive Bayes (m-NB), and so forth. We conduct a variety of experiments and present successful results for Turkish document categorization. We also observe that traditional approaches still achieve results comparable to those of Neural Network models in document classification.

Journal or Series

Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi

Volume

22

Issue

2

URI

https://search.trdizin.gov.tr/yayin/detay/323160
https://hdl.handle.net/11411/5892

Collections

TR Dizin Indexed Publications

Rights and licensing

info:eu-repo/semantics/openAccess
