A Comprehensive Study of Learning Approaches for Author Gender Identification

dc.contributor.author: Dalyan, Tuğba
dc.date.accessioned: 2022-10-12T08:30:32Z
dc.date.available: 2022-10-12T08:30:32Z
dc.date.issued: 2022-09-23
dc.description.abstract: Author gender identification has become an important yet challenging task in information retrieval and computational linguistics in recent years. This paper presents different learning approaches to the problem of author gender identification for Turkish articles. First, several classification algorithms are applied to representations based on different paradigms: fixed-length vector representations such as Stylometric Features (SF) and Bag-of-Words (BoW), and distributed word/document embeddings such as Word2vec, fastText and Doc2vec. Second, deep learning architectures are designed and their performance is compared: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), special kinds of RNN such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), C-RNN, Bidirectional LSTM (bi-LSTM), Bidirectional GRU (bi-GRU), Hierarchical Attention Networks and Multi-head Attention (MHA). A variety of experiments were conducted, with strong empirical results. In conclusion, ML algorithms with BoW give promising results, and fastText also appears to be a suitable choice among the embedding models. This comprehensive study contributes to the literature by applying different learning approaches to several kinds of representation. It is also the first attempt to identify author gender by applying SF to the Turkish language. © 2022, Kauno Technologijos Universitetas. All rights reserved. (en_US)
dc.fullTextLevel: Full Text (en_US)
dc.identifier.doi: 10.5755/j01.itc.51.3.29907 (en_US)
dc.identifier.issn: 1392124X
dc.identifier.scopus: 2-s2.0-85138955207 (en_US)
dc.identifier.uri: https://hdl.handle.net/11411/4565
dc.identifier.uri: https://doi.org/10.5755/j01.itc.51.3.29907
dc.identifier.wos: WOS:000871757400002 (en_US)
dc.identifier.wosquality: N/A (en_US)
dc.indekslendigikaynak: Web of Science (en_US)
dc.indekslendigikaynak: Scopus (en_US)
dc.issue: 3 (en_US)
dc.language.iso: en (en_US)
dc.national: International (en_US)
dc.numberofauthors: 3 (en_US)
dc.pages: 429-445 (en_US)
dc.publisher: Kauno Technologijos Universitetas (en_US)
dc.relation.ispartof: Information Technology and Control (en_US)
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member (en_US)
dc.rights: info:eu-repo/semantics/openAccess (en_US)
dc.subject: Author gender identification (en_US)
dc.subject: Deep learning (en_US)
dc.subject: Embeddings (en_US)
dc.subject: Stylometric features (en_US)
dc.title: A Comprehensive Study of Learning Approaches for Author Gender Identification (en_US)
dc.type: Article (en_US)
dc.volume: 51 (en_US)

Files

Original bundle
Name: Dalyan2022.pdf
Size: 396.7 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission