Building Domain-Specific Lexicons: An Application to Financial News
Abstract
Natural Language Processing (NLP) has gained attention in recent years. Previous research (such as WordNet and Cyc) has focused on developing all-purpose (generalised) polarised lexicons. However, these lexicons provide little information for specific domains such as finance and medical sciences, and using them for text classification can reduce prediction accuracy. Therefore, there is a need to build domain- and context-specific lexicons. To this end, this work proposes a label-propagation-based word embedding algorithm for obtaining positive and negative lexicons. The algorithm combines principles of graph theory and word embedding. It is tested on the Dow Jones news wires text feed to classify financial news as hot or non-hot. Three classifiers, namely Logistic Regression, Random Forest and XGBoost, were trained using polarised lexicons, seed words and random words, and their performance in all cases was evaluated using accuracy. Classifiers using lexicons generated by the proposed approach were more effective at classifying financial news articles as hot or non-hot than classifiers using seed words or random words. The proposed label propagation with word embedding algorithm generates context-specific lexicons, which help to represent text better in natural language processing tasks and avoid the problem of dimensionality. © 2019 IEEE.
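The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of propagating polarity labels from seed words over a word-embedding similarity graph. It is not the authors' implementation. The function name `propagate_polarity`, the toy two-dimensional embeddings, the seed words, and the parameters `k`, `alpha` and `iterations` are all assumptions made for illustration; in practice the vectors would come from embeddings trained on the domain corpus.

```python
import numpy as np


def propagate_polarity(embeddings, seeds, k=2, alpha=0.8, iterations=50):
    """Spread seed polarities (+1 / -1) over a k-nearest-neighbour
    cosine-similarity graph built from word vectors.

    Hypothetical helper for illustration; not the method from the paper.
    """
    words = sorted(embeddings)
    index = {w: i for i, w in enumerate(words)}

    # Unit-normalise the vectors so the dot product is cosine similarity.
    X = np.array([embeddings[w] for w in words], dtype=float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    sim = np.clip(X @ X.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)

    # Keep only the k strongest edges per word, then symmetrise the graph.
    W = np.zeros_like(sim)
    for i in range(len(words)):
        nearest = np.argsort(sim[i])[-k:]
        W[i, nearest] = sim[i, nearest]
    W = np.maximum(W, W.T)

    # Row-normalise the edge weights into a transition matrix.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    T = W / row_sums

    # Initial labels: seed polarity for seed words, 0 for everything else.
    y0 = np.zeros(len(words))
    for word, label in seeds.items():
        y0[index[word]] = label

    # Repeatedly mix each word's neighbour scores with its initial label.
    scores = y0.copy()
    for _ in range(iterations):
        scores = alpha * (T @ scores) + (1 - alpha) * y0
    return dict(zip(words, scores))


# Toy embeddings (assumed values, for illustration only).
toy_embeddings = {
    "profit": [1.0, 0.1],
    "surge": [0.9, 0.2],
    "rally": [0.8, 0.3],
    "loss": [0.1, 1.0],
    "plunge": [0.2, 0.9],
    "default": [0.3, 0.8],
}
seeds = {"profit": 1.0, "loss": -1.0}

scores = propagate_polarity(toy_embeddings, seeds)
positive_lexicon = sorted(w for w, s in scores.items() if s > 0)
negative_lexicon = sorted(w for w, s in scores.items() if s < 0)
print(positive_lexicon, negative_lexicon)
```

In this sketch a word joins the positive or negative lexicon according to the sign of its propagated score. Counts of lexicon words per article could then serve as low-dimensional features for a classifier, which is one plausible reading of how a lexicon avoids the dimensionality of a full bag-of-words representation.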