Predicting credit default risk using machine learning algorithm

Telimen, Mehmet

Files

Predicting credit default risk using machine learning algorithm.pdf (3.38 MB)

Date

2019

Authors

Telimen, Mehmet

Publisher

İstanbul Bilgi Üniversitesi

Access Rights

info:eu-repo/semantics/openAccess

Abstract

This study aims to build analytical models that predict the probability of default of consumer loans using machine learning algorithms. Data on a bank's customers were taken from the bank's test environment and anonymized before use. The data set combines each customer's loan delinquency status at the bank with the credit bureau (Kredi Kayıt Bürosu, KKB) data queried at the loan application stage. The data set is balanced: half of the records are loans that went into default and half are loans that did not. Four widely used supervised machine learning classification techniques are considered: Logistic Regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and K-Nearest Neighbors (KNN). For each model, half of the data set was used for training and the other half for testing. All models were trained on the same training set using the corresponding functions of the R programming language in RStudio, evaluated on the same test set, and compared by accuracy. On this data, the Logistic Regression model predicted consumer loan default with the highest accuracy, 58.30%.
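The comparison protocol described above (one balanced data set, a 50/50 train/test split, four classifiers fitted on the same training half and scored by accuracy on the same test half) can be sketched as follows. This is a minimal Python/scikit-learn analogue, not the author's R code. The synthetic data generator `make_balanced_data`, its feature count and class shift, the choice of `k = 5` for KNN, and the helper `compare_models` are all hypothetical assumptions; the thesis does not state its features or its value of k.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def make_balanced_data(n_per_class=500, n_features=5, seed=0):
    """Hypothetical stand-in for the anonymized bank data: synthetic
    features, half defaulted (label 1) and half not (label 0)."""
    rng = np.random.default_rng(seed)
    X_good = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X_bad = rng.normal(0.3, 1.2, size=(n_per_class, n_features))  # assumed shift
    X = np.vstack([X_good, X_bad])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return X, y


def compare_models(X, y, seed=0):
    """Fit four classifiers on one training half, score them on the other."""
    # 50/50 split; stratify=y keeps both halves balanced between classes.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(),
        "LDA": LinearDiscriminantAnalysis(),
        "QDA": QuadraticDiscriminantAnalysis(),
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),  # k is an assumption
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)     # same training set for every model
        y_pred = model.predict(X_test)  # same test set for every model
        scores[name] = accuracy_score(y_test, y_pred)
    return scores


if __name__ == "__main__":
    X, y = make_balanced_data()
    scores = compare_models(X, y)
    # Print models from highest to lowest test accuracy.
    for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:20s} {acc:.2%}")
```

Because the stratified test half contains equally many defaulted and non-defaulted loans, a classifier that always predicts one class scores exactly 50% here, so each model's accuracy is best read relative to that baseline.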

URI

https://hdl.handle.net//11411/1769
https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=FgmkGchPKo23qQqBeqzVZgc-BSXIPLVdz0sUzoEfkpy7SE9i1Jz5spI4tc624H6z

Collections

Graduate Programs Institute Thesis Collection
