Performance Analysis of Credit Scoring Models on Lending Club Data
Thesis title in Czech: | Performance Analysis of Credit Scoring Models on Lending Club Data |
---|---|
Title in English: | Performance Analysis of Credit Scoring Models on Lending Club Data |
Keywords (Czech): | Kreditní skórování, P2P půjčování, Klasifikace, Žebříček klasifikátorů |
Keywords (English): | Credit scoring, P2P Lending, Classification, Classifiers’ ranking |
Academic year of topic announcement: | 2016/2017 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Economic Studies (23-IES) |
Supervisor: | prof. PhDr. Petr Teplý, Ph.D. |
Author: | hidden |
Date of registration: | 09.11.2016 |
Date of assignment: | 09.11.2016 |
Date and time of defense: | 22.06.2017 08:30 |
Defense venue: | Opletalova - Opletalova 26, O105, Opletalova - room no. 105 |
Date of electronic submission: | 18.05.2017 |
Date of defense held: | 22.06.2017 |
Opponents: | Mgr. Magda Pečená, Ph.D. |
Guidelines |
P2P lending platforms, a new type of financial intermediary between borrowers and lenders, have experienced astonishing growth since their inception. For example, Lending Club, the biggest P2P lending platform in the USA, has almost doubled the amount of issued loans each year. P2P lending is growing in Europe as well: Wardrop et al. (2015) showed that the amount of money lent through P2P consumer lending has more than doubled each year since 2012.
Our master thesis will be based on data provided by Lending Club, which publishes information about all issued loans on its website. For the purpose of our thesis, we have taken a data set of loans issued between January 2009 and December 2012. The data set contains 85,699 loans, and we know the final status of each of them. We can therefore extract training and testing samples from it. Moreover, the data set is large enough to allow inter-temporal validation.
It is essential for P2P lending platforms to decrease the information asymmetry between lenders and borrowers. Borrowers are therefore required to provide information about themselves and about the loan. Based on this information, P2P lending platforms use credit scoring models to assess borrowers’ credit risk. A well-performing credit scoring model is pivotal for a P2P lending platform’s success.
Nevertheless, the meta-analysis of Abdou & Pointon (2011), which covers more than 200 articles on credit scoring models, finds that no single credit scoring method outperforms the others. On the Kaggle competition data set, however, Pandey (2011) chose Random forest as the best credit scoring method, a result in line with Liang (2011). Real-world data sets, such as the Lending Club data set, usually behave differently. Tsai (2014) found that, on the Lending Club data, Random forest and Support Vector Machines (SVM) are outperformed by Logistic regression with penalties for negative classes. In addition, Chang et al. (2016) showed that Gaussian Naïve Bayes outperforms both Logistic regression and SVM. The question of which credit scoring model is best for Lending Club data has therefore not yet been settled. Moreover, other credit scoring models, such as Neural networks or C5.0, have not been compared in any of these studies. The purpose of our master thesis is a comprehensive performance comparison of various credit scoring models.
Furthermore, we want to develop our own credit scoring model. It will be based on Logistic regression and will put more weight on the determinants of borrowers’ default identified by Carmichael (2014) and Serrano-Cinca et al. (2015). |
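The inter-temporal validation mentioned in the guidelines can be illustrated with a minimal sketch: the model is fitted on older loans and tested only on newer ones. The field names `issue_date` and `defaulted`, the helper `intertemporal_split`, the cutoff date, and the toy records are assumptions made for illustration; they are not taken from the thesis or from the Lending Club data dictionary.

```python
from datetime import date


def intertemporal_split(loans, cutoff):
    """Split loans into (train, test) by issue date.

    Loans issued before `cutoff` form the training sample; loans issued on
    or after it form the testing sample, so a model is always evaluated on
    loans newer than anything it was fitted on.
    """
    train = [loan for loan in loans if loan["issue_date"] < cutoff]
    test = [loan for loan in loans if loan["issue_date"] >= cutoff]
    return train, test


# Hypothetical toy records; real Lending Club rows carry many more fields.
loans = [
    {"issue_date": date(2009, 3, 1), "defaulted": 0},
    {"issue_date": date(2010, 7, 1), "defaulted": 1},
    {"issue_date": date(2011, 11, 1), "defaulted": 0},
    {"issue_date": date(2012, 5, 1), "defaulted": 1},
    {"issue_date": date(2012, 12, 1), "defaulted": 0},
]

# Assumed cutoff: everything issued from 2012 onwards is held out for testing.
train, test = intertemporal_split(loans, cutoff=date(2012, 1, 1))
print(len(train), len(test))
```

Splitting by date rather than at random keeps information from later loans out of the training sample, which is closer to how a scoring model would be used in practice.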
References |
Carmichael, D. (2014): Modeling default for peer-to-peer loans. Available at SSRN: http://ssrn.com/abstract=2529240 pp. 1-43.
Liang, J. (2010): Predicting borrowers’ chance of defaulting on credit loans. pp. 1-5.
Mills, K. G. (2014): The State of Small Business Lending: Credit Access during the Recovery and How Technology May Change the Game.
Namvar, E. (2013): An Introduction to Peer to Peer Loans as Investments. pp. 1-18.
Pandey, J. N. (2011): Predicting Probability of Loan Default. Stanford University, CS229 Project report. Jitendra Nath Pandey, Maheshwaran Srinivasan.
Serrano-Cinca, C., B. Gutiérrez-Nieto, & L. López-Palacios (2015): Determinants of Default in P2P Lending. Plos One 10(10): p. e0139427.
Tsai, K. (2014): Peer Lending Risk Predictor Support Vector Machines (SVM). pp. 1-5.
Wardrop, R., B. Zhang, R. Rau, & M. Gray (2015): The European Alternative Finance Benchmarking Report. p. 44.
Wu, J. (2014): Loan default prediction using lending club data. Available at http://www.wujiayu.me/assets/projects/loan-default-prediction-Jiayu-Wu.pdf pp. 1-12. |
Preliminary scope of work |
Hypotheses:
1. Hypothesis #1: Neural network outperforms Random forest on Lending Club data.
2. Hypothesis #2: Neural network outperforms Gaussian Naïve Bayes.
3. Hypothesis #3: Our model has the highest AUC of all the credit scoring models compared on Lending Club data.
Methodology: We use our data set of loans issued between January 2009 and December 2013. This data set is large enough to ensure inter-temporal validity. The models will be evaluated by the AUC (area under the curve) of the ROC (receiver operating characteristic) curve.
Expected Contribution: Various credit scoring models will be comprehensively compared on the Lending Club data, which will show which credit scoring model performs best for Lending Club. Carmichael (2014) and Serrano-Cinca et al. (2015) studied the determinants of borrowers’ default in Lending Club data. We want to use these determinants to develop our own model that outperforms the other credit scoring models on Lending Club data. |
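As an illustration of the evaluation criterion, the AUC can be read as the probability that a randomly chosen defaulted loan receives a higher predicted score than a randomly chosen non-defaulted loan, with ties counted as one half. The following minimal sketch computes it directly from that definition; the function name `auc_score` and the toy labels and scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for defaulted loans and 0 for non-defaulted loans;
    `scores` holds the predicted default scores. The result is the share
    of (defaulted, non-defaulted) pairs in which the defaulted loan has
    the higher score, with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one loan of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three of the four pairs are ranked correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulted and non-defaulted loans, which is what makes the AUC usable for comparing classifiers with different score scales.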
Preliminary scope of work in English |
1. Introduction
2. Literature Review
3. Data Description
4. Hypotheses
5. Results
6. Summary
7. Bibliography
8. Appendix |