Analysis of loans

The purpose of the effort is to perform elementary data analysis. I want to identify loans that have not been paid off as expected, and how this status is related to other variables present. The dataset used is the ‘Loan data from Prosper’ in the Udacity data sets.

Initial set of variables

This is a list of variables that are present in the dataset. Since there are 81 variables here, I plan to select 10-15 of these variables.

##  [1] ListingKey                         
##  [2] ListingNumber                      
##  [3] ListingCreationDate                
##  [4] CreditGrade                        
##  [5] Term                               
##  [6] LoanStatus                         
##  [7] ClosedDate                         
##  [8] BorrowerAPR                        
##  [9] BorrowerRate                       
## [10] LenderYield                        
## [11] EstimatedEffectiveYield            
## [12] EstimatedLoss                      
## [13] EstimatedReturn                    
## [14] ProsperRating (numeric)            
## [15] ProsperRating (Alpha)              
## [16] ProsperScore                       
## [17] ListingCategory                    
## [18] BorrowerState                      
## [19] Occupation                         
## [20] EmploymentStatus                   
## [21] EmploymentStatusDuration           
## [22] IsBorrowerHomeowner                
## [23] CurrentlyInGroup                   
## [24] GroupKey                           
## [25] DateCreditPulled                   
## [26] CreditScoreRangeLower              
## [27] CreditScoreRangeUpper              
## [28] FirstRecordedCreditLine            
## [29] CurrentCreditLines                 
## [30] OpenCreditLines                    
## [31] TotalCreditLinespast7years         
## [32] OpenRevolvingAccounts              
## [33] OpenRevolvingMonthlyPayment        
## [34] InquiriesLast6Months               
## [35] TotalInquiries                     
## [36] CurrentDelinquencies               
## [37] AmountDelinquent                   
## [38] DelinquenciesLast7Years            
## [39] PublicRecordsLast10Years           
## [40] PublicRecordsLast12Months          
## [41] RevolvingCreditBalance             
## [42] BankcardUtilization                
## [43] AvailableBankcardCredit            
## [44] TotalTrades                        
## [45] TradesNeverDelinquent              
## [46] TradesOpenedLast6Months            
## [47] DebtToIncomeRatio                  
## [48] IncomeRange                        
## [49] IncomeVerifiable                   
## [50] StatedMonthlyIncome                
## [51] LoanKey                            
## [52] TotalProsperLoans                  
## [53] TotalProsperPaymentsBilled         
## [54] OnTimeProsperPayments              
## [55] ProsperPaymentsLessThanOneMonthLate
## [56] ProsperPaymentsOneMonthPlusLate    
## [57] ProsperPrincipalBorrowed           
## [58] ProsperPrincipalOutstanding        
## [59] ScorexChangeAtTimeOfListing        
## [60] LoanCurrentDaysDelinquent          
## [61] LoanFirstDefaultedCycleNumber      
## [62] LoanMonthsSinceOrigination         
## [63] LoanNumber                         
## [64] LoanOriginalAmount                 
## [65] LoanOriginationDate                
## [66] LoanOriginationQuarter             
## [67] MemberKey                          
## [68] MonthlyLoanPayment                 
## [69] LP_CustomerPayments                
## [70] LP_CustomerPrincipalPayments       
## [71] LP_InterestandFees                 
## [72] LP_ServiceFees                     
## [73] LP_CollectionFees                  
## [74] LP_GrossPrincipalLoss              
## [75] LP_NetPrincipalLoss                
## [76] LP_NonPrincipalRecoverypayments    
## [77] PercentFunded                      
## [78] Recommendations                    
## [79] InvestmentFromFriendsCount         
## [80] InvestmentFromFriendsAmount        
## [81] Investors                          
## 81 Levels: AmountDelinquent ... TradesOpenedLast6Months

Variable selection

The initial set of fields I’m selecting are “Term”,“LoanStatus”,“BorrowerAPR”,“ListingCategory”,“Occupation”,“IsBorrowerHomeowner”,“CreditScoreRangeLower”,“CurrentCreditLines”,“DelinquenciesLast7Years”,“DebtToIncomeRatio”,“StatedMonthlyIncome”,“LoanOriginalAmount”,“LoanOriginationQuarter” and “BankcardUtilization”. My intent is to identify and select 10-15 variables that are fairly independent, but important to the analysis. After initial investigation, I may modify or remove some of these fields. The criteria I used to select these fields are 1. Documentation of the field is clear 2. Fields seem relevant to the ‘LoanStatus’ field. 3. The fields are relatively independent of each other.

Subset selection

I read in the data and create a subset with the selected columns. I then get summary information for each of the variables in the set

## [1] 14
## [1] 113937
## 'data.frame':    113937 obs. of  14 variables:
##  $ Term                   : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus             : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ BorrowerAPR            : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ ListingCategory        : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ Occupation             : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ IsBorrowerHomeowner    : logi  TRUE FALSE FALSE TRUE TRUE TRUE ...
##  $ CreditScoreRangeLower  : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CurrentCreditLines     : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ DelinquenciesLast7Years: int  4 0 0 14 0 0 0 0 0 0 ...
##  $ DebtToIncomeRatio      : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ StatedMonthlyIncome    : num  3083 6125 2083 2875 9583 ...
##  $ LoanOriginalAmount     : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ BankcardUtilization    : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##       Term                       LoanStatus     BorrowerAPR     
##  Min.   :12.00   Current              :56576   Min.   :0.00653  
##  1st Qu.:36.00   Completed            :38074   1st Qu.:0.15629  
##  Median :36.00   Chargedoff           :11992   Median :0.20976  
##  Mean   :40.83   Defaulted            : 5018   Mean   :0.21883  
##  3rd Qu.:36.00   Past Due (1-15 days) :  806   3rd Qu.:0.28381  
##  Max.   :60.00   Past Due (31-60 days):  363   Max.   :0.51229  
##                  (Other)              : 1108   NA's   :25       
##  ListingCategory                     Occupation    IsBorrowerHomeowner
##  Min.   : 0.000   Other                   :28617   Mode :logical      
##  1st Qu.: 1.000   Professional            :13628   FALSE:56459        
##  Median : 1.000   Computer Programmer     : 4478   TRUE :57478        
##  Mean   : 2.774   Executive               : 4311   NA's :0            
##  3rd Qu.: 3.000   Teacher                 : 3759                      
##  Max.   :20.000   Administrative Assistant: 3688                      
##                   (Other)                 :55456                      
##  CreditScoreRangeLower CurrentCreditLines DelinquenciesLast7Years
##  Min.   :  0.0         Min.   : 0.00      Min.   : 0.000         
##  1st Qu.:660.0         1st Qu.: 7.00      1st Qu.: 0.000         
##  Median :680.0         Median :10.00      Median : 0.000         
##  Mean   :685.6         Mean   :10.32      Mean   : 4.155         
##  3rd Qu.:720.0         3rd Qu.:13.00      3rd Qu.: 3.000         
##  Max.   :880.0         Max.   :59.00      Max.   :99.000         
##  NA's   :591           NA's   :7604       NA's   :990            
##  DebtToIncomeRatio StatedMonthlyIncome LoanOriginalAmount
##  Min.   : 0.000    Min.   :      0     Min.   : 1000     
##  1st Qu.: 0.140    1st Qu.:   3200     1st Qu.: 4000     
##  Median : 0.220    Median :   4667     Median : 6500     
##  Mean   : 0.276    Mean   :   5608     Mean   : 8337     
##  3rd Qu.: 0.320    3rd Qu.:   6825     3rd Qu.:12000     
##  Max.   :10.010    Max.   :1750003     Max.   :35000     
##  NA's   :8554                                            
##  LoanOriginationQuarter BankcardUtilization
##  Q4 2013:14450          Min.   :0.000      
##  Q1 2014:12172          1st Qu.:0.310      
##  Q3 2013: 9180          Median :0.600      
##  Q2 2013: 7099          Mean   :0.561      
##  Q3 2012: 5632          3rd Qu.:0.840      
##  Q2 2012: 5061          Max.   :5.950      
##  (Other):60343          NA's   :7604

The following columns can be converted to factors - Term, Listing category. Also the loan origination quarter may need to be modified so that they can be ordered chronologically.

Univariate plots

As a first step I’ll create histograms for the variables listed. For ‘Listing categories’, which is a numerical column, the categories are : 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans

## 
##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304

Loan Status :

From the histogram and the table, about 32% of the loans are completed or in final payments. 16% have not been successful outcome, i.e. either chargedoff, defaulted or payments are past due. So, 33% of all loans that are not current have failed. This is a large percentage and needs to be investigated. It will be easier if a new column is created which simplifies the outcomes. All chargedoff, defaulted or past due loans will be marked as failed, Completed loans will be marked as successful and Current loans will be marked as ongoing.

## 
##                                                        Accountant/CPA 
##                               3588                               3233 
##           Administrative Assistant                            Analyst 
##                               3688                               3602 
##                          Architect                           Attorney 
##                                213                               1046 
##                          Biologist                         Bus Driver 
##                                125                                316 
##                         Car Dealer                            Chemist 
##                                180                                145 
##                      Civil Service                             Clergy 
##                               1457                                196 
##                           Clerical                Computer Programmer 
##                               3164                               4478 
##                       Construction                            Dentist 
##                               1790                                 68 
##                             Doctor                Engineer - Chemical 
##                                494                                225 
##              Engineer - Electrical              Engineer - Mechanical 
##                               1125                               1406 
##                          Executive                            Fireman 
##                               4311                                422 
##                   Flight Attendant                       Food Service 
##                                123                               1123 
##            Food Service Management                          Homemaker 
##                               1239                                120 
##                           Investor                              Judge 
##                                214                                 22 
##                            Laborer                        Landscaping 
##                               1595                                236 
##                 Medical Technician                  Military Enlisted 
##                               1117                               1272 
##                   Military Officer                       Nurse's Aide 
##                                346                                491 
##                        Nurse (LPN)                         Nurse (RN) 
##                                492                               2489 
##                              Other                         Pharmacist 
##                              28617                                257 
##         Pilot - Private/Commercial  Police Officer/Correction Officer 
##                                199                               1578 
##                     Postal Service                          Principal 
##                                627                                312 
##                       Professional                          Professor 
##                              13628                                557 
##                       Psychologist                            Realtor 
##                                145                                543 
##                          Religious                  Retail Management 
##                                124                               2602 
##                 Sales - Commission                     Sales - Retail 
##                               3446                               2797 
##                          Scientist                      Skilled Labor 
##                                372                               2746 
##                      Social Worker         Student - College Freshman 
##                                741                                 41 
## Student - College Graduate Student           Student - College Junior 
##                                245                                112 
##           Student - College Senior        Student - College Sophomore 
##                                188                                 69 
##        Student - Community College         Student - Technical School 
##                                 28                                 16 
##                            Teacher                     Teacher's Aide 
##                               3759                                276 
##              Tradesman - Carpenter            Tradesman - Electrician 
##                                120                                477 
##               Tradesman - Mechanic                Tradesman - Plumber 
##                                951                                102 
##                       Truck Driver                    Waiter/Waitress 
##                               1675                                436

Occupation :

The field Occupation has many entries. The professions with the largest count of loans are ‘Other’ and ‘Professional’. Also, these values are sometimes ambiguous. For example the fields ‘Professional’ and ‘Principal’ are not clear on the exact field. With so many values, an analysis of this field may not provide much value. Going forward, this field will not be checked.

## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 16965 58308  7433  7189  2395   756  2572 10494   199    85    91   217 
##    12    13    14    15    16    17    18    19    20 
##    59  1996   876  1522   304    52   885   768   771

Listing Category :

This is a numeric field with these categories : 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans

It would help to convert this field to a factor.From the histogram, the most common categories are ‘Debt Consolidation’(Close to 50%),‘Not Available’(15%) or ‘Other’(10%). Not sure if the applicants were not sure of the purpose of the loan, or didnt care to fill the field accurately. The field may not be very helpful, but it will be interesting to check the failure rate for the various categories.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15630 0.20980 0.21880 0.28380 0.51230      25

Borrower APR :

The Borrower APR varies normally from a minimum of 0.01 to a max of 0.5. The peak count is at 0.2. There seems to be a high count of about 6000 for the APR of 0.36, which breaks out of the normal curve.

## 
## FALSE  TRUE 
## 56459 57478

Homeowner :

Count of homeowners and renters are about equal, close to 56000 each.It will be interesting to check the loan failure rate for these two categories, in order to check on any differences.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

Credit Scores :

Credit scores range from 400 at the low end to 880. It follows a bell curve, with some gaps.It will be interesting to check if there is any corelation between the credit scores and the bad loans. Also, since credit scores are based on some of the other fields such as ‘Salary’, it will be helpful to check the corelation between them.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    7.00   10.00   10.32   13.00   59.00    7604

Current credit lines :

All borrowers seem to have credit lines. The range is from 1 to 40 credit lines. The curve is positively skewed. i.e. There are more borrowers with higher number of credit lines. It will be interesting to check if those with higher number of credit lines can pay off he loans in a timely manner.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   4.155   3.000  99.000     990

Delinquencies :

As expected the curve shows a decrease in count as the number of delinquencies increase. There are 4000 borrowers with zero deliquency. Maximum number of delinquencies is 49.I’d like to check if borrowers with more delinquencies have higher loan repayment failures.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Debt to income ratio :

The debt to income ratio has a peak count at about 0.18. The maximum count is at a value of 0.12. I’d expect those with a lower debtoincome ratio to be better at paying off the loan, and would expect them to have a better credit score.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

Loan Original amount :

I plotted this in a log scale since there were many borrowers with loan amounts less than 10,000. The graph shows the peak count of loans around 5000 and maximum values at 35000. It will be interesting to check the status of loans by the amount of the loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750000

Stated Monthly income :

This is a positively skewed curve, with a max count at about $4000. I expect credit scores to be directly correlated to the monthly income. I’m not sure if some of the higher figures are accurate. i.e. max value of monthly income is stated as 1750003.

Loan Origination quarter :

The loans origination date ranges from 2006-2013. The graph is not chronologically sorted. It will help to modify this field for that purpose. Also, if the results for 2006-2008 are starkly different from 2009-2013 due to pre and post financial crisis scenario, it may help to separate out the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.310   0.600   0.561   0.840   5.950    7604
## 
##    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.12 0.13 0.14 
## 6782  730  632  606  611  629  577  603  538  531  616  594  549  609  646 
## 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 
##  607  641  626  625  660  680  702  697  700  722  711  741  783  813  746 
##  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39  0.4 0.41 0.42 0.43 0.44 
##  773  803  764  757  787  736  788  819  820  831  845  857  903  853  878 
## 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 
##  926  893  922  937  924  967 1012  930  994 1015  967 1031 1049 1002 1016 
##  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69  0.7 0.71 0.72 0.73 0.74 
## 1074 1010 1104 1108 1156 1066 1155 1095 1102 1118 1160 1115 1067 1066 1123 
## 0.75 0.76 0.77 0.78 0.79  0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 
## 1043 1074 1130 1081 1113 1144 1166 1169 1211 1233 1199 1243 1303 1313 1302 
##  0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99    1 1.01 1.02 1.03 1.04 
## 1334 1427 1507 1546 1664 1791 1798 2053 2171 1719  830  400  243  164  105 
## 1.05 1.06 1.07 1.08 1.09  1.1 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18 1.19 
##   95   64   58   51   46   40   29   34   30   28   26   17   19   18   15 
##  1.2 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29  1.3 1.31 1.32 1.33 1.34 
##   13   15    9   13   10   12    8    8    9    7    6   10    8    2   13 
## 1.35 1.36 1.37 1.38 1.39  1.4 1.41 1.43 1.44 1.45 1.46 1.47  1.5 1.51 1.52 
##    6    4    4    3    1    4    3    3    1    2    4    1    5    5    3 
## 1.53 1.54 1.55 1.57 1.58 1.59  1.6 1.61 1.62 1.63 1.64 1.65 1.66 1.69  1.7 
##    2    5    1    3    1    1    2    2    2    1    1    3    1    1    1 
## 1.72 1.73 1.75 1.76 1.78 1.79  1.8 1.81 1.82 1.83 1.84 1.86 1.88 1.89  1.9 
##    1    2    2    1    1    2    1    1    1    1    1    3    1    1    1 
## 1.96 1.97 1.99    2 2.04 2.05 2.07 2.21 2.23 2.25 2.27 2.35 2.36 2.46  2.5 
##    1    1    2    2    1    1    1    1    1    1    1    1    1    1    1 
## 2.66 2.68 3.47 4.73 5.83 5.95 
##    1    1    1    1    1    1

Bankcard Utilization

Bank card utilzation count increases steadily from 2000 with 0.05 utilization to 7500 with 1.0 utilization. Intermittently, some values such as 0.01, 0.16, 0.28 have a higher count. There also seem to be some erroneous values greater than 1.

Data Changes

Listing Category

I added Labels to the listing categories to enable convenient analysis of graphs.

Term

I changed the term field to a factor. This will improve the display.

Bank Card utilization

I removed entries from the set that have invalid values for Bank Card Utilization

Loan Origination Quarter

I changed the Loan origination guarter to the format ‘2012 Q3’ instead of ‘Q3 2012’. This will help us order the graphs chronologically. The graph seems to show a dip in the count of loans beginning 2008 Q2 to 2011 Q4. This could be due to the impact of the 2008 financial crisis.

## [1] "Good"    "Ongoing" "Ongoing" "Ongoing" "Ongoing" "Ongoing"

Loan Result

I created a new column ‘Loan Result’ that simplifies the Loan Status. It has one of the three values - ‘Bad’, Good’ and ‘Ongoing’. All completed and nearly completed loans are good. Loans other that this and current are ‘bad’.

Bivariate and Multivariate plots

Next I recreate many of the previous histogram plots, with the results colored as ‘LoanResult’

## 
##     Bad    Good Ongoing 
##   15227   33070   56292

Loan Result

From the total number of loans, about 53000 are ongoing, 33000 have completed successfully and 15000 have failed. i.e. the overall ratio of bad to good loans is 0.46 and the ratio of bad to loans that are not current is 0.32. I plan to focus on the factors which tip the ratio towards bad loans.

Another bivariate plot is the comparison of ‘debt to income ratio’ with the LoanResult

## mySet$LoanResult: Bad
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.1500  0.2300  0.3542  0.3500 10.0100    1603 
## -------------------------------------------------------- 
## mySet$LoanResult: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.1300  0.2000  0.2644  0.2900 10.0100    2661 
## -------------------------------------------------------- 
## mySet$LoanResult: Ongoing
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.160   0.230   0.262   0.320  10.010    4090

DebtToIncomeRatio

The summary data shows median DebttoIncome ratio of 0.23 for bad loans Vs 0.20 for the good loans. However there are a large number of outliers in both cases. I’m cannot consider this result reliable enough to investigate further.

BorrowerAPR

From the plot, it seems the count of bad loans increases steadily with increase in APR, and then spikes up at an 0.36 APR. It may help to plot bad loans as a percentage of total loans.

## mySet$ListingCategory: Not Available
## 
##     Bad    Good Ongoing 
##    3330    5549      12 
## -------------------------------------------------------- 
## mySet$ListingCategory: Debt Consolidation
## 
##     Bad    Good Ongoing 
##    5562   13008   39024 
## -------------------------------------------------------- 
## mySet$ListingCategory: Home Improvement
## 
##     Bad    Good Ongoing 
##    1001    2423    3940 
## -------------------------------------------------------- 
## mySet$ListingCategory: Business
## 
##     Bad    Good Ongoing 
##    1555    2834    2726 
## -------------------------------------------------------- 
## mySet$ListingCategory: Personal Loan
## 
##     Bad    Good Ongoing 
##     690    1552       0 
## -------------------------------------------------------- 
## mySet$ListingCategory: Student Use
## 
##     Bad    Good Ongoing 
##     182     542       0 
## -------------------------------------------------------- 
## mySet$ListingCategory: Auto
## 
##     Bad    Good Ongoing 
##     366    1207     964 
## -------------------------------------------------------- 
## mySet$ListingCategory: Other
## 
##     Bad    Good Ongoing 
##    1661    4512    4190 
## -------------------------------------------------------- 
## mySet$ListingCategory: Baby&Adoption
## 
##     Bad    Good Ongoing 
##      19      26     153 
## -------------------------------------------------------- 
## mySet$ListingCategory: Boat
## 
##     Bad    Good Ongoing 
##       5      25      55 
## -------------------------------------------------------- 
## mySet$ListingCategory: Cosmetic Procedure
## 
##     Bad    Good Ongoing 
##      13      34      43 
## -------------------------------------------------------- 
## mySet$ListingCategory: Engagement Ring
## 
##     Bad    Good Ongoing 
##      13      59     144 
## -------------------------------------------------------- 
## mySet$ListingCategory: Green Loans
## 
##     Bad    Good Ongoing 
##      12      11      36 
## -------------------------------------------------------- 
## mySet$ListingCategory: Household Expenses
## 
##     Bad    Good Ongoing 
##     292     373    1307 
## -------------------------------------------------------- 
## mySet$ListingCategory: Large Purchases
## 
##     Bad    Good Ongoing 
##      83     140     649 
## -------------------------------------------------------- 
## mySet$ListingCategory: Medical/Dental
## 
##     Bad    Good Ongoing 
##     201     234    1067 
## -------------------------------------------------------- 
## mySet$ListingCategory: Motorcycle
## 
##     Bad    Good Ongoing 
##      16      87     200 
## -------------------------------------------------------- 
## mySet$ListingCategory: RV
## 
##     Bad    Good Ongoing 
##       2      16      33 
## -------------------------------------------------------- 
## mySet$ListingCategory: Taxes
## 
##     Bad    Good Ongoing 
##      83     161     638 
## -------------------------------------------------------- 
## mySet$ListingCategory: Vacation
## 
##     Bad    Good Ongoing 
##      73     150     538 
## -------------------------------------------------------- 
## mySet$ListingCategory: Wedding Loans
## 
##     Bad    Good Ongoing 
##      68     127     573

Listing Category

The highest count of bad loans is for ‘Debt consolidation’ - About 5500. However, it is not clear which Listing Category has the highest percentage of bad loans.

## mySet$IsBorrowerHomeowner: FALSE
## 
##     Bad    Good Ongoing 
##    7859   16545   25987 
## -------------------------------------------------------- 
## mySet$IsBorrowerHomeowner: TRUE
## 
##     Bad    Good Ongoing 
##    7368   16525   30305

Homeowner

Both Homeowners and renters seem to have approximately the same counts for successful and failed loans. It may not help to investigate this field. Will not continue with this factor further.

## mySet$CreditScoreRangeLower: 520
## 
##     Bad    Good Ongoing 
##     521     404       0 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 540
## 
##     Bad    Good Ongoing 
##     288     276       0 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 560
## 
##     Bad    Good Ongoing 
##     381     415       0 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 580
## 
##     Bad    Good Ongoing 
##     346     401       0 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 600
## 
##     Bad    Good Ongoing 
##    1001    1408     369 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 620
## 
##     Bad    Good Ongoing 
##    1090    1757     638 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 640
## 
##     Bad    Good Ongoing 
##    2124    3646    5574 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 660
## 
##     Bad    Good Ongoing 
##    2072    3909    9816 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 680
## 
##     Bad    Good Ongoing 
##    2029    4004    9924 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 700
## 
##     Bad    Good Ongoing 
##    1721    3667    9729 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 720
## 
##     Bad    Good Ongoing 
##    1358    3545    7660 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 740
## 
##     Bad    Good Ongoing 
##     958    2987    5083 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 760
## 
##     Bad    Good Ongoing 
##     688    2459    3267 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 780
## 
##     Bad    Good Ongoing 
##     364    1834    2272 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 800
## 
##     Bad    Good Ongoing 
##     182    1149    1194 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 820
## 
##     Bad    Good Ongoing 
##      79     721     537 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 840
## 
##     Bad    Good Ongoing 
##      15     333     185 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 860
## 
##     Bad    Good Ongoing 
##      10     136      39 
## -------------------------------------------------------- 
## mySet$CreditScoreRangeLower: 880
## 
##     Bad    Good Ongoing 
##       0      19       5

Credit Score

from the histogram, the distribution of bad loans, good loans and ongoing loans seems to be close to normal. The ‘Bad loans curve’ is skewed to the left, and that for good loans and Ongoing loans is skewed to the right. i.e. borrowers with lower credit scores seem have more failed loans, and those with higher credit scores seem to have more successful loan payback.

## mySet$CurrentCreditLines: 0
## 
##     Bad    Good Ongoing 
##     175     135      74 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 1
## 
##     Bad    Good Ongoing 
##     500     579     244 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 2
## 
##     Bad    Good Ongoing 
##     694    1055     628 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 3
## 
##     Bad    Good Ongoing 
##     823    1478    1105 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 4
## 
##     Bad    Good Ongoing 
##     972    1927    1762 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 5
## 
##     Bad    Good Ongoing 
##     998    2235    2974 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 6
## 
##     Bad    Good Ongoing 
##    1099    2524    3618 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 7
## 
##     Bad    Good Ongoing 
##    1189    2735    4833 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 8
## 
##     Bad    Good Ongoing 
##    1078    2805    4953 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 9
## 
##     Bad    Good Ongoing 
##    1064    2606    4935 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 10
## 
##     Bad    Good Ongoing 
##     969    2327    4737 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 11
## 
##     Bad    Good Ongoing 
##     851    2173    4387 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 12
## 
##     Bad    Good Ongoing 
##     805    1839    3802 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 13
## 
##     Bad    Good Ongoing 
##     717    1617    3294 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 14
## 
##     Bad    Good Ongoing 
##     640    1384    2856 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 15
## 
##     Bad    Good Ongoing 
##     487    1113    2347 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 16
## 
##     Bad    Good Ongoing 
##     439     970    2036 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 17
## 
##     Bad    Good Ongoing 
##     351     744    1512 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 18
## 
##     Bad    Good Ongoing 
##     293     590    1347 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 19
## 
##     Bad    Good Ongoing 
##     213     442    1071 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 20
## 
##     Bad    Good Ongoing 
##     178     382     808 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 21
## 
##     Bad    Good Ongoing 
##     141     281     643 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 22
## 
##     Bad    Good Ongoing 
##     102     238     485 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 23
## 
##     Bad    Good Ongoing 
##      96     193     377 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 24
## 
##     Bad    Good Ongoing 
##      75     163     323 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 25
## 
##     Bad    Good Ongoing 
##      64     114     264 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 26
## 
##     Bad    Good Ongoing 
##      52      95     201 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 27
## 
##     Bad    Good Ongoing 
##      34      66     150 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 28
## 
##     Bad    Good Ongoing 
##      20      56     128 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 29
## 
##     Bad    Good Ongoing 
##      22      37      86 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 30
## 
##     Bad    Good Ongoing 
##      22      31      66 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 31
## 
##     Bad    Good Ongoing 
##      11      36      44 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 32
## 
##     Bad    Good Ongoing 
##       7      27      41 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 33
## 
##     Bad    Good Ongoing 
##      10      16      36 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 34
## 
##     Bad    Good Ongoing 
##       3      12      24 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 35
## 
##     Bad    Good Ongoing 
##       7      10      23 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 36
## 
##     Bad    Good Ongoing 
##       5       6      23 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 37
## 
##     Bad    Good Ongoing 
##       3       5      15 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 38
## 
##     Bad    Good Ongoing 
##       2       7      14 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 39
## 
##     Bad    Good Ongoing 
##       2       3       8 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 40
## 
##     Bad    Good Ongoing 
##       3       2       5 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 41
## 
##     Bad    Good Ongoing 
##       4       2       2 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 42
## 
##     Bad    Good Ongoing 
##       0       1       2 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 43
## 
##     Bad    Good Ongoing 
##       0       0       1 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 44
## 
##     Bad    Good Ongoing 
##       0       3       1 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 45
## 
##     Bad    Good Ongoing 
##       1       0       2 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 46
## 
##     Bad    Good Ongoing 
##       1       0       0 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 47
## 
##     Bad    Good Ongoing 
##       1       0       2 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 48
## 
##     Bad    Good Ongoing 
##       2       1       0 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 51
## 
##     Bad    Good Ongoing 
##       0       1       0 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 52
## 
##     Bad    Good Ongoing 
##       2       1       0 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 54
## 
##     Bad    Good Ongoing 
##       0       0       3 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 56
## 
##     Bad    Good Ongoing 
##       0       2       0 
## -------------------------------------------------------- 
## mySet$CurrentCreditLines: 59
## 
##     Bad    Good Ongoing 
##       0       1       0

Current credit lines

The ratio of bad to good loans is higher at the extremes, i.e. below credit lines of 3 and higher than credit lines of 30. Between 3-30 credit lines, the count of bad loans is flat seems to be much lower than that of the good loans.

Delinquencies

The number of Bad loans to Good loans is higher when the number of delinquencies is low. However, from the graph the ratio of bad to good loan count seems to increase as the number of delinquencies increase.

Debt to income ratio

The ratio of bad to good loans is low when debt to Income ratio is low. The ratio seems to increase after DebttoIncomeRatio exceeds 0.3.

Loan Original amount

Most of the loans are below the value of 10000. Also, the ratio of bad to good loans is low when the loan amount is below 10,000. As the loan amount increases beyond this value the ratio increases.

Stated Monthly income

Most of the loans are located in the mid range of monthly income - between 3000 & 8000. It is not clear how the ratio changes across the range of monthly income. Also, creating the table of loan results for each value of monthly income does not helpful, since there are too many entries.

## mySet$LoanOriginationQuarter: 2007 Q1
## 
##     Bad    Good Ongoing 
##     484     842       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2007 Q2
## 
##     Bad    Good Ongoing 
##    1167    1798       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2007 Q3
## 
##     Bad    Good Ongoing 
##     969    1575       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2007 Q4
## 
##     Bad    Good Ongoing 
##     849    1605       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2008 Q1
## 
##     Bad    Good Ongoing 
##     952    1953       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2008 Q2
## 
##     Bad    Good Ongoing 
##    1367    2769       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2008 Q3
## 
##     Bad    Good Ongoing 
##    1036    2348       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2008 Q4
## 
##     Bad    Good Ongoing 
##     138     362       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2009 Q2
## 
##     Bad    Good Ongoing 
##       3      10       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2009 Q3
## 
##     Bad    Good Ongoing 
##      83     496       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2009 Q4
## 
##     Bad    Good Ongoing 
##     219    1210       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2010 Q1
## 
##     Bad    Good Ongoing 
##     184    1036       0 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2010 Q2
## 
##     Bad    Good Ongoing 
##     258    1263       1 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2010 Q3
## 
##     Bad    Good Ongoing 
##     191    1060       1 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2010 Q4
## 
##     Bad    Good Ongoing 
##     284    1280      17 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2011 Q1
## 
##     Bad    Good Ongoing 
##     342    1242     152 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2011 Q2
## 
##     Bad    Good Ongoing 
##     593    1165     691 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2011 Q3
## 
##     Bad    Good Ongoing 
##     662    1404     979 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2011 Q4
## 
##     Bad    Good Ongoing 
##     798    1661    1397 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2012 Q1
## 
##     Bad    Good Ongoing 
##     885    1505    1997 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2012 Q2
## 
##     Bad    Good Ongoing 
##     966    1515    2523 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2012 Q3
## 
##     Bad    Good Ongoing 
##     965    1488    3121 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2012 Q4
## 
##     Bad    Good Ongoing 
##     614    1014    2759 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2013 Q1
## 
##     Bad    Good Ongoing 
##     299     699    2598 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2013 Q2
## 
##     Bad    Good Ongoing 
##     400     682    5989 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2013 Q3
## 
##     Bad    Good Ongoing 
##     282     585    8273 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2013 Q4
## 
##     Bad    Good Ongoing 
##     218     416   13768 
## -------------------------------------------------------- 
## mySet$LoanOriginationQuarter: 2014 Q1
## 
##     Bad    Good Ongoing 
##      19      87   12026

Loan Origination quarter

The ratio of bad to good loans seem to be high prior to 2008 Q3. The number of loans and the ratio dropped since. I’m not adding any comments on the ratio in 2012, since the ongoing loans may be close to completion, so the good to bad ratio may not be valid for these years.

Bankcard Utilization

The ratio seems to be fairly steady, except for the BankCardUtilization value of 1.0, where it is high. Perhaps this value indicates borrowers with no bank cards. For this field, I did not output the table,since it would not be useful.

## mySet$Term: 12
## 
##     Bad    Good Ongoing 
##      92    1450      62 
## -------------------------------------------------------- 
## mySet$Term: 36
## 
##     Bad    Good Ongoing 
##   13198   29174   36181 
## -------------------------------------------------------- 
## mySet$Term: 60
## 
##     Bad    Good Ongoing 
##    1937    2446   20049

Loan Term

Loan Term- Majority of the loans have a 36 month term. The bad to good loan ratio is lower for the 12 month term and 36 month term as compared to the 60 month term.

Ratios evaluation

Based on the initial histograms, I plan to investigate the following variables further 1. Stated Monthly Income 2. Loan Original Amount 3. DelinquenciesLast7Years 4. Current Credit Lines 5. CreditScoreRangeLower 6. ListingCategory 7. Borrower APR 8. DebtToIncomeRatio

I plan to plot each of these variables against the ratio of bad to good loans. If there seems to be a corelation, I plan to evaluate it.

I first convert all columns to factors, in order to group the data using these factors. Values have been rounded, so that the resulting graphs will be reasonably granular.

Also, the data is filtered for all loans with a final result, since the final status of an ongoing loan is not known.

Counts return function

The function summarizes the total count of loans by the column name in the arguments. It also summarizes this information by Loan Result and column name. That will enable us to get the ratio of bad loans to total loans for each field.

We now use this function to display ratio of bad loan count to total loans for each of the fields in the frame.

ListingCategory

The categories with the highest failure rate are ‘Green Loans’, ‘Medical/Dental, ’Household expenses’ and ‘Baby Adoption’. These categories have a failure rate greater than 37.5%.

Credit Score

There is a direct corelation between the credit score and percentage of successful loans. And an inverse relation with the percentage of failed loans. The corelation can be investigated further.

Income

There are a bunch of outliers, high stated incomes where all loans have been bad.This could be a case of incorrect data. The Failure rate is scattered for high incomes and very low incomes. However for the interval monthy income between 1200-9000, there seems to be a negative corelation between count of loans failed and the income.

Loan Amount

The Failure rate seems to increase as the ‘Loan Amount’ increases. The values are more scattered with increase in Loan Amounts. There is also an outlier for failure rate when the loan value is 21500. I will check on the corelation coefficient for this metric.

Borrower APR

This metric seems to have a direct corelation to the failure rate, but only for a specific interval - 0.06 to 0.37. At APR values below 0.06 the failure rate increases. Also, at values higher than 0.37 the failure rate decreases. I’m not sure of the cause here.

Delinquencies

In the case of delinquencies, the failure rate seems to scatter as the number of delinquencies increase. It is not clear if there is a corelation. I plan to check.

DebtToIncome ratio

Ignoring an outlier for the value of 0, there seems to be a direct corelation between ‘DebtToIncomeRatio’ and Failure rate, until the DebtoIncomeRatio value of 0.5. After that, the values scatter. It will help to check the corelation coeff for this.

Credit Lines

When the number of credit lines goes beyond 40, the rate of loan failures goes high, beyond 60%. However, there is no direct corelation at values below 40. In as the number of credit lines increases from 0 to 5, the loan failure rate decreases from 40% to 30%. This could possibly be due to young inexperienced borrowers, who are new to loans.

Corelation

The next step is to calculate corelation values between failure rates and some of the metrics identified in the graphs.

##                   [,1]
## CreditScore -0.9932277

CreditScoreCoeff -0.9932

##                  [,1]
## CreditLines 0.5403611

CreditLines 0.54

##                  [,1]
## BorrowerAPR 0.7005896

BorrowerAPR 0.700

##           [,1]
## Income 0.53022

Income 0.5302

##                 [,1]
## LoanAmount 0.4963623

LoanAmount 0.4963

##                    [,1]
## Delinquencies 0.5464917

Delinquencies 0.5464

##                        [,1]
## DebtToIncomeRatio 0.7991244

DebtToIncomeRatio 0.799

Factor selection

It seems the most common factors that are well corelated,directly or inversely, to the Failure rate are Credit Score, DebtToIncomeRatio and BorrowerAPR.

To check on Failures further, I’ll plot two relatively independent variables along with the Loan Result.

Loan Amount, Credit Score, Loan Result

The graph shows that loans of amounts higher than 10000 are not approved for borrowers below the credit score of 600. That is a good idea, since the failed loans increase when the credit scores are lower.When the loan amount is low, around $1000 and the credit scores are above 700, there are hardly any failed loans. The count of failed loans reduces as the credit scores get higher, for all loan amounts. The graph has been scaled by log10 on the x axis because there are fewer loans as the loan amount gets higher.

APR, Credit Score, Loan Result

The graph illustrates the advantage of a higher credit score. Lower APRs are only predominantly given to those with higher credit scores. The mostly empty triangle in the bottom left makes this clear. Also, the empty triangle on the upper right shows that those with higher credit score dont accept high APRs for their loans. It is only those with lower credit scores that have to take on high APR loans. Consequently, most failures are for high APR loans(greater than 0.2) and low credit scores. Also, there are few borrowers between credit scores of 500-600. Few applicants in this range get approval for a loan.

APR, loan amount, Loan Result

From the graph, it seems that there are minimal loan failures when the APR is low, i.e. below 0.15. Failures increase with the APR. The count of loans increases with the loan amounts. Largest count of loans is between 5000-10,000. The empty triangle on the upper left shows that loans are not approved or accepted for amounts higher than 100000, when the APR is high (0.3-0.4).

Final plots and summary

Percentage of failed/successful loans by credit score

The graph indicates the trend of percentage of failed(red) and successful(green) loans as the credit score increases. The trend is linear as indicated by linear interpolation. The corelation coefficient for Credit Score and Ratio of failed loans is -0.9932. The credit score is created by financial agencies using information from the borrowers credit history. It seems they are doing a good job of it for the lenders.

Failed and successful loans by credit score and loan amount

The graph shows higher density of failed loans along the along the lower border and to the right. This indicates the impact of lower credit scores on all the repayment of the loans. At loan amounts higher than 10000, loans are approved only for credit ratings above 600. Even for these loans, the incidence of successful loans increases as the credit rating rises.Fewer loans are approved for borrowers below a rating of 600. The smoothing curve indicates the fact that a higher credit score is required for a larger loan amount.

Failed and successful loans by Credit Score and Borrower APR

The graph shows the failed loans increasing as the credit score decreases and as the APR increases. The density of failed loans is highest at the credit score of 500 and increases with the Borrower APR. Very few borrowers below a credit score of 600 get approved for loans below APR of 0.2.This further decreases their chances of successful repayment. The APRs less than 0.5 can be considered to be outliers. Then the smoothing curve indicates that a higher credit score provides access to loans with lower APR.

Reflections

The initial step of identifying relevant variables for investigation, is probably the most critical and is a subjective decision. Luckily, some of the variables selected were relevant to the success/failure of the loan. Some of the categorical variables evaluated were related to the loan outcome. However, I was not sure how to proceed with the evaluation using these variables. So, all the bivariate and multivariate plots consist of numerical variables. From the multivariate plots, the use of Credit Score is apparent. Banks are using this variable to decide on loan approval and on the Loan APR. The corelation between Credit score and success/failure of loans is high and graphically linear. Other factors such as loan amount and APR are also related to the loan outcome. The next steps would probably be to create a linear regression model using these variables.