To balance the trade-off between a decrease in revenue and a decrease in cost, an optimization problem has to be solved: the threshold is adjusted to seek the optimum.
If “Settled” is defined as positive (good) and “Past Due” as negative, then using the confusion matrix design plotted in Figure 6, the four quadrants are True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN). Consistent with the confusion matrices plotted in Figure 5, TP are the good loans that are correctly identified, and FP are the defaults that are missed. Our company is interested in both of these quadrants. To normalize the values, two widely used metrics are defined: True Positive Rate (TPR) and False Positive Rate (FPR). Their equations are shown below:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
In this application, TPR is the hit rate of good loans, and it represents the ability to generate revenue from loan interest; FPR is the miss rate of defaults, and it represents the likelihood of taking a loss.
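To make these definitions concrete, here is a minimal pure-Python sketch that counts the four quadrants at a single threshold and derives TPR and FPR. It assumes the model outputs a score where higher means more likely to settle, that label 1 stands for “Settled” and 0 for “Past Due”, and that a loan is predicted good when its score is at or above the threshold. The function names and the toy labels and scores are hypothetical, made up for illustration rather than taken from the original model.

```python
from typing import List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) with 1 = Settled (positive) and 0 = Past Due (negative).

    A loan is predicted good (positive) when its score is >= threshold.
    """
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_good = score >= threshold
        if predicted_good and label == 1:
            tp += 1  # good loan correctly approved
        elif predicted_good and label == 0:
            fp += 1  # default missed (approved but went past due)
        elif label == 1:
            fn += 1  # good loan wrongly rejected
        else:
            tn += 1  # default correctly rejected
    return tp, fp, fn, tn


def tpr_fpr(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN); 0.0 when a denominator is empty."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical toy data: 1 = Settled, 0 = Past Due
y_true: List[int] = [1, 1, 1, 0, 1, 0, 0, 1]
scores: List[float] = [0.9, 0.8, 0.35, 0.7, 0.6, 0.2, 0.4, 0.55]

counts = confusion_counts(y_true, scores, threshold=0.5)
print(counts)            # → (4, 1, 1, 2)
print(tpr_fpr(*counts))  # → (0.8, 0.3333333333333333)
```

Lowering the threshold to 0 approves every loan, which drives both TPR and FPR to 1; raising it above every score drives both to 0.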
The Receiver Operating Characteristic (ROC) curve is the most widely used plot for visualizing the performance of a classification model across all thresholds. In Figure 7 left, the ROC curve of the Random Forest model is plotted. This plot shows the relationship between TPR and FPR, which always move in the same direction, from 0 to 1. A good classification model usually has its ROC curve above the red baseline, which represents the “random classifier”. The Area Under the Curve (AUC) is another metric for evaluating a classification model besides accuracy. The AUC of the Random Forest model is 0.82 out of 1, which is decent.
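The ROC curve and its AUC can be sketched without any library by sweeping the threshold over every distinct score, recording one (FPR, TPR) point per threshold, and integrating with the trapezoidal rule. This is an illustrative reimplementation on hypothetical toy data, and the helper names are made up; in practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would typically be applied to the Random Forest's predicted probabilities.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the threshold from strictest to loosest.

    Label 1 = Settled (positive), 0 = Past Due (negative); score >= threshold means approved.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing approved
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given points sorted by x (FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 1 = Settled, 0 = Past Due
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.7, 0.6, 0.2, 0.4, 0.55]

pts = roc_points(y_true, scores)
print(pts)
print(f"AUC = {auc_trapezoid(pts):.3f}")
```

On this toy data the AUC is 11/15 (about 0.733), which equals the fraction of (Settled, Past Due) pairs in which the settled loan receives the higher score. A random classifier's curve follows the diagonal from (0, 0) to (1, 1) and has an AUC of 0.5.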
Although the ROC curve clearly shows the relationship between TPR and FPR, the threshold is only an implicit variable in it. The optimization task therefore cannot be done with the ROC curve alone. Consequently, another measurement is introduced to incorporate the threshold variable, as plotted in Figure 7 right.
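The text does not spell out which measurement is plotted in Figure 7 right. As one possible way to make the threshold explicit, the sketch below tabulates TPR and FPR against each candidate threshold and picks the threshold that maximizes a simple profit objective: interest earned per settled loan minus the loss per missed default. That objective and the constants `GAIN_PER_GOOD` and `LOSS_PER_DEFAULT` are hypothetical assumptions for illustration, not the article's method or figures, and the toy data is made up.

```python
from typing import List, Sequence, Tuple

# Hypothetical economics (assumptions, not from the article)
GAIN_PER_GOOD = 1.0     # interest earned on an approved loan that settles
LOSS_PER_DEFAULT = 5.0  # loss on an approved loan that goes past due


def threshold_table(
    y_true: Sequence[int], scores: Sequence[float], gain: float, loss: float
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, TPR, FPR, profit) for each distinct score, ascending.

    Label 1 = Settled, 0 = Past Due; score >= threshold means the loan is approved.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    rows = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        profit = gain * tp - loss * fp  # revenue from good loans minus missed defaults
        rows.append((t, tp / pos, fp / neg, profit))
    return rows


# Hypothetical toy data: 1 = Settled, 0 = Past Due
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.7, 0.6, 0.2, 0.4, 0.55]

rows = threshold_table(y_true, scores, GAIN_PER_GOOD, LOSS_PER_DEFAULT)
for t, tpr, fpr, profit in rows:
    print(f"threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}  profit={profit:.1f}")

best = max(rows, key=lambda r: r[3])
print("best threshold:", best[0])  # → best threshold: 0.8
```

Because a missed default is assumed here to cost five times the interest earned on a good loan, the optimum lands at a fairly strict threshold. Changing the assumed cost ratio moves the optimum, which is the revenue-versus-cost trade-off the threshold has to balance.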