Predicting Debt Payment Defaults With No-Code Classification Algorithms
Predicting the likelihood of future events like fraud or payment defaults is a classic use case of machine learning. With its drag-and-drop interface, Monument makes it easier than ever to tackle this classification problem. In this tutorial, we will use real-world data on credit card defaults to train an algorithm to detect payment defaults in a matter of minutes.
Obtaining & Inspecting The Data
In the “Data Folder” of the University of California-Irvine repository linked above, there is a file called default of credit card clients.xls. When we open it up in a spreadsheet, it looks like this:
Currently, Monument only supports one header row, but this file has two header rows. We’ll make a quick edit to attach the X1, X2, etc. headers in the first row to the headers of the second row. Our result looks like this:
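If you would rather script the header fix than edit the spreadsheet by hand, the merge can be sketched in a few lines of Python. This is a minimal sketch, not part of Monument: the function name `merge_header_rows` and the sample header values are our own illustrative choices, and we assume the two rows are joined with an underscore (which matches the Y_default_payment_next_month name used later in this tutorial).

```python
def merge_header_rows(top_row, bottom_row):
    """Combine two header rows into a single row of column names.

    Each pair is joined with an underscore and spaces become underscores,
    e.g. ("Y", "default payment next month") -> "Y_default_payment_next_month".
    Empty cells are skipped so a lone "ID" stays "ID".
    """
    merged = []
    for top, bottom in zip(top_row, bottom_row):
        parts = [part.strip() for part in (top, bottom) if part and part.strip()]
        merged.append("_".join(parts).replace(" ", "_"))
    return merged


# Illustrative sample of the two header rows (not the full file).
top = ["", "X1", "X2", "Y"]
bottom = ["ID", "LIMIT_BAL", "SEX", "default payment next month"]
print(merge_header_rows(top, bottom))
```

The merged list can then be written out as the single header row of the CSV.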
Most of the 25 columns contain demographic and transaction information on 30,000 credit card users. We’re going to use all of this information to train an algorithm to predict future defaults.
The last column, default payment next month, is the most important. The values in this column are either 1 for “will default next month” or 0 for “will not default next month.” Because this is historical data, we are certain about these outcomes. This kind of historical data is necessary for training an algorithm to forecast outcomes when the future is unknown.
To simulate an unknown future, we’re going to duplicate this last column (Y_default_payment_next_month), add it to the end of the data (Y_default_payment_next_month_example), and delete the first 100 values of the “example” column.
By deleting the first 100 values of the “example” column, we’re simulating the existence of 100 new transactions and using the other 29,900 rows to train the algorithm. Because we still know the real outcomes for those 100 rows, this “controlled setting” allows us to assess the effectiveness of the model we’re going to build.
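The same preparation step can be sketched in plain Python for readers who prefer a script to a spreadsheet. This is only an illustration under our own assumptions: the helper name `add_example_column` is hypothetical, rows are represented as dictionaries, and the tiny sample data stands in for the real 30,000-row file.

```python
TARGET = "Y_default_payment_next_month"
EXAMPLE = "Y_default_payment_next_month_example"


def add_example_column(rows, target, example, holdout=100):
    """Copy `target` into a new `example` column, blanking the first `holdout` values.

    The original `target` column is left untouched so the blanked rows can
    still be checked against the real outcomes later.
    """
    prepared = []
    for index, row in enumerate(rows):
        new_row = dict(row)
        new_row[example] = "" if index < holdout else row[target]
        prepared.append(new_row)
    return prepared


# Tiny made-up sample; the tutorial blanks 100 values, here we blank 2.
rows = [{"ID": i + 1, TARGET: i % 2} for i in range(5)]
prepared = add_example_column(rows, TARGET, EXAMPLE, holdout=2)
for row in prepared:
    print(row)
```

Keeping the real column alongside the blanked copy is what makes the later comparison of predictions against known outcomes possible.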
With our training data now formatted properly, we save it in CSV format and import it into Monument.
Using No-Code Classification Algorithms To Detect Future Defaults
With the CSV imported into Monument, let’s chart a few of the columns to get a sense of what’s happening in the data. Let’s make sure that we select the grid chart style using the widget in the top right corner of the chart area.
When we have plotted some of the data pills and switched to grid view, we see something like this.
Our goal is to train an algorithm to use the attributes in the demographic and transaction columns to correctly detect whether a user will or will not default in the next month. Let’s drag the LightGBM algorithm pill from the ALGORITHMS area and drop it onto the “example” column. LightGBM is short for “Light Gradient Boosting Machine,” and is a widely used classification algorithm.
The default LightGBM parameters do not give a great result. The initial prediction is 0.217041072 for every row; when the Y_default_payment_next_month value changes between 1 and 0, there is no corresponding change in our prediction.
The best place to start improving the model is to click the INDEPENDENTS button in the drop-down menu on the algorithm pill and select all the columns. This will use the selected columns as independent variables, with the expectation that at least some of them have explanatory power in determining the dependent variable — what we’re trying to predict, i.e. whether the person will default next month.
(Note: we don’t select the original Y_default_payment_next_month column, which holds the real outcomes, as the model would then use the real-world results to predict the simulated forecast, giving us an ostensibly strong model that actually has no predictive power in unknown situations. This would be an extreme example of “overfitting,” a concept we’ll cover in a future tutorial.)
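The selection rule in the note above can be written down as a small filter. This is a sketch of the idea only, not of how Monument selects columns internally; the function name `choose_independents` and the shortened column list are our own assumptions.

```python
def choose_independents(columns, dependent, leaky):
    """Return every column except the dependent variable and known leaky columns."""
    excluded = {dependent, *leaky}
    return [name for name in columns if name not in excluded]


# Shortened, illustrative column list.
columns = [
    "ID",
    "X1_LIMIT_BAL",
    "X2_SEX",
    "Y_default_payment_next_month",          # real outcomes: must not be an input
    "Y_default_payment_next_month_example",  # dependent variable being predicted
]

independents = choose_independents(
    columns,
    dependent="Y_default_payment_next_month_example",
    leaky=["Y_default_payment_next_month"],
)
print(independents)
```

Anything that directly encodes the answer belongs in the `leaky` list, since the model would otherwise simply copy it.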
Next, let’s look at the parameters. The picture below shows the default parameters for Monument’s LightGBM algorithm.
Each of the parameters listed will have some effect on the outcome. You can hover over the information icon next to each parameter to read more about what it does.
Good parameters to start adjusting are:
- Lambda helps reduce overfitting.
- Gamma produces simpler models the higher it is set. This reduces the danger of overfitting on your sample data.
- Maximum Leaf Count caps how complex each tree can be. Lower settings reduce overfitting.
- Learning Rate sets how strongly each training cycle’s new predictions adjust the predictions built up in previous cycles. Lower values learn more slowly but usually need more training cycles.
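For readers who want the learning rate in symbols: in standard gradient boosting (which we assume matches what Monument runs, though the tutorial does not confirm the internals), each training cycle adds a scaled correction to the previous model. Here $F_{m-1}$ is the model after the previous cycle, $h_m$ is the new tree fitted in cycle $m$, and $\eta$ is the learning rate; the symbol names are our own notation.

```latex
F_m(x) = F_{m-1}(x) + \eta \, h_m(x), \qquad 0 < \eta \le 1
```

A smaller $\eta$ means each cycle changes the predictions less, which is why lower learning rates are usually paired with more training cycles.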
In this case, we also made slight adjustments to Training Epochs (to increase the number of training cycles available to tune the model), Bagging Fraction (to increase the percent of data randomly sampled in each training cycle), and Minimum Element Per Leaf (to increase the number of datapoints assigned to each leaf, thereby reducing overfitting).
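If you later want to reproduce a similar setup in code, the settings discussed above could be translated into LightGBM's own parameter names roughly as follows. Treat this as a sketch built on assumptions: the correspondence between Monument's labels and LightGBM's native parameters is our guess (for example, Lambda might map to L1 rather than L2 regularization), and the numeric values are placeholders, since the tutorial does not state the values used.

```python
# ASSUMED mapping from the Monument labels in this tutorial to LightGBM's
# native parameter names. The parameter names are real LightGBM parameters;
# which label corresponds to which parameter is a guess.
MONUMENT_TO_LIGHTGBM = {
    "Lambda": "lambda_l2",
    "Gamma": "min_split_gain",
    "Maximum Leaf Count": "num_leaves",
    "Learning Rate": "learning_rate",
    "Training Epochs": "num_iterations",
    "Bagging Fraction": "bagging_fraction",
    "Minimum Element Per Leaf": "min_data_in_leaf",
}

# Placeholder values for illustration only; not the values used in the tutorial.
monument_settings = {
    "Lambda": 1.0,
    "Gamma": 0.1,
    "Maximum Leaf Count": 31,
    "Learning Rate": 0.05,
    "Training Epochs": 200,
    "Bagging Fraction": 0.8,
    "Minimum Element Per Leaf": 50,
}


def to_lightgbm_params(settings):
    """Translate Monument-style labels into a LightGBM parameter dictionary."""
    params = {"objective": "binary"}  # 0/1 default prediction
    for label, value in settings.items():
        params[MONUMENT_TO_LIGHTGBM[label]] = value
    return params


params = to_lightgbm_params(monument_settings)
print(params)
```

Note that in LightGBM itself, `bagging_fraction` only takes effect when `bagging_freq` is set to a positive value, so a real script would need to set that as well.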
When we adjust the parameters as above and click OK to re-run LightGBM, we get the following results. The significant difference you’ll see here is that the 1s in the first column get picked up as 0.31 or above, while the 0s get picked up as a lower value. This is an indication that the model is predicting something. You can probably improve the model by experimenting further with the parameters to widen the difference between the predicted default and no-default values. But the current model is correctly detecting some distinction, and in many real-world business cases where you need results fast, it’s likely good enough.
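Because we kept the real outcomes for the 100 blanked rows, we can put a number on "predicting something." The sketch below counts hits and misses when every prediction at or above a cut-off is treated as a predicted default. The 0.31 cut-off comes from the results described above; the function name `evaluate_holdout` and the five sample rows are made up for illustration and are not real model output.

```python
def evaluate_holdout(actual, predicted, threshold=0.31):
    """Compare real 0/1 outcomes with predicted scores at a given cut-off.

    Returns a dict of confusion-matrix counts and the overall accuracy.
    """
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for truth, score in zip(actual, predicted):
        flagged = score >= threshold
        if flagged and truth == 1:
            counts["true_pos"] += 1
        elif flagged and truth == 0:
            counts["false_pos"] += 1
        elif not flagged and truth == 0:
            counts["true_neg"] += 1
        else:
            counts["false_neg"] += 1
    accuracy = (counts["true_pos"] + counts["true_neg"]) / len(actual)
    return counts, accuracy


# Made-up sample values, standing in for the 100 held-out rows.
actual = [1, 0, 0, 1, 0]
predicted = [0.35, 0.12, 0.33, 0.20, 0.10]

counts, accuracy = evaluate_holdout(actual, predicted)
print(counts, accuracy)
```

Trying a few different thresholds on the held-out rows is a quick way to see how the trade-off between missed defaults and false alarms shifts.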
Where To Go From Here
In a matter of minutes, we pulled in raw transaction data and trained an effective classification model to predict payment defaults. This unlocks new, data-driven decisions in a business context.
- Immediately, you can predict future defaults. When next month’s transaction data comes in, you can simply apply the algorithm you created above to get a sense of which customers you might proactively contact to get them on a payment plan.
- You can also use these results to frame further research to uncover the characteristics of defaulters/non-defaulters. In particular, you can use the Forecast Importance Table (available by clicking the menu icon in the top right corner of the INFO box) to see which features are most indicative of specific outcomes. Perhaps a significant driver of defaults warrants further data collection.
The fact that the Forecast Training Convergence chart is still (slightly) downward sloping also suggests that you can continue to improve the performance of the model by increasing the number of Training Rounds in the PARAMETERS menu.
You can also use a nearly identical approach on a number of other kinds of problems, including:
- Forecasting next quarter’s balance sheet, line by line,
- Inferring the probability that a consumer will purchase a specific product based on past purchases, and
- Classifying individuals or firms based on their characteristics to guide advertising or further research.
The ability to build no-code models like this allows you to quickly iterate so that you can maximize the time you spend considering the business implications of your results, rather than get lost in a time-consuming and expensive modeling process. It also allows small teams to dynamically monitor and update models as needed to keep up with ever-changing business environments.