
Machine Learning and Predictive Modelling

In the financial world, the word “risk” most often refers to credit risk. Predictive analytics serves as an antidote to the harmful accumulation of such risk. The predicted risk of a consumer (for example, the risk of mortgage default) is widely known as a “credit score”, which many rating agencies provide.

Ironically, a bank’s mortgage customer can misbehave in two opposite ways:

Either the customer fails to pay the mortgage (for whatever reason), or the customer repays the mortgage too quickly:

Risk 1: The customer defaults on the mortgage payments (the more common risk for bankers).

Risk 2: Less common, but still a risk: the customer prepays the mortgage (or the remaining dues) early, whether from savings, by refinancing with a competing bank, or by selling the property. Whatever the case may be, prepayment is a loss (and hence a risk) to the bank, because the bank fails to collect the mortgage’s planned future interest payments.

If the bank sees such a loss from only one customer, it makes little difference to the bank’s business, and the bank might not even consider it a risk. But as the number of such customers grows, it becomes a serious threat to the business.

Now let us understand how it all works.

Suppose you score each loan for risk with an effective predictive model. Some loans get high risk scores and others get low risk scores. If these (credit) risk scores are assigned well, the half of the portfolio predicted to be high risk could see almost twice the average rate of defaults.

Consider this simple arithmetic. Let’s say that the overall mortgage default rate at a bank is 10 percent (this may be too high, but accept it for the purposes of the calculation). The bank then divides its portfolio into two halves: one whose default rate is 70% higher than the 10% average, which the bank names high risk, and one whose default rate is 70% lower than the 10% average, which the bank names low risk. (You can assume any other percentage, for example 50%, 60% or 80%, depending on how you see the high and low risks.)

In effect, the bank has divided its portfolio into two parts: a high risk half with a 17% default rate (70% more than the 10% average), and a low risk half with a 3% default rate (70% less than the 10% average). Therefore, the portfolio becomes:

High Risk Loans: 17% of mortgages will default.
Low Risk Loans: 3% of mortgages will default.
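
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in this tutorial; the function name split_default_rates and its parameter names are made up for illustration.

```python
def split_default_rates(average_rate, relative_spread):
    """Return (high_risk_rate, low_risk_rate) for two equal halves of a portfolio.

    average_rate:    overall default rate, e.g. 0.10 for 10%
    relative_spread: how far each half sits from the average, e.g. 0.70 for 70%
    """
    high = average_rate * (1 + relative_spread)
    low = average_rate * (1 - relative_spread)
    return high, low


high, low = split_default_rates(0.10, 0.70)
print(f"High risk half: {high:.0%} default rate")
print(f"Low risk half:  {low:.0%} default rate")

# Because the halves are equal in size, their rates average back to the overall rate.
print(f"Whole portfolio: {(high + low) / 2:.0%} default rate")
```

Try a spread of 0.50 or 0.80 to see how the two halves move apart or together while the overall rate stays at 10%.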

The bank has just divided its business into two completely different worlds, one comparatively safe and the other dangerous (or maybe hazardous). Now the bank knows where to focus most of its attention.

Predictive analytics is basically a technology that uses experience (i.e. data) to predict the future behavior of individuals (e.g. mortgage customers) in order to make better business decisions.

Now all you need is the data, which feeds predictive analytics through the development of a suitably robust predictive model.

Having learnt the fundamentals of predictive analytics, we can draw a framework of predictive modelling as:

Data -> Machine Learning -> Predictive Model

Learning something from data is not a complex process. For example, one can start with a question: “How do we distinguish between high risk and low risk mortgage customers?” The answer from the data (and machine learning) could be:

“If the loan interest rate is less than 5%, the risk of mortgage prepayment is 2.1%, otherwise the risk is 10%”.
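
To see how a rule of this shape can come out of data, here is a minimal sketch. The ten loans below are invented for illustration, so the resulting percentages differ from the 2.1% and 10% quoted above. The 5% threshold is fixed by hand here; a real learning algorithm would usually search for the threshold as well.

```python
# Hypothetical toy data: (interest rate in percent, 1 if the loan was prepaid, else 0).
LOANS = [
    (3.5, 0), (4.0, 0), (4.2, 0), (4.8, 0), (4.9, 1),
    (5.0, 0), (5.5, 1), (6.0, 0), (6.5, 1), (7.0, 0),
]


def mean(values):
    """Average of a list of numbers, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def rate_by_threshold(loans, threshold):
    """Split the loans at `threshold` and return the observed risk in each group."""
    below = [outcome for rate, outcome in loans if rate < threshold]
    at_or_above = [outcome for rate, outcome in loans if rate >= threshold]
    return mean(below), mean(at_or_above)


def predict_risk(interest_rate, threshold, risk_below, risk_above):
    """The learned rule: one predicted risk for each side of the threshold."""
    return risk_below if interest_rate < threshold else risk_above


risk_below, risk_above = rate_by_threshold(LOANS, 5.0)
print(f"If the interest rate is less than 5%, the risk is {risk_below:.1%}, "
      f"otherwise the risk is {risk_above:.1%}")
print(predict_risk(4.5, 5.0, risk_below, risk_above))
```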

[Figure: PA - Risk 1]

By now you are already halfway there. There is only one more step to go: machine learning (the ability to generate a predictive model from data).


In most banking businesses, older predictive models used the interest rate as a crude input to predict the risk of default, and such a model put each individual (mortgage customer) into one of two predictive categories (i.e. either high risk customer or low risk customer). Since this model considers only one predictor variable (the interest rate) to predict the behavior of individuals, it is a typical case of a “univariate model”. If other important variables about the individual customer (such as salary, household size, age, physical condition or credit rating) are also considered, it becomes a “multivariate model”.

For the best prediction, we need to go multivariate. Why? Because an effective predictive model must consider multiple factors together, instead of just one variable (e.g. the interest rate). But which variables should be considered to predict the outcome? Well, if I were the boss, I would say “all the variables for which data can be collected about the customers”.

Once the business (the bank) has established a machine learning process, a predictive model predicts the outcome (such as a predicted score) for one customer at a time.


Consider a bank’s mortgage customer whose details are:

Borrower: Manoj Kumar
Borrower’s Property Value: $200,000
Borrower’s Property Type: Single Family Residence
Mortgage Value: $100,000
Interest Rate: 5.00%
Borrower’s Annual Income: $72,000
Borrower’s Net worth: $300,000
Borrower’s Age (in Years): 38
Borrower’s Education: College Graduate
Years at Borrower’s Current Address: 4
Whether Borrower is Self Employed: No
Years Completed at Current Job: 3
Borrower’s Line of Work: Business Manager
Borrower’s Marital Status: Married
Borrower’s Children: Two
Age of Child 1 (in Years): 13
Gender of Child 1: Female
Age of Child 2 (in Years): 7
Gender of Child 2: Male
Age of Borrower’s Spouse (in Years): 32
Whether Spouse Employed or Housewife: Employed
Total Number of Late Payments (During Current Fiscal Year): 4
Credit Score: Strong

The above are predictor variables (and there could be more variables than those listed) that will be fed as input into the predictive model.

The model will consider any or all of these variables to produce an output: the “predictive score”.
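
As a rough sketch of what “variables in, score out” can look like, the snippet below turns a few of the borrower’s details into numbers and combines them into a single score between 0 and 1. The weights and the bias are hypothetical values picked by hand purely for illustration (in practice they would be learned from data), and the loan_to_value variable is an assumption derived from the mortgage value and property value listed above.

```python
import math

# A few predictor variables from the example borrower, expressed as numbers.
borrower = {
    "interest_rate": 5.0,                # percent
    "annual_income": 72_000,             # dollars
    "late_payments": 4,                  # during the current fiscal year
    "loan_to_value": 100_000 / 200_000,  # mortgage value / property value
}

# Hypothetical weights, chosen by hand for illustration only.
WEIGHTS = {
    "interest_rate": 0.30,
    "annual_income": -0.00002,
    "late_payments": 0.40,
    "loan_to_value": 1.50,
}
BIAS = -4.0


def predictive_score(record, weights=WEIGHTS, bias=BIAS):
    """Combine whichever weighted variables are present in `record` into a score in (0, 1)."""
    z = bias + sum(w * record[name] for name, w in weights.items() if name in record)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function squashes z into (0, 1)


print(f"Multivariate score: {predictive_score(borrower):.3f}")
# Using only one variable gives the univariate case.
print(f"Univariate score:   {predictive_score({'interest_rate': 5.0}):.3f}")
```

The function skips any variable that is missing from the record, which mirrors the idea that a model may use any or all of the available variables.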

Now the challenge of machine learning is to program your computer to crunch data about individual customers, considering the important variables (any or all), and to build a multivariate predictive model that can predict a final score for each individual.
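
To give a flavour of that challenge, here is a minimal sketch of a computer learning a multivariate model from data. It uses logistic regression trained by gradient descent, which is one common technique among many and is not necessarily what any particular bank uses. The eight training examples are invented, and only two variables (late payments and loan-to-value ratio) are used to keep the example short.

```python
import math

# Hypothetical training data: (late payments, loan-to-value ratio) -> 1 if defaulted, else 0.
TRAINING = [
    ((0, 0.40), 0), ((1, 0.50), 0), ((0, 0.60), 0), ((2, 0.45), 0),
    ((4, 0.80), 1), ((5, 0.70), 1), ((3, 0.90), 1), ((6, 0.85), 1),
]


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(features, weights, bias):
    """Predicted probability of default for one customer."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = score(features, weights, bias) - label
            # Nudge each weight against the error, scaled by its input value.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(TRAINING)
print("Learned weights:", weights, "bias:", bias)
print("Score for 0 late payments, 40% loan-to-value:", score((0, 0.40), weights, bias))
print("Score for 6 late payments, 90% loan-to-value:", score((6, 0.90), weights, bias))
```

The model is trained once on many customers and then scores one customer at a time, which matches the Data -> Machine Learning -> Predictive Model framework described earlier.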

I hope this tutorial helps you understand the basics of machine learning and predictive modelling.

In the next tutorial we will learn how to make a machine learn and build predictive models.