Published February 13, 2023

# A Gentle Introduction to Machine Learning

Explore the power of Machine Learning in this in-depth article. Learn how to harness the latest algorithms and techniques to solve real-world problems and drive innovation. Get started on your AI journey today!

## Introduction

Machine Learning is the use of data and computing resources to answer questions by finding complex patterns and relationships within data – many of which would be difficult or impossible to determine manually. To find these patterns, machine learning models require vast amounts of data, as this helps improve the accuracy (and thus usefulness) of the outputs.

## Why is Machine Learning Important?

Machine learning is important for the modern organisation because it helps the business leverage its data to improve decision-making, and thereby improve financial metrics like revenue and profitability. Machine learning can help the business with a number of typical problems, including:

**Predicting and forecasting key business metrics.** Example – Predicting future sales revenue as a result of increasing the advertising budget. Building a sophisticated model from large amounts of prior data allows the business to apply more accurate forecasting, rather than relying on intuitive decision-making.

**Predictive maintenance.** Example – Predicting when equipment is likely to fail or require maintenance/replacement. This provides the ability to proactively manage the risks and costs associated with unplanned downtime due to equipment failure. This information may also be required for compliance or auditing purposes.

**Resource optimisation.** Example – Optimising the supply chain by predicting demand through each stage of a sales or manufacturing process. This allows the business to minimise unnecessary costs through reducing waste, optimising inventory levels, and managing vendor relationships appropriately.

**Customer segmentation.** Example – Analysing customer data to segment customers into different groups based on their behaviour, preferences, and demographic information. This allows the business to tailor its marketing campaigns for improved customer lifetime value, and to reduce customer churn.

**Anomaly detection.** Example – Identifying instances of unusual behaviour, such as fraud detection (detecting fraudulent transactions in financial systems), website usage analysis (detecting suspicious user patterns to improve cybersecurity), manufacturing quality control (detecting anomalies which may impact product quality), or energy consumption analysis (detecting deviations which may indicate equipment failure).

Ultimately, as more businesses implement machine learning into their ways of working, adopting it is highly relevant for maintaining an organisation’s **competitive advantage** in the marketplace.

## Supervised Learning and Unsupervised Learning

There are broadly two categories of Machine Learning: **supervised** learning and **unsupervised** learning.

**Supervised** machine learning models work by giving a computer lots of labelled data for reference, and then using the ‘learnings’ from this to label new, uncategorised data. An example of supervised learning is giving a computer lots of images which either contain cats or don’t (labelled appropriately), building an algorithm to recognise the common features of ‘cats’, and then using the algorithm on a series of new unlabelled pictures, to determine whether these new images are cats or not.

**Unsupervised** machine learning models are those which learn patterns from *unlabelled* data. So in the previous example – the *label* on each image (specifying whether the image contains a cat or not) would not exist. But unsupervised machine learning models can still tell you characteristics about the images, and detect patterns – you may be able to build ‘clusters’ of similar images, such as those which contain grey animals and those which contain orange animals.
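
To make the clustering idea concrete, here is a minimal sketch of k-means clustering (one common unsupervised technique) in Python. The data is entirely made up for illustration – each “image” is reduced to a single numeric feature standing in for average colour:

```python
import numpy as np

# Hypothetical, unlabelled "images" reduced to one feature each:
# average colour value (low ~ grey animals, high ~ orange animals)
features = np.array([0.10, 0.12, 0.15, 0.78, 0.80, 0.85])

# A minimal k-means loop with k = 2 clusters
centroids = np.array([features.min(), features.max()])
for _ in range(10):
    # Step 1: assign each point to its nearest centroid
    labels = np.argmin(np.abs(features[:, None] - centroids[None, :]), axis=1)
    # Step 2: move each centroid to the mean of its assigned points
    centroids = np.array([features[labels == k].mean() for k in range(2)])

print(labels)  # two clusters emerge, even though no labels were provided
```

No label ever tells the algorithm what a “grey” or “orange” image is – the grouping falls out of the data alone.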

Distinguishing between supervised and unsupervised problems is essential for choosing the most practical solution methodology. This article will focus primarily on **supervised** learning models.

## Regression in a Nutshell

**Regression** is a supervised learning method, and a mathematical technique for prediction. The goal of regression is to predict the value of something (e.g. house price in $) based on one or more other measurements (e.g. house size, number of bedrooms, distance from the CBD in km). There are several types of regression, including linear regression, polynomial regression, and logistic regression.

**Linear regression** is a type of regression which attempts to fit a linear function (i.e. a straight line) to a set of data points. We’ll focus on linear regression for the examples used in this article.

**What is a regression line?**

So how is this “fitting” done?

A regression line (also known as the “line of best fit”) is a line which, on average, is closer to all of your data points than any other line. This means you want to *minimise* the distance between your regression line and your data points. This distance is known as the *error*, and thus a line of best fit is simply *minimising these errors*. There are a number of techniques used to measure and weight these errors, including the technique of Ordinary Least Squares (OLS), which is commonly used in linear regression.

Every straight line in two dimensions is defined by the equation *y = mx + c*. This equation relates the dependent variable *y* back to the independent variable *x*, with a linear relationship. This means that as *x* changes, the variable I am trying to track (*y*) has a constant rate of change.

This linear equation contains just two numbers – the *slope* (m) and the *y-intercept* (c). These two values are known as the *coefficients* of the equation. To summarise: training a linear regression model is essentially optimising the coefficients (m and c) of your linear equation by minimising the errors of the line of best fit. This process is known as the *model training* stage.
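
As a sketch of what model training looks like in practice, here is a minimal Python example using NumPy’s `polyfit`, which performs an Ordinary Least Squares fit. The data points are made up for illustration – they roughly follow y = 2x + 1 with some noise:

```python
import numpy as np

# Hypothetical noisy data points, roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Degree-1 polyfit performs an Ordinary Least Squares fit,
# returning the optimised coefficients: slope (m) and y-intercept (c)
m, c = np.polyfit(x, y, 1)
print(f"y = {m:.2f}x + {c:.2f}")  # ≈ y = 1.97x + 1.09
```

The fitted coefficients land close to the underlying relationship, despite the noise in the data.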

#### Example 1: Perfect Linear Relationship

Let’s clarify with a real-world example of a linear relationship. I’m purchasing chocolate bars at a grocery store, and I want to calculate my total bill at the end.

In this scenario:

y = the cost of my grocery bill

x = the number of chocolate bars I purchase

So my linear equation is y = mx + c. Now let’s say that each chocolate bar costs $3, and the grocery store applies a flat EFTPOS surcharge of $1, regardless of how much I purchase. So if I buy three chocolate bars at $3 each, my grocery bill becomes 3 x $3 + $1, which is $10.

The linear equation that describes this relationship is y = 3x + 1. This is known as a perfect linear relationship, because calculating this will always return the exact amount of my grocery bill, without deviation.

With perfect linear relationships, there is nothing to actually “predict” – as any potential value can theoretically be calculated precisely by the equation. If I wanted to know how much it would cost to buy 87 chocolate bars, I could calculate that down to the cent.
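
A perfect relationship like this can be written as a trivial function – no model training required:

```python
def grocery_bill(bars: int) -> int:
    """Perfect linear relationship: $3 per chocolate bar plus a flat $1 surcharge."""
    return 3 * bars + 1

print(grocery_bill(3))   # 10 – three bars, as in the example above
print(grocery_bill(87))  # 262 – any value can be calculated exactly
```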

#### Example 2: Imperfect Linear Relationship

Now let’s use an example of an imperfect linear relationship.

When I do my main grocery shop for the week, my grocery bill tends to be quite high. My trolley is also reasonably full and heavy. I hypothesise that the weight of items in my trolley (*x*) can be used to predict my grocery bill (*y*) with a linear relationship.

Of course, this relationship isn’t perfect – some items are much more expensive per kg than others. But it’s a decent approximation: on average, a basket with 2 kg of groceries is likely to be a lot cheaper than a trolley with 10 kg.

Let’s take a series of example data points, and plot them on a graph, to see how the grocery bill in $ (*y*) changes as a function of the total weight of groceries in kg (*x*).

This is a bit more interesting from a predictive perspective. We can use this trend line to predict additional values that don’t exist – and while these may be approximate, they may be accurate enough to be useful. For example, I may be comfortable enough ahead of time to say that $160 cash is probably going to cover my bill for 10 kg of groceries.
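
A quick sketch of this in Python, using hypothetical shopping-trip data (the weights and bills below are invented to illustrate the idea, chosen so the 10 kg prediction lands near the $160 figure above):

```python
import numpy as np

# Hypothetical shopping trips: total weight of groceries (kg) and bill ($)
weight_kg = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
bill = np.array([42.0, 68.0, 88.0, 115.0, 132.0])

# Fit a line of best fit through the imperfect data points
m, c = np.polyfit(weight_kg, bill, 1)

# Predict the bill for a 10 kg shop
predicted = m * 10 + c
print(round(predicted))  # roughly $161 – approximate, but useful
```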

**Correlation**

In other words, the two variables are *imperfectly correlated*. What does correlation mean? Correlation is simply a measure of how closely two variables are related.

A high correlation means that the variables are closely related, and one variable can be determined from the other with reasonable accuracy. Example 1 showed a relationship with an extremely high correlation (a perfect correlation), and Example 2 has shown a relationship with a reasonably high correlation. Knowing the weight of my groceries gives me a reasonably good indication of the grocery bill I’m about to receive, and knowing my grocery bill in retrospect probably indicates the size of the trolley I used on that day.

A low correlation means that the variables aren’t closely related – for example, knowing the weight of my groceries probably won’t tell me anything about how far away I live from the grocery store, and vice versa. Nor does my grocery bill tell me how many times I have watched the movie “Jurassic Park”. These variables would probably have a very low correlation, because knowing one of them gives me little or no information about the other.
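
Correlation can be quantified with the Pearson correlation coefficient, which ranges from -1 to 1. Here is a small illustration using made-up numbers for the grocery examples above:

```python
import numpy as np

# Hypothetical measurements across five shopping trips
weight_kg = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
bill = np.array([42.0, 68.0, 88.0, 115.0, 132.0])
film_viewings = np.array([2.0, 5.0, 3.0, 1.0, 4.0])  # times I watched "Jurassic Park"

# np.corrcoef returns the Pearson correlation coefficient
print(np.corrcoef(weight_kg, bill)[0, 1])           # ≈ 0.998 – high correlation
print(np.corrcoef(weight_kg, film_viewings)[0, 1])  # ≈ 0.0 – very low correlation
```

A coefficient near 1 (or -1) means one variable tells you a lot about the other; a coefficient near 0 means it tells you almost nothing.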

#### Example 3: Non-Linear Relationship

I am purchasing specialty coffee beans from the grocery store. The specialty roaster charges a premium for top quality coffee beans, rated on a scale of 1-10 stars.

The 5 star coffee beans cost $20/kg, but get progressively more expensive – the 6 star beans cost $40/kg, the 7 star beans cost $80/kg, the 8 star beans cost $160/kg, and the 9 star beans cost $320/kg. This cost is going up based on a *non-linear relationship* (specifically an *exponential relationship*), as the rate of change gets faster between each level of bean quality, and you end up with diminishing returns for each extra dollar you spend.

If I were trying to predict the price of the 10 star beans (i.e. predict **price** *y* based on **quality** *x*), I wouldn’t use linear regression for this, as linear regression is only suitable when the rate of change is constant. The rate of change is not constant in this case, because the price increases dramatically with each incremental jump in quality.

But these are just words – let’s apply linear regression to this scenario, then apply a non-linear regression technique, and compare the results. Even though we know that the 9 star beans are $320/kg, we will withhold this information for now – we will apply the regression on only the remaining data points (our **training set**), and then use the 9 star data point (our **test set**) to validate how accurate our predictive model is.

This actually doesn’t look too bad at a cursory glance. However, we have only applied this regression to the 5, 6, 7 and 8 star coffee beans. What if we used this trend line to predict the price of 9 star (*x = 9*) coffee beans?

*y = 46x – 224*

*y = 46 × 9 – 224*

*y = 190*

When comparing to our **test set** (the price of 9 star beans is $320/kg), this prediction is pretty inaccurate. So we can safely conclude that a linear regression is not a great model fit in this scenario.
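
The linear fit above can be reproduced in a few lines of Python, using NumPy’s `polyfit` for the Ordinary Least Squares step:

```python
import numpy as np

# Training set: bean quality (stars) vs price ($/kg), withholding the 9-star point
quality = np.array([5, 6, 7, 8])
price = np.array([20, 40, 80, 160])

# Ordinary Least Squares linear fit: y = mx + c
m, c = np.polyfit(quality, price, 1)
print(m, c)       # ≈ 46 and -224, matching the equation above
print(m * 9 + c)  # ≈ 190 – well short of the true price of $320/kg
```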

Now let’s try using a non-linear method, known as an exponential regression.

This looks like a much better fit to the data. So how does this equation perform when attempting to predict new or unseen data? Let’s use the 9 star coffee beans (*x = 9*) as an example again.

*y = 0.625 × 2^x*

*y = 0.625 × 2^9*

*y = 320*

Spot on!
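
One simple way to perform this exponential regression is to fit a straight line to the *logarithm* of the price – a common trick that turns an exponential relationship into a linear one. A sketch using the same training set:

```python
import numpy as np

# Same training set as before: quality (stars) vs price ($/kg)
quality = np.array([5, 6, 7, 8])
price = np.array([20, 40, 80, 160])

# Fitting a line to log2(price) turns y = a * 2^(b*x) into a linear equation
b, log2_a = np.polyfit(quality, np.log2(price), 1)
a = 2 ** log2_a
print(a, b)  # ≈ 0.625 and 1, i.e. y = 0.625 * 2^x

# Predict the price of the 9-star beans (the test set)
prediction = a * 2 ** (b * 9)
print(prediction)  # ≈ 320 – matching the true price
```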

Here we get to the heart of machine learning. *The benefit of using the correct regression technique is that your model will perform accurately when attempting to make predictions on new or unseen data.*

## Summary

Here’s what we learned today:

**Supervised** machine learning models are those which train a model on previously *labelled* data (e.g. images which are labelled as either containing cats or not), and use the model to predict the outcome of new *unlabelled* data.

**Unsupervised** machine learning models detect patterns in completely *unlabelled* data, such as identifying anomalies or building clusters of data which contain similar characteristics.

**Linear regression** is a supervised machine learning model, and is one of many types of regression models. The method attempts to fit a “line of best fit” to data (based on determining the coefficients of *y = mx + c* which minimise errors), which can then be used to extrapolate and make predictions on unseen data.

**Non-linear regression** encompasses a range of techniques which attempt to fit a general trend to patterns which don’t follow a linear trend. Linear regression is not the best technique in every situation, as we learned through Example 3, where the non-linear regression technique was much better at predicting new data.

**The training set** is the set of data on which you are fitting a trend line. This set should contain most, but not all, of the available data you have.

**The test set** is the small subset of data which you put aside during model training. After training the model on your training set, your test set can be used to verify whether your model is effective at predicting new or unseen data.

Next time, we will be talking through a guide for applying machine learning in the statistical programming language *R*.

If you would like to look into ways that machine learning can help your business, feel free to reach out for a chat. We love interesting problems.