Mahalanobis Distance

Sorry for the delay between posts.  I have been busy preparing a paper for an upcoming conference.  This post will be covering Mahalanobis Distance, something which is somewhat related to the previous post on PCA.

You can find the Python code for this post on github

If you have yet to prepare your development environment, please see this post

The Mahalanobis distance is unitless measure of distance of a data point p from a distribution of points D.  If p is positioned at the mean (μ) of the distribution then the Mahalanobis distance is zero.  As the point moves along each principal axis of the distribution then the Mahalanobis distance begins to increase, indicating the number of standard deviations the point is away from the mean of D.  It is through the relationship with the principal axes that the Mahalanobis distance takes into account the correlation within the data set.    It is important to note that the Mahalanobis Distance is not the euclidean (ordinary straight line) distance.

One common use of the Mahalanobis distance within machine learning is in its use as a cost function in learning algorithms; many algorithms need to determine how similar or ‘close’ a data point or points is / are from a set of other points.  This is where Mahalanobis can help; by minimising the Mahalanobis distance, an algorithm is maximising the liklihood that a test point is from the distribution.

So let’s look at how to compute the Mahalanobis distance:  Say we had a distribution as shown in gray points in the image below and we wanted to know the distance of each of the test points from the distribution.

Distribution with test points

To compute the distance of each points p from a distribution D, we need to know the following statistics:

• The mean (μ) of the distribution
• The covariance matrix (S) of the distribution (D)
• The location of point (p)

To compute the Mahalanobis Distance:

$D_m = \sqrt{(p - \mu)S^{-1}(p - \mu)^T}$

It is here we can see that if p = μ then the distance given would be zero.  The other important aspect to note is that the INVERSE of the covariance matrix is used, not the covariance matrix itself.  When computing the inverse matrix in numpy, it may be useful / necessary to use the pseudoinverse, which allows the inverse to be created in scenarios where it may otherwise fail; such as non square matrices.

So as per the code example on github, if we had the following points:

• p1 = 17.5, 27.3)
• p2 = μ
• p3 = (24, 20)

Then the corresponding Mahanobis distance is:

• Dm1: 5.77
• Dm2= 0
• Dm3 = 4.01