Multi-Dimensional Euclidean Distance: Calculating Distance for N Features


In data science and machine learning, measuring the similarity between two points is a foundational task. One of the most common ways to do this is by calculating the Euclidean Distance.

While many beginners learn the formula for 2D points (X and Y), the math stays remarkably similar even as you add more dimensions (features). In this guide, we’ll look at how to calculate distance for 4 features and how to write a generalized Python function for any number of dimensions.


The Formula

The Euclidean distance between two points $P$ and $Q$ in $n$-dimensional space is the square root of the sum of the squares of the differences between their coordinates.

$$\text{Distance} = \sqrt{\sum_{i=1}^{n} (Q_i - P_i)^2}$$

For 4 Features:

If we have Point A $(a_1, a_2, a_3, a_4)$ and Point B $(b_1, b_2, b_3, b_4)$:

$$\text{Dist} = \sqrt{(b_1-a_1)^2 + (b_2-a_2)^2 + (b_3-a_3)^2 + (b_4-a_4)^2}$$
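As a worked instance of the four-feature formula, take two points $A = (1, 2, 5, 1)$ and $B = (7, 1, 3, 0)$ (values chosen purely for illustration):

$$\text{Dist} = \sqrt{(7-1)^2 + (1-2)^2 + (3-5)^2 + (0-1)^2} = \sqrt{36 + 1 + 4 + 1} = \sqrt{42} \approx 6.4807$$

Because every difference is squared, swapping A and B gives the same result.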


Manual Calculation in Python

If you have two specific points, you can calculate the distance using Python’s math.sqrt():

import math

# Point A: (Width, Height, Weight, ColorValue)
A = (1, 2, 5, 1)

# Point B: (Width, Height, Weight, ColorValue)
B = (7, 1, 3, 0)

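# Subtract feature by feature, square each difference, sum, then take the root
dist = math.sqrt((B[0]-A[0])**2 + (B[1]-A[1])**2 + (B[2]-A[2])**2 + (B[3]-A[3])**2)

print(f"Distance: {dist:.4f}")
# Output: Distance: 6.4807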

Creating a Generalized Function

In a real project, you don’t want to hardcode the subtraction for every feature. We can write a function that takes two sequences of the same length, one value per feature, and returns the distance between them.

import math

def calculate_euclidean_distance(p1, p2):
    if len(p1) != len(p2):
        raise ValueError("Points must have the same number of dimensions")

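    # Collect the squared difference for each feature
    squared_differences = []
    for i in range(len(p1)):
        squared_diff = (p1[i] - p2[i]) ** 2
        squared_differences.append(squared_diff)

    # The distance is the square root of the summed squared differences
    return math.sqrt(sum(squared_differences))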

# Example usage with 4 features
point_x = [10, 20, 15, 5]
point_y = [12, 18, 14, 8]

result = calculate_euclidean_distance(point_x, point_y)
print(f"Result: {result}")
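If you are on Python 3.8 or newer, the standard library already offers `math.dist`, which computes the same quantity. The sketch below reuses the example points from above as a cross-check. It is shown as an alternative, not as a replacement for understanding the loop:

```python
import math

point_x = [10, 20, 15, 5]
point_y = [12, 18, 14, 8]

# math.dist (Python 3.8+) returns the Euclidean distance between two points
# given as sequences of coordinates with the same number of dimensions
result = math.dist(point_x, point_y)

print(f"Result: {result:.4f}")
# Output: Result: 4.2426
```

The squared differences here are 4, 4, 1 and 9, so the distance is the square root of 18.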

The “Professional” Way: Using NumPy

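If you are working on a data science project with large datasets, a pure-Python loop becomes slow because every subtraction and multiplication runs through the interpreter. NumPy performs the same arithmetic on whole arrays in compiled code, and the resulting code is also more concise.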

import numpy as np

p1 = np.array([1, 2, 5, 1])
p2 = np.array([7, 1, 3, 0])

# NumPy can subtract entire arrays and calculate the norm in one step
dist = np.linalg.norm(p1 - p2)

print(f"NumPy Distance: {dist:.4f}")
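The benefit shows up most when you need many distances at once. As a minimal sketch, assuming a small made-up dataset where each row is one sample (the names `samples` and `query` are hypothetical), you can compute the distance from one query point to every row without writing a loop:

```python
import numpy as np

# Hypothetical dataset: each row is one sample with 4 features
samples = np.array([
    [1, 2, 5, 1],
    [7, 1, 3, 0],
    [0, 0, 0, 0],
])
query = np.array([7, 1, 3, 0])

# Broadcasting subtracts `query` from every row;
# axis=1 computes one norm per row instead of one norm for the whole matrix
distances = np.linalg.norm(samples - query, axis=1)

print(np.round(distances, 4))
```

The second row is identical to the query, so its distance is 0. The first row gives the same value of about 6.4807 as the single-pair example.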

Why does this matter?

Euclidean distance is the engine behind many popular machine learning algorithms, including:

1. K-Nearest Neighbors (K-NN): Finding the closest neighbors to a new data point.
2. K-Means Clustering: Assigning data points to the nearest cluster center.
3. Recommendation Systems: Finding users with similar tastes by treating their preferences as coordinates.
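To make the nearest-neighbor idea concrete, here is a minimal sketch of the lookup step only. The function name `k_nearest` and the sample points are made up for illustration, and a real K-NN classifier would also store labels and vote on them:

```python
import math

def k_nearest(query, points, k=3):
    """Return the k points closest to `query`, nearest first."""
    # Sort all points by their Euclidean distance to the query, keep the first k
    return sorted(points, key=lambda p: math.dist(query, p))[:k]

# Hypothetical training points with 4 features each
training_points = [
    (1, 2, 5, 1),
    (7, 1, 3, 0),
    (6, 2, 3, 1),
    (0, 9, 9, 9),
]
query = (7, 1, 4, 0)

nearest = k_nearest(query, training_points, k=2)
print(nearest)
# Output: [(7, 1, 3, 0), (6, 2, 3, 1)]
```

Sorting every point is fine for a handful of samples. For large datasets, libraries typically use vectorized operations or spatial indexes instead.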

Written by

Abdur-Rahmaan Janhangeer

Chef

Python author of 7+ years having worked for Python companies around the world
