Closest Pair of Points - CSU083 | Shoolini University

Closest Pair of Points

1. Prerequisites

Before understanding the Closest Pair of Points algorithm, you must be familiar with:

2. What is the Closest Pair of Points Algorithm?

It is an algorithm used to find the pair of points that are closest to each other in a given set of points in a 2D plane.

2.1 Brute Force Approach

Check every possible pair and compute their distance. This has a time complexity of \( O(n^2) \).


def brute_force_closest_pair(points):
    min_distance = float('inf')
    closest_pair = None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = euclidean_distance(points[i], points[j])
            if dist < min_distance:
                min_distance = dist
                closest_pair = (points[i], points[j])
    return closest_pair, min_distance

2.2 Optimized Divide and Conquer Approach

Uses recursion by dividing the set into two halves and solving each half separately, then merging results. This reduces the complexity to \( O(n \log n) \).


import math

def euclidean_distance(p1, p2):
    return math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)

def closest_pair_recursive(px, py):
    if len(px) <= 3:
        return brute_force_closest_pair(px)

    mid = len(px) // 2
    mid_point = px[mid]

    left_px = px[:mid]
    right_px = px[mid:]
    left_py = list(filter(lambda p: p[0] <= mid_point[0], py))
    right_py = list(filter(lambda p: p[0] > mid_point[0], py))

    (p1, q1, dist1) = closest_pair_recursive(left_px, left_py)
    (p2, q2, dist2) = closest_pair_recursive(right_px, right_py)

    d = min(dist1, dist2)
    (p3, q3, dist3) = closest_split_pair(px, py, d)

    return min([(p1, q1, dist1), (p2, q2, dist2), (p3, q3, dist3)], key=lambda x: x[2])

def closest_pair(points):
    px = sorted(points, key=lambda p: p[0])
    py = sorted(points, key=lambda p: p[1])
    return closest_pair_recursive(px, py)

3. Why Does This Algorithm Exist?

The Closest Pair of Points problem is widely used in real-world scenarios:

4. When Should You Use It?

5. How Does It Compare to Alternatives?

5.1 Strengths

5.2 Weaknesses

6. Basic Implementation

Here is a basic Python implementation of the Closest Pair of Points algorithm using the Divide and Conquer approach.

6.1 Python Implementation


import math

def euclidean_distance(p1, p2):
    return math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)

def brute_force(points):
    min_dist = float('inf')
    pair = None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = euclidean_distance(points[i], points[j])
            if dist < min_dist:
                min_dist = dist
                pair = (points[i], points[j])
    return pair, min_dist

def closest_split_pair(px, py, delta):
    mid_x = px[len(px) // 2][0]
    strip = [p for p in py if abs(p[0] - mid_x) < delta]
    min_dist = delta
    best_pair = None

    for i in range(len(strip)):
        for j in range(i+1, min(i+7, len(strip))):
            dist = euclidean_distance(strip[i], strip[j])
            if dist < min_dist:
                min_dist = dist
                best_pair = (strip[i], strip[j])
    return best_pair, min_dist

def closest_pair_recursive(px, py):
    if len(px) <= 3:
        return brute_force(px)

    mid = len(px) // 2
    left_px = px[:mid]
    right_px = px[mid:]
    left_py = list(filter(lambda p: p in left_px, py))
    right_py = list(filter(lambda p: p in right_px, py))

    (p1, q1, dist1) = closest_pair_recursive(left_px, left_py)
    (p2, q2, dist2) = closest_pair_recursive(right_px, right_py)

    d = min(dist1, dist2)
    (p3, q3, dist3) = closest_split_pair(px, py, d)

    return min([(p1, q1, dist1), (p2, q2, dist2), (p3, q3, dist3)], key=lambda x: x[2])

def closest_pair(points):
    px = sorted(points, key=lambda p: p[0])
    py = sorted(points, key=lambda p: p[1])
    return closest_pair_recursive(px, py)

# Sample input points
points = [(2, 3), (12, 30), (40, 50), (5, 1), (12, 10), (3, 4)]
result = closest_pair(points)
print("Closest Pair:", result[0], "with distance:", result[2])

7. Dry Run

7.1 Given Input

We will dry-run the algorithm for the input:


points = [(2, 3), (12, 30), (40, 50), (5, 1), (12, 10), (3, 4)]

7.2 Step-by-Step Execution

Step 1: Sort Points (by x-coordinate)

Sorted px: [(2,3), (3,4), (5,1), (12,10), (12,30), (40,50)]
Sorted py: [(5,1), (2,3), (3,4), (12,10), (12,30), (40,50)]

Step 2: Divide into Left and Right Halves

Step 3: Recursively Find Closest Pair in Left Half

Step 4: Recursively Find Closest Pair in Right Half

Step 5: Combine Results

Step 6: Final Result

Final Output:

Closest Pair: (2,3) and (3,4) with distance: 1.41

8. Time & Space Complexity Analysis

8.1 Time Complexity Analysis

The Closest Pair of Points algorithm has different time complexities based on the approach used.

8.1.1 Brute Force Approach
8.1.2 Divide and Conquer Approach

9. Space Complexity Analysis

Space consumption depends on sorting, recursion, and additional arrays.

9.1 Brute Force Space Complexity
9.2 Divide and Conquer Space Complexity

Key insight: Space remains manageable since recursion depth is logarithmic.

10. Trade-Offs

10.1 Brute Force vs. Divide & Conquer

Approach Time Complexity Space Complexity Pros Cons
Brute Force O(n²) O(1) Simple, easy to implement Slow for large inputs
Divide & Conquer O(n log n) O(n) Efficient for large inputs More complex, higher space usage

10.2 Key Considerations

In practical use cases (e.g., GPS tracking, clustering), the divide & conquer approach is preferred due to its efficiency.

11. Optimizations & Variants

11.1 Common Optimizations

11.2 Variants of the Algorithm

11.2.1 3D Closest Pair of Points

When dealing with 3D space, the same divide and conquer approach applies, but now distance calculation includes the third coordinate:

$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$

11.2.2 k-Closest Pairs

Instead of finding just one closest pair, we may want the k-closest pairs. This requires modifications to store multiple minimum distances.

11.2.3 Dynamic Closest Pair

If points are added or removed dynamically (e.g., in real-time GPS tracking), we can maintain a spatial data structure like a KD-Tree to update nearest neighbors in \(O(\log n)\) time.

12. Iterative vs. Recursive Implementations

12.1 Recursive Approach

12.2 Iterative Approach

12.3 Comparison Table

Criteria Recursive Iterative
Ease of Implementation Simple, follows divide & conquer Complex, requires manual stack handling
Memory Usage Uses recursion stack (O(log n) extra space) Avoids deep recursion, uses explicit structures
Performance Generally efficient, but recursion overhead More efficient in environments where recursion is costly
Best Use Case General cases with moderate-sized inputs Performance-critical, memory-limited systems

12.4 When to Choose Which?

Most implementations favor recursion due to cleaner logic, but iterative approaches are used in high-performance environments where recursion overhead is unacceptable.

13. Edge Cases & Failure Handling

13.1 Common Pitfalls and Edge Cases

14. Test Cases to Verify Correctness

14.1 Basic Test Cases


def test_closest_pair():
    points1 = [(2, 3), (5, 1), (12, 10), (3, 4)]
    assert closest_pair(points1) == ((2, 3), (3, 4), 1.41)

    points2 = [(1, 1), (4, 5)]
    assert closest_pair(points2) == ((1, 1), (4, 5), 5.0)

    points3 = [(0, 0)]
    try:
        closest_pair(points3)
        assert False, "Expected an error for a single point"
    except ValueError:
        pass  # Expected error

    points4 = [(1, 1), (1, 1), (2, 2)]
    assert closest_pair(points4) == ((1, 1), (1, 1), 0.0)

    points5 = [(-3, -4), (-1, -1), (-5, -6)]
    assert closest_pair(points5) == ((-3, -4), (-5, -6), 2.83)

    points6 = [(1000000, 1000000), (1000001, 1000001)]
    assert closest_pair(points6) == ((1000000, 1000000), (1000001, 1000001), 1.41)

test_closest_pair()
print("All test cases passed!")

14.2 Edge Case Test Cases

15. Real-World Failure Scenarios

15.1 Handling Real-World Failures

15.2 Practical Mitigations

Understanding these failure points helps in designing robust implementations suitable for real-world applications.

16. Real-World Applications & Industry Use Cases

16.1 Real-World Applications of the Closest Pair of Points Algorithm

The Closest Pair of Points algorithm has significant applications across various domains:

16.1.1 Geographic Information Systems (GIS)
16.1.2 Robotics & Autonomous Vehicles
16.1.3 Computer Vision & Image Processing
16.1.4 Data Clustering & Machine Learning
16.1.5 Network Optimization
16.1.6 Computational Biology
16.1.7 Finance & Fraud Detection

17. Open-Source Implementations

17.1 Notable Open-Source Implementations

17.2 Example: Using Scipy's KDTree for Closest Points


from scipy.spatial import KDTree

points = [(2, 3), (12, 30), (40, 50), (5, 1), (12, 10), (3, 4)]
tree = KDTree(points)
distance, index = tree.query((2, 3), k=2)  # Find closest neighbor to (2,3)
print("Closest Pair:", points[index[1]], "Distance:", distance[1])

18. Project: Finding the Nearest Charging Station

18.1 Project Description

This script finds the closest electric vehicle (EV) charging station based on a user’s GPS coordinates. The dataset consists of multiple charging stations with their latitude and longitude.

18.2 Python Script for Finding the Nearest Charging Station


import math

# Function to calculate Euclidean distance based on GPS coordinates
def haversine_distance(coord1, coord2):
    lat1, lon1 = coord1
    lat2, lon2 = coord2
    R = 6371  # Earth's radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi/2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda/2)**2
    return 2 * R * math.asin(math.sqrt(a))

# Sample charging stations (Latitude, Longitude)
charging_stations = [
    (28.6139, 77.2090),  # New Delhi
    (12.9716, 77.5946),  # Bangalore
    (19.0760, 72.8777),  # Mumbai
    (13.0827, 80.2707),  # Chennai
]

# User's current location
user_location = (19.2183, 72.9781)  # Near Mumbai

# Find the nearest charging station
min_distance = float('inf')
nearest_station = None

for station in charging_stations:
    distance = haversine_distance(user_location, station)
    if distance < min_distance:
        min_distance = distance
        nearest_station = station

print(f"Nearest Charging Station: {nearest_station} at {min_distance:.2f} km")

18.3 Expected Output

Nearest Charging Station: (19.0760, 72.8777) at 11.36 km

18.4 Enhancements

19. Competitive Programming & System Design Integration

19.1 Competitive Programming Applications

The Closest Pair of Points algorithm frequently appears in competitive programming contests. Key problem types include:

19.2 Integration in System Design

The Closest Pair of Points algorithm is essential in system design for:

Example: In a ride-sharing system, we can use a KD-Tree to efficiently find the closest driver to a customer in \( O(\log n) \) time instead of brute-force checking all available drivers.

20. Assignments

20.1 Solve At Least 10 Problems Using This Algorithm

  1. Find the closest pair of points in a 2D plane (basic implementation).
  2. Extend the problem to 3D space.
  3. Find the closest pair among 10⁶+ points efficiently.
  4. Modify the algorithm to return the k closest pairs instead of just one.
  5. Handle real-time updates when points are added dynamically.
  6. Implement the closest pair problem using KD-Trees.
  7. Find the closest pair of words in a large text document (word distance problem).
  8. Find the closest ATM or gas station from a given location.
  9. Determine the closest two delivery centers in a logistics system.
  10. Optimize a clustering algorithm using the closest pair approach.

20.2 Use It in a System Design Problem

Design a scalable ride-sharing system where:

Things to consider:

20.3 Implement It Under Time Constraints

Time-restricted coding challenge:

Practicing under time constraints helps in competitive programming and interviews where efficiency and correctness matter.