Bucket Sort - CSU083 | Shoolini University

Bucket Sort

1. Prerequisites for Bucket Sort

Before understanding Bucket Sort, you must be familiar with the following foundational concepts:

2. What is Bucket Sort?

Bucket Sort is a non-comparative sorting algorithm that distributes elements into multiple buckets and sorts each bucket individually, typically using another sorting algorithm.

It follows these steps:

  1. Divide the range of input values into several buckets.
  2. Place each element into its corresponding bucket.
  3. Sort each bucket individually (using another sorting algorithm like Insertion Sort).
  4. Concatenate all sorted buckets to get the final sorted array.

It is particularly useful when input data is uniformly distributed over a known range.

3. Why Does This Algorithm Exist?

Bucket Sort is designed to efficiently sort large datasets that are uniformly distributed, reducing the reliance on comparison-based sorting algorithms. It is used when:

4. When Should You Use Bucket Sort?

Bucket Sort is effective under the following conditions:

Avoid using Bucket Sort when:

5. Comparison with Other Sorting Algorithms

Algorithm Best Case Worst Case Space Complexity Comparison Type Best Use Case
Bucket Sort O(n + k) O(n²) (if all elements land in one bucket) O(n + k) Non-comparative Uniformly distributed floating-point numbers
Quick Sort O(n log n) O(n²) (worst pivot selection) O(log n) Comparative General-purpose sorting
Merge Sort O(n log n) O(n log n) O(n) Comparative Stable sorting with large datasets
Counting Sort O(n + k) O(n + k) O(k) Non-comparative Small range of integers

6. Basic Implementation of Bucket Sort

Below is a Python implementation of Bucket Sort:

def bucket_sort(arr):
    if len(arr) == 0:
        return arr

    # Step 1: Find the minimum and maximum values
    min_val, max_val = min(arr), max(arr)

    # Step 2: Create empty buckets
    bucket_count = len(arr)
    buckets = [[] for _ in range(bucket_count)]

    # Step 3: Place elements into their respective buckets
    for num in arr:
        index = int((num - min_val) / (max_val - min_val + 1) * bucket_count)
        buckets[index].append(num)

    # Step 4: Sort each bucket
    for bucket in buckets:
        bucket.sort()

    # Step 5: Concatenate sorted buckets
    sorted_array = []
    for bucket in buckets:
        sorted_array.extend(bucket)

    return sorted_array

# Example input
arr = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
sorted_arr = bucket_sort(arr)
print(sorted_arr)

7. Dry Run of Bucket Sort

We will dry run the algorithm on the input array:

arr = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]

### Step 1: Find the min and max values

### Step 2: Initialize empty buckets

buckets = [[], [], [], [], [], [], [], [], [], []]

### Step 3: Place elements in corresponding buckets

After distributing, buckets look like:

buckets = [
  [0.17, 0.12],  # Bucket 0
  [0.26, 0.21, 0.23],  # Bucket 1
  [],  # Bucket 2
  [0.39],  # Bucket 3
  [], [],  # Buckets 4, 5
  [0.68],  # Bucket 6
  [0.72],  # Bucket 7
  [0.78],  # Bucket 8
  [0.94]   # Bucket 9
]

### Step 4: Sort individual buckets

### Step 5: Concatenate sorted buckets

sorted_array = [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]

Final sorted output: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]

8. Time & Space Complexity Analysis

8.1 Worst-Case Complexity

In the worst case, all elements may end up in the same bucket. If we use an inefficient sorting algorithm like Insertion Sort inside each bucket, the sorting step can take:

$$O(n^2)$$

Therefore, the worst-case time complexity is:

O(n²) (if all elements land in a single bucket and require comparison-based sorting).

8.2 Best-Case Complexity

The best case occurs when elements are uniformly distributed across buckets and each bucket has at most one element. This results in:

So, the best-case time complexity is:

O(n + k) (where k is the number of buckets, and sorting within them is trivial).

8.3 Average-Case Complexity

Assuming an ideal distribution where elements are uniformly spread across k buckets, the time complexity for sorting each bucket is nearly constant. Using an efficient sorting algorithm inside buckets (like Insertion Sort for small buckets), the average complexity is:

O(n + k + n log(n/k))

If k ≈ n (number of elements), the expression simplifies to:

O(n log n)

9. Space Complexity Analysis

Bucket Sort requires additional space for storing buckets, which depends on the input size.

9.1 Space Consumption Breakdown

Total space complexity: O(n + k).

9.2 Space Growth with Input Size

If the number of buckets k is proportional to n, then space complexity remains O(n). However, if we allocate an excessive number of buckets, space overhead increases significantly.

10. Trade-offs of Bucket Sort

10.1 Strengths

10.2 Weaknesses

10.3 When to Avoid Bucket Sort

11. Optimizations & Variants

11.1 Common Optimizations

Several optimizations improve the efficiency of Bucket Sort:

11.2 Variants of Bucket Sort

Different versions of Bucket Sort adapt to specific scenarios:

12. Iterative vs. Recursive Implementations

12.1 Iterative Implementation

The traditional iterative approach follows a straightforward sequence:

def bucket_sort_iterative(arr):
    if len(arr) == 0:
        return arr

    min_val, max_val = min(arr), max(arr)
    bucket_count = len(arr)
    buckets = [[] for _ in range(bucket_count)]

    for num in arr:
        index = int((num - min_val) / (max_val - min_val + 1) * bucket_count)
        buckets[index].append(num)

    for bucket in buckets:
        bucket.sort()

    sorted_array = []
    for bucket in buckets:
        sorted_array.extend(bucket)

    return sorted_array

Efficiency:

12.2 Recursive Implementation

A recursive approach applies Bucket Sort within each bucket, creating a multi-level sorting structure:

def bucket_sort_recursive(arr, depth=0):
    if len(arr) <= 1 or depth > 5:  # Prevent excessive recursion
        return arr

    min_val, max_val = min(arr), max(arr)
    bucket_count = len(arr)
    buckets = [[] for _ in range(bucket_count)]

    for num in arr:
        index = int((num - min_val) / (max_val - min_val + 1) * bucket_count)
        buckets[index].append(num)

    sorted_array = []
    for bucket in buckets:
        sorted_array.extend(bucket_sort_recursive(bucket, depth + 1))

    return sorted_array

Efficiency:

12.3 Comparison of Iterative vs. Recursive Approach

Criteria Iterative Recursive
Time Complexity O(n log n) (with optimized bucket sorting) O(n log n) (with recursive refinement)
Space Complexity O(n + k) (extra space for buckets) O(n + k + d) (d = recursion depth)
Best Use Case General-purpose sorting Multi-level nested data distributions
Limitations Less adaptive for deep bucket refinement Excessive recursion may cause stack overflow

Conclusion: The iterative approach is preferred for most applications due to its controlled memory usage. The recursive approach is useful for hierarchical data distributions but requires careful depth management.

13. Edge Cases & Failure Handling

13.1 Common Pitfalls and Edge Cases

While Bucket Sort is efficient in many cases, it has several edge cases that require careful handling:

14. Test Cases to Verify Correctness

To ensure correctness, we define test cases for various scenarios:

import unittest

def bucket_sort(arr):
    if len(arr) == 0:
        return arr

    min_val, max_val = min(arr), max(arr)
    bucket_count = len(arr)
    buckets = [[] for _ in range(bucket_count)]

    for num in arr:
        index = int((num - min_val) / (max_val - min_val + 1) * bucket_count)
        buckets[index].append(num)

    for bucket in buckets:
        bucket.sort()

    sorted_array = []
    for bucket in buckets:
        sorted_array.extend(bucket)

    return sorted_array

class TestBucketSort(unittest.TestCase):

    def test_empty_list(self):
        self.assertEqual(bucket_sort([]), [])

    def test_single_element(self):
        self.assertEqual(bucket_sort([42]), [42])

    def test_sorted_list(self):
        self.assertEqual(bucket_sort([1, 2, 3, 4, 5]), [1, 2, 3, 4, 5])

    def test_reverse_sorted_list(self):
        self.assertEqual(bucket_sort([5, 4, 3, 2, 1]), [1, 2, 3, 4, 5])

    def test_duplicates(self):
        self.assertEqual(bucket_sort([3, 1, 2, 1, 3, 2]), [1, 1, 2, 2, 3, 3])

    def test_floating_points(self):
        self.assertEqual(bucket_sort([0.42, 0.32, 0.23, 0.52, 0.67]), [0.23, 0.32, 0.42, 0.52, 0.67])

    def test_large_numbers(self):
        self.assertEqual(bucket_sort([1000, 500, 100, 50, 10]), [10, 50, 100, 500, 1000])

    def test_negative_numbers(self):
        self.assertEqual(bucket_sort([-3, -1, -2, -4, -5]), [-5, -4, -3, -2, -1])

if __name__ == '__main__':
    unittest.main()

### Expected Output

15. Real-World Failure Scenarios

15.1 When Bucket Sort Fails in Practice

Despite its efficiency, Bucket Sort has real-world failure scenarios:

15.2 Preventing Failures

To prevent these issues, consider:

Understanding these pitfalls helps in making an informed choice about when to use Bucket Sort.

16. Real-World Applications & Industry Use Cases

16.1 When is Bucket Sort Used?

Bucket Sort is used in scenarios where data is uniformly distributed and can be efficiently divided into smaller groups.

16.2 Real-World Applications

17. Open-Source Implementations of Bucket Sort

17.1 Notable Open-Source Repositories

17.2 Exploring a Python-Based Open-Source Implementation

A popular open-source repository implementing Bucket Sort in Python:

git clone https://github.com/TheAlgorithms/Python
cd Python/sorts
cat bucket_sort.py

This repository contains well-optimized versions with different bucket assignment strategies.

18. Practical Project Using Bucket Sort

18.1 Project: Sorting Real-World Floating Point Data

We will create a script that sorts temperature readings from different locations using Bucket Sort.

18.2 Python Implementation

import random

def bucket_sort(arr):
    if len(arr) == 0:
        return arr

    min_val, max_val = min(arr), max(arr)
    bucket_count = len(arr)
    buckets = [[] for _ in range(bucket_count)]

    for num in arr:
        index = int((num - min_val) / (max_val - min_val + 1) * bucket_count)
        buckets[index].append(num)

    for bucket in buckets:
        bucket.sort()

    sorted_array = []
    for bucket in buckets:
        sorted_array.extend(bucket)

    return sorted_array

# Generate random temperature readings from various cities
temperature_readings = [round(random.uniform(-10, 40), 2) for _ in range(20)]
print("Unsorted Temperatures:", temperature_readings)

# Apply Bucket Sort
sorted_temperatures = bucket_sort(temperature_readings)
print("Sorted Temperatures:", sorted_temperatures)

18.3 Explanation

18.4 Further Enhancements

This project demonstrates a real-world application of Bucket Sort in handling temperature datasets.

19. Competitive Programming & System Design Integration

19.1 Competitive Programming Usage

Bucket Sort is rarely used in general coding competitions due to its dependence on uniform distribution and additional space requirements. However, it is effective in specific problems where:

19.2 Common Competitive Programming Problems

19.3 System Design Integration

Bucket Sort is useful in distributed computing and database indexing:

20. Assignments for Mastery

20.1 Solve at least 10 problems using Bucket Sort

Solve the following problems using Bucket Sort:

  1. Sort decimal numbers between 0 and 1.
  2. Sort student GPAs in a university database.
  3. Sort a list of floating-point temperature readings.
  4. Rank players in an online gaming leaderboard.
  5. Sort large real-number datasets efficiently.
  6. Distribute traffic load across multiple web servers.
  7. Sort a large dataset of city population numbers.
  8. Implement parallel bucket sorting.
  9. Use bucket sort for external sorting on large files.
  10. Sort product prices efficiently in an e-commerce website.

20.2 System Design Problem

Design a distributed sorting system for handling millions of product prices in an e-commerce application:

20.3 Practice Implementing Bucket Sort Under Time Constraints

Set a timer and implement Bucket Sort within: