Radix Sort - CSU083 | Shoolini University

Radix Sort

1. Prerequisites

Before understanding Radix Sort, you must be familiar with:

2. What is Radix Sort?

Radix Sort is a non-comparative sorting algorithm for integers and other keys that can be split into digits. It sorts the keys one digit at a time, either from the least significant digit (LSD) to the most significant digit (MSD) or the other way round. Each digit pass uses a stable sort, typically Counting Sort, so that the order established by earlier passes is preserved.

2.1 How Does Radix Sort Work?

Radix Sort follows these steps:

def counting_sort(arr, exp):
    n = len(arr)
    output = [0] * n
    count = [0] * 10  

    for i in arr:
        index = (i // exp) % 10
        count[index] += 1

    for i in range(1, 10):
        count[i] += count[i - 1]

    for i in reversed(range(n)):
        index = (arr[i] // exp) % 10
        output[count[index] - 1] = arr[i]
        count[index] -= 1

    for i in range(n):
        arr[i] = output[i]

def radix_sort(arr):
    if not arr:  # max() raises ValueError on an empty list
        return
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort(arr, exp)
        exp *= 10

arr = [170, 45, 75, 90, 802, 24, 2, 66]
radix_sort(arr)
print(arr)  # Output: [2, 24, 45, 66, 75, 90, 170, 802]

3. Why Does This Algorithm Exist?

Radix Sort targets large datasets whose elements are fixed-length numbers or strings. Because it never compares two elements directly, it is not bound by the O(n log n) lower limit that applies to comparison sorts such as QuickSort and MergeSort.

Real-world applications include:

4. When Should You Use It?

Radix Sort is ideal when:

5. How Does It Compare to Alternatives?

| Algorithm | Time Complexity | Space Complexity | Stable | Best Use Case |
|---|---|---|---|---|
| Radix Sort | O(d(n + k)), where d is the digit count and k is the base | O(n + k) | Yes | Sorting integers or strings with fixed-length keys |
| QuickSort | O(n log n) average, O(n²) worst | O(log n) (in-place) | No | General-purpose sorting |
| MergeSort | O(n log n) | O(n) | Yes | Stable sorting, including linked lists |
| HeapSort | O(n log n) | O(1) | No | In-place sorting, priority queues |

Strengths of Radix Sort:

Weaknesses of Radix Sort:

6. Basic Implementation

The following is a basic implementation of Radix Sort in Python using Counting Sort as a subroutine.

def counting_sort(arr, exp):
    n = len(arr)
    output = [0] * n
    count = [0] * 10  # Count array for digits (0-9)

    # Count occurrences of each digit in current place value
    for i in arr:
        index = (i // exp) % 10
        count[index] += 1

    # Prefix sums: count[i] becomes the number of elements whose digit is <= i,
    # which is one past the last output position for digit i
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build the output array, walking the input backwards so that equal digits
    # keep their relative order (this is what makes the pass stable)
    for i in reversed(range(n)):
        index = (arr[i] // exp) % 10
        output[count[index] - 1] = arr[i]
        count[index] -= 1

    # Copy sorted values back to original array
    for i in range(n):
        arr[i] = output[i]

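def radix_sort(arr):
    # Nothing to sort; also avoids max() raising ValueError on an empty list
    if not arr:
        return
    max_val = max(arr)
    exp = 1  # Place value of the current digit: 1, 10, 100, ...
    # One stable Counting Sort pass per digit of the largest value
    while max_val // exp > 0:
        counting_sort(arr, exp)
        exp *= 10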

# Example Usage
arr = [170, 45, 75, 90, 802, 24, 2, 66]
radix_sort(arr)
print(arr)  # Output: [2, 24, 45, 66, 75, 90, 170, 802]

7. Dry Run of Radix Sort

Let's dry run the algorithm step by step on a small input: [170, 45, 75, 90].

Step 1: Find the Maximum Value

The maximum value is 170, which has 3 digits. Thus, we sort based on three place values: units, tens, and hundreds.

Step 2: Sort by the Least Significant Digit (Units Place)

Updated Array after Units Place Sorting: [170, 90, 45, 75]

Step 3: Sort by the Tens Place

Updated Array after Tens Place Sorting: [45, 170, 75, 90]

Step 4: Sort by the Hundreds Place

Final Sorted Array: [45, 75, 90, 170]

Step-by-Step Tracking of Variables

| Pass | Digit Processed (digits of the array entering the pass) | Intermediate Array |
|---|---|---|
| 1 | Units (0, 5, 5, 0) | [170, 90, 45, 75] |
| 2 | Tens (7, 9, 4, 7) | [45, 170, 75, 90] |
| 3 | Hundreds (0, 1, 0, 0) | [45, 75, 90, 170] |
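
The passes in the table above can be reproduced with a short trace. This is a sketch rather than the implementation used elsewhere in these notes: `trace_radix_sort` is a hypothetical helper that uses ten bucket lists in place of Counting Sort (both are stable) and assumes non-negative integers.

```python
def trace_radix_sort(arr):
    """Return the array after each LSD pass (hypothetical helper, non-negative ints only)."""
    current = list(arr)
    snapshots = []
    max_val = max(current, default=0)
    exp = 1
    while max_val // exp > 0:
        # Appending to per-digit buckets in input order keeps the pass stable.
        buckets = [[] for _ in range(10)]
        for num in current:
            buckets[(num // exp) % 10].append(num)
        current = [num for bucket in buckets for num in bucket]
        snapshots.append(current)
        exp *= 10
    return snapshots


for pass_no, snapshot in enumerate(trace_radix_sort([170, 45, 75, 90]), start=1):
    print(pass_no, snapshot)
# 1 [170, 90, 45, 75]
# 2 [45, 170, 75, 90]
# 3 [45, 75, 90, 170]
```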

8. Time & Space Complexity Analysis

Radix Sort processes each digit of the numbers separately, making it a non-comparative sorting algorithm.

8.1 Time Complexity Analysis

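The algorithm runs Counting Sort d times, once per digit of the largest value, and each Counting Sort pass takes O(n + k), where k is the base (the number of possible digit values). Since k is a constant (10 in base 10), the overall complexity simplifies to: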

$$ O(d \cdot (n + k)) \approx O(dn) $$

Worst-Case Complexity:

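The number of passes d is set by the largest value M: in base b, $d = \lfloor \log_b M \rfloor + 1$. The cost grows with the number of digits in M and does not depend on the initial order of the input, so the worst-case complexity is: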

$$ O(n \cdot \log_b(M)) $$

For decimal numbers (base 10), this simplifies to:

$$ O(n \cdot \log_{10} M) $$
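
As a worked example of how the number of passes d depends on the largest value M and the base, the sketch below counts passes the same way the LSD loop does. `pass_count` is a hypothetical helper, and the values of n and M are illustrative assumptions, not measurements.

```python
def pass_count(max_value, base=10):
    """Number of digit passes an LSD radix sort makes for a given maximum value."""
    passes = 0
    while max_value > 0:
        max_value //= base
        passes += 1
    return passes


n = 1_000_000      # assumed input size
M = 2**32 - 1      # assumed largest key (32-bit unsigned)
for base in (10, 256, 65536):
    d = pass_count(M, base)
    # d * (n + k) is the operation-count estimate from the formula above
    print(f"base={base}: d={d}, d*(n+k)={d * (n + base)}")
```

A larger base lowers d but raises k, so the count array grows as the number of passes shrinks.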

Best-Case Complexity:

Even in the best case (already sorted input), Radix Sort still processes all digits, so:

$$ O(n \cdot d) $$

Average-Case Complexity:

Since the algorithm always processes d digits for n elements, the average-case complexity remains:

$$ O(n \cdot d) $$

9. Space Complexity Analysis

Radix Sort requires extra space for:

Total space complexity:

$$ O(n + k) \approx O(n) $$

How Does Space Consumption Change with Input Size?

10. Trade-offs in Radix Sort

10.1 Strengths

10.2 Weaknesses

10.3 When NOT to Use Radix Sort

Conclusion: Radix Sort is best suited for sorting large numbers with a known, limited range of digits, but its space requirements make it less suitable for memory-constrained environments.

11. Optimizations & Variants

Radix Sort can be optimized for performance and adapted into different versions to handle various use cases efficiently.

11.1 Common Optimizations
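
One widely used optimization is a larger radix. Processing one byte (base 256) per pass cuts the number of passes for 32-bit keys from ten to four, at the cost of a 256-entry count array. The following is a minimal sketch, assuming non-negative integers; `radix_sort_base256` is an illustrative name and is not part of the implementation shown earlier.

```python
def radix_sort_base256(arr):
    """LSD radix sort that consumes one byte per pass. Returns a new sorted list."""
    if not arr:
        return []
    if min(arr) < 0:
        raise ValueError("this sketch handles non-negative integers only")

    result = list(arr)
    max_val = max(result)
    shift = 0
    while (max_val >> shift) > 0:
        # Count how many keys have each byte value at this position
        count = [0] * 256
        for num in result:
            count[(num >> shift) & 0xFF] += 1
        # Prefix sums turn counts into end positions
        for i in range(1, 256):
            count[i] += count[i - 1]
        # Walk backwards so equal bytes keep their relative order (stability)
        output = [0] * len(result)
        for num in reversed(result):
            digit = (num >> shift) & 0xFF
            count[digit] -= 1
            output[count[digit]] = num
        result = output
        shift += 8
    return result


print(radix_sort_base256([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Bit shifts and masks replace the division and modulo used in the base-10 version, which is the main reason power-of-two bases are preferred in practice.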

11.2 Variants of Radix Sort

Radix Sort can be implemented in two main ways:

11.2.1 Least Significant Digit (LSD) Radix Sort

Sorts from the least significant digit (rightmost) to the most significant digit (leftmost).
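
The same right-to-left idea applies to fixed-length strings, with characters playing the role of digits. The sketch below is an illustration under stated assumptions: `lsd_string_sort` is a hypothetical name, all strings must have the same length, and every character must have a code point below 256.

```python
def lsd_string_sort(strings, width):
    """Sort equal-length strings by processing characters from right to left."""
    if any(len(s) != width for s in strings):
        raise ValueError("all strings must have length `width`")

    result = list(strings)
    for pos in range(width - 1, -1, -1):
        # One bucket per possible character code (assumes code points < 256)
        buckets = [[] for _ in range(256)]
        for s in result:
            buckets[ord(s[pos])].append(s)
        # Concatenating buckets in order is a stable pass on this position
        result = [s for bucket in buckets for s in bucket]
    return result


print(lsd_string_sort(["dab", "cab", "fad", "bad", "ace"], 3))
# ['ace', 'bad', 'cab', 'dab', 'fad']
```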

11.2.2 Most Significant Digit (MSD) Radix Sort

Sorts from the most significant digit (leftmost) to the least significant digit (rightmost).

11.2.3 Hybrid Radix Sort

Combines LSD and MSD approaches, switching between them based on dataset characteristics. Hybrid versions often include:
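
One technique often used in hybrid radix sorts is to stop partitioning once a bucket becomes small and finish it with insertion sort. Note that this is a different kind of hybrid from switching between LSD and MSD. The sketch below assumes non-negative integers; `hybrid_msd_sort` and the cutoff `SMALL_BUCKET` are hypothetical, and the value 16 is an arbitrary starting point rather than a tuned figure.

```python
SMALL_BUCKET = 16  # assumed cutoff; choose by measurement on real data


def insertion_sort(items):
    """Sort a list in place and return it; efficient for very short lists."""
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = value
    return items


def hybrid_msd_sort(arr, digit_pos=None):
    """MSD radix sort on non-negative ints with an insertion-sort fallback."""
    if digit_pos is None:
        # Start at the most significant decimal digit of the largest value
        digit_pos = len(str(max(arr, default=0))) - 1
    if len(arr) <= SMALL_BUCKET:
        return insertion_sort(list(arr))
    if digit_pos < 0:
        # All digits consumed: the remaining elements are equal
        return list(arr)

    buckets = [[] for _ in range(10)]
    for num in arr:
        buckets[(num // 10 ** digit_pos) % 10].append(num)

    result = []
    for bucket in buckets:
        result.extend(hybrid_msd_sort(bucket, digit_pos - 1))
    return result
```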

12. Iterative vs. Recursive Implementations

12.1 Iterative Implementation (LSD Radix Sort)

The iterative approach processes one digit at a time using a loop.

def radix_sort_iterative(arr):
    if not arr:  # max() raises ValueError on an empty list
        return
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort(arr, exp)
        exp *= 10

12.2 Recursive Implementation (MSD Radix Sort)

The recursive approach sorts elements based on the most significant digit first and then recursively sorts partitions.

def radix_sort_recursive(arr, digit_pos):
    if len(arr) <= 1 or digit_pos < 0:
        return arr

    buckets = [[] for _ in range(10)]
    for num in arr:
        digit = (num // (10 ** digit_pos)) % 10
        buckets[digit].append(num)

    sorted_arr = []
    for bucket in buckets:
        sorted_arr.extend(radix_sort_recursive(bucket, digit_pos - 1))

    return sorted_arr

arr = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_arr = radix_sort_recursive(arr, len(str(max(arr))) - 1)
print(sorted_arr)  # Output: [2, 24, 45, 66, 75, 90, 170, 802]

12.3 Comparison of Iterative vs. Recursive Radix Sort

| Approach | Time Complexity | Space Complexity | Pros | Cons |
|---|---|---|---|---|
| Iterative (LSD) | O(n * d) | O(n + k) | Simple implementation, efficient for fixed-length numbers | Less flexible for variable-length keys |
| Recursive (MSD) | O(n * d) | O(n + k) | Efficient for variable-length keys, good for lexicographic sorting | More memory-intensive, harder to implement |

Final Verdict:

13. Edge Cases & Failure Handling

Radix Sort, like any algorithm, can fail in certain edge cases. Below are some common pitfalls and how to handle them.

13.1 Edge Cases

13.2 Failure Handling
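
One way to handle unsupported input is to reject it early with a clear error instead of letting the digit arithmetic fail or silently produce a wrong order. The following is a minimal sketch; `validate_radix_input` and `safe_radix_sort` are hypothetical names, and the policy of raising (rather than converting) is an assumption.

```python
def validate_radix_input(arr):
    """Raise early for values a base-10 digit sort on non-negative ints cannot handle."""
    for value in arr:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected int, got {type(value).__name__}: {value!r}")
        if value < 0:
            raise ValueError(f"negative value not supported: {value}")


def safe_radix_sort(arr):
    """Validate, then run a bucket-based LSD radix sort. Returns a new list."""
    validate_radix_input(arr)
    result = list(arr)
    if len(result) < 2:
        return result  # empty and single-element inputs are already sorted

    max_val = max(result)
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(10)]
        for num in result:
            buckets[(num // exp) % 10].append(num)
        result = [num for bucket in buckets for num in bucket]
        exp *= 10
    return result
```

Calling `safe_radix_sort([3, -1])` raises `ValueError`, and `safe_radix_sort([1.5])` raises `TypeError`, so the caller learns about the problem before any sorting happens.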

14. Test Cases for Verification

To verify the correctness of Radix Sort, run test cases that cover the edge cases and typical inputs the implementation supports. The basic radix_sort above handles non-negative integers only, so inputs outside that domain are listed separately instead of being asserted.

def test_radix_sort():
    test_cases = [
        # Edge cases
        ([5], [5]),  # Single element
        ([3, 3, 3], [3, 3, 3]),  # All elements the same
        ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),  # Already sorted
        ([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]),  # Reverse sorted

        # General cases
        ([170, 45, 75, 90, 802, 24, 2, 66], [2, 24, 45, 66, 75, 90, 170, 802]),
        ([999, 123, 456, 789, 0, 321], [0, 123, 321, 456, 789, 999]),
    ]

    for input_arr, expected in test_cases:
        arr = input_arr.copy()
        radix_sort(arr)
        assert arr == expected, f"Failed for input {input_arr}"

    # Inputs outside the basic implementation's domain. They are targets for
    # an extended version and are not run against radix_sort here:
    #   []                        -> max() raises ValueError without a guard
    #   [-5, -10, -3, -1, -50]    -> negatives are left unsorted
    #   [10, -2, 0, 5, -7]        -> mixed signs come out in the wrong order
    #   [3.14, 2.71, 1.41, 4.56]  -> a float digit cannot index the count list

    print("All test cases passed!")

# Run tests
test_radix_sort()

15. Real-World Failure Scenarios

Understanding real-world failures can help improve Radix Sort implementations.

15.1 Memory Overflows

15.2 Sorting Floating-Point Numbers
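
Digit extraction with `//` and `%` does not work on floats, but IEEE 754 doubles can be mapped to unsigned 64-bit integers whose integer order matches the float order. The sketch below shows that mapping followed by a byte-wise LSD pass. It assumes the input contains no NaN values; `float_to_key`, `key_to_float`, and `radix_sort_floats` are hypothetical names.

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = (1 << 64) - 1


def float_to_key(x):
    """Map a float to a 64-bit unsigned int that sorts in the same order."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    # Negative floats: flip every bit so larger magnitudes sort first.
    # Non-negative floats: set the sign bit so they sort after all negatives.
    return bits ^ ALL_BITS if bits & SIGN_BIT else bits | SIGN_BIT


def key_to_float(key):
    """Invert float_to_key."""
    bits = key ^ SIGN_BIT if key & SIGN_BIT else key ^ ALL_BITS
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def radix_sort_floats(values):
    """Sort floats (no NaN) with eight byte-wise LSD passes. Returns a new list."""
    keys = [float_to_key(v) for v in values]
    for shift in range(0, 64, 8):
        buckets = [[] for _ in range(256)]
        for key in keys:
            buckets[(key >> shift) & 0xFF].append(key)
        keys = [key for bucket in buckets for key in bucket]
    return [key_to_float(k) for k in keys]


print(radix_sort_floats([3.14, 2.71, 1.41, 4.56]))
# [1.41, 2.71, 3.14, 4.56]
```

Because the mapping is an exact bit-level round trip, the values come back unchanged, which is not guaranteed when floats are scaled and rounded to integers.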

15.3 Handling Negative Numbers
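
A common way to support negative integers is to sort the negatives and non-negatives separately: sort the negatives by absolute value, reverse that order, and place them before the non-negatives. The following is a sketch under that approach; `lsd_radix_sort` and `radix_sort_signed` are hypothetical names, and both return new lists instead of sorting in place.

```python
def lsd_radix_sort(arr):
    """Bucket-based LSD radix sort for non-negative ints. Returns a new list."""
    result = list(arr)
    max_val = max(result, default=0)
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(10)]
        for num in result:
            buckets[(num // exp) % 10].append(num)
        result = [num for bucket in buckets for num in bucket]
        exp *= 10
    return result


def radix_sort_signed(arr):
    """Sort integers of either sign by splitting on sign."""
    magnitudes = [-x for x in arr if x < 0]       # absolute values of negatives
    non_negatives = [x for x in arr if x >= 0]
    # Largest magnitude is the smallest negative, so reverse before negating
    sorted_negatives = [-m for m in reversed(lsd_radix_sort(magnitudes))]
    return sorted_negatives + lsd_radix_sort(non_negatives)


print(radix_sort_signed([10, -2, 0, 5, -7]))
# [-7, -2, 0, 5, 10]
```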

15.4 Variable-Length Numbers (MSD Sort Issue)

15.5 Performance Bottlenecks

Final Takeaway: While Radix Sort is efficient for integers and fixed-length keys, it requires modifications to handle floating-point numbers, negatives, and large datasets efficiently.

16. Real-World Applications & Industry Use Cases

Radix Sort is widely used in applications where non-comparative sorting provides a performance advantage, particularly in numerical and string-based datasets.

16.1 Computer Science & Databases

16.2 Networking & Telecommunications

16.3 Graphics & Computer Vision

16.4 Financial & Economic Data Processing

16.5 Scientific Computing & AI

17. Open-Source Implementations

Several open-source projects use optimized implementations of Radix Sort. Below are some notable examples:

17.1 Python Implementations

17.2 C++ & Java Implementations

17.3 GPU Implementations

18. Practical Project: Sorting Log Files by Timestamps

Log files often contain timestamps in a sortable format (e.g., YYYYMMDDHHMMSS). Radix Sort can efficiently sort them.

18.1 Problem Statement

Given a list of log entries, sort them based on timestamps efficiently using Radix Sort.

18.2 Implementation in Python

import re

def extract_timestamp(log):
    """Extract timestamp from log entry and convert to integer for sorting."""
    match = re.search(r"\d{14}", log)  # YYYYMMDDHHMMSS
    return int(match.group()) if match else 0

def radix_sort_logs(logs):
    """Sort logs based on extracted timestamps using Radix Sort."""
    timestamps = [extract_timestamp(log) for log in logs]
    max_val = max(timestamps, default=0)
    exp = 1
    while max_val // exp > 0:
        counting_sort_logs(logs, timestamps, exp)
        exp *= 10
    return logs

def counting_sort_logs(logs, timestamps, exp):
    """Perform Counting Sort on logs based on the timestamp's current digit."""
    n = len(logs)
    output_logs = [None] * n
    output_timestamps = [0] * n
    count = [0] * 10  

    for timestamp in timestamps:
        index = (timestamp // exp) % 10
        count[index] += 1

    for i in range(1, 10):
        count[i] += count[i - 1]

    for i in reversed(range(n)):
        index = (timestamps[i] // exp) % 10
        output_logs[count[index] - 1] = logs[i]
        output_timestamps[count[index] - 1] = timestamps[i]
        count[index] -= 1

    for i in range(n):
        logs[i] = output_logs[i]
        timestamps[i] = output_timestamps[i]

# Example log entries with timestamps (YYYYMMDDHHMMSS)
logs = [
    "20240220123045 ERROR: System crashed",
    "20240219153010 INFO: Process started",
    "20240221094530 WARNING: High memory usage",
    "20240219153009 INFO: Process initialized",
]

sorted_logs = radix_sort_logs(logs)
for log in sorted_logs:
    print(log)

18.3 Explanation

18.4 Expected Output

Sorted logs based on timestamps:


20240219153009 INFO: Process initialized
20240219153010 INFO: Process started
20240220123045 ERROR: System crashed
20240221094530 WARNING: High memory usage

Conclusion: This project demonstrates a real-world scenario where Radix Sort efficiently processes structured data like log files.

19. Competitive Programming & System Design Integration

19.1 Competitive Programming Perspective

Radix Sort is useful in competitive programming when sorting large numbers or fixed-length strings efficiently.

19.1.1 Why Radix Sort in Contests?
19.1.2 When to Use Radix Sort in CP?

19.2 System Design Integration

Radix Sort is used in system design scenarios that require efficient sorting of structured data.

19.2.1 Where is Radix Sort Used in System Design?
19.2.2 Example: Using Radix Sort in a Log Processing System

Consider a system where logs are stored in distributed nodes. The requirement is to sort logs in real-time based on timestamps.

System Components:
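
The following sketch shows one way such a pipeline could be arranged: each node sorts its own logs with Radix Sort, and a coordinator merges the sorted runs with a k-way merge. It is a batch illustration rather than a real-time design. It assumes every entry starts with a 14-digit YYYYMMDDHHMMSS timestamp; `timestamp_of`, `sort_node_logs`, and `merge_nodes` are hypothetical names, and the two node lists are sample data.

```python
import heapq


def timestamp_of(log):
    """Assumes the entry starts with a 14-digit YYYYMMDDHHMMSS timestamp."""
    return int(log[:14])


def sort_node_logs(logs):
    """Local step: LSD radix sort of one node's logs by timestamp."""
    result = list(logs)
    exp = 1
    for _ in range(14):  # fixed-width key, so one pass per digit
        buckets = [[] for _ in range(10)]
        for log in result:
            buckets[(timestamp_of(log) // exp) % 10].append(log)
        result = [log for bucket in buckets for log in bucket]
        exp *= 10
    return result


def merge_nodes(per_node_logs):
    """Coordinator step: k-way merge of the locally sorted runs."""
    sorted_runs = [sort_node_logs(logs) for logs in per_node_logs]
    return list(heapq.merge(*sorted_runs, key=timestamp_of))


node_a = [
    "20240220123045 ERROR: System crashed",
    "20240219153010 INFO: Process started",
]
node_b = [
    "20240221094530 WARNING: High memory usage",
    "20240219153009 INFO: Process initialized",
]
for log in merge_nodes([node_a, node_b]):
    print(log)
```

Sorting locally keeps the per-node work linear in the number of entries, and the merge only ever compares the current head of each run.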

20. Assignments

20.1 Solve at least 10 Problems using Radix Sort

Practice sorting problems to reinforce understanding.

  1. Sort Integers by the Number of 1 Bits (LeetCode)
  2. Sorting Large Numbers (SPOJ)
  3. Radix Sort-based Sorting (Codeforces)
  4. Sort floating-point numbers using Radix Sort.
  5. Sort IPv4 addresses efficiently using Radix Sort.
  6. Sort a list of phone numbers lexicographically.
  7. Implement Radix Sort for hexadecimal numbers.
  8. Use Radix Sort to sort filenames in a directory.
  9. Sort a large dataset of timestamps efficiently.
  10. Parallelize Radix Sort using multi-threading.

20.2 Use Radix Sort in a System Design Problem

Design a large-scale distributed sorting service using Radix Sort.

20.3 Implement Radix Sort Under Time Constraints

Simulate a coding contest scenario:

Challenge: Try implementing an in-place version of Radix Sort to reduce memory usage.