Counting Sort

1. Prerequisites

Before understanding Counting Sort, you should be familiar with the following concepts:

Arrays: Understanding how elements are stored and accessed.
Time Complexity: Basic knowledge of Big-O notation.
Sorting Basics: Understanding what sorting means and why it is needed.
Range of Data: Concept of data range and why it affects sorting performance.

2. What is Counting Sort?

Counting Sort is a non-comparative integer sorting algorithm that sorts elements by counting occurrences of each unique value within a range and using this information to place elements in order.

2.1 Working Mechanism

Determine the range of input values.
Count occurrences of each value.
Transform the count array to determine positions.
Place elements in their correct positions using a temporary array.

2.1.1 Implementation


def counting_sort(arr):
    max_val = max(arr)
    count = [0] * (max_val + 1)

    for num in arr:
        count[num] += 1

    sorted_arr = []
    for i in range(len(count)):
        sorted_arr.extend([i] * count[i])

    return sorted_arr

arr = [4, 2, 2, 8, 3, 3, 1]
print(counting_sort(arr))

3. Why Does Counting Sort Exist?

Sorting large datasets efficiently is crucial in many real-world applications. Counting Sort was developed to handle cases where:

Data values are within a small range: Avoiding unnecessary comparisons.
Sorting must be stable: Ensuring equal values retain their relative order.
Need for faster sorting: Achieves O(n) time complexity, unlike O(n log n) comparison-based sorts.

3.1 Real-World Applications

Sorting test scores: When scores range from 0 to 100.
Counting frequency of events: Useful in histogram-based analysis.
Data pre-processing: Optimizing radix sort when used as a subroutine.

4. When Should You Use Counting Sort?

When data is within a small, fixed range.
When stability is required in sorting.
When you need linear time sorting, and memory constraints are not a major concern.

Avoid using Counting Sort if:

Data range is very large (e.g., millions of unique values).
Sorting floating-point numbers or complex objects.
Memory usage is a concern, as extra space is needed.

5. How Does It Compare to Alternatives?

Sorting Algorithm	Time Complexity	Space Complexity	Comparison-Based	Stable	Best Use Case
Counting Sort	O(n + k)	O(k)	No	Yes	Small integer ranges
QuickSort	O(n log n)	O(log n)	Yes	No	General-purpose sorting
MergeSort	O(n log n)	O(n)	Yes	Yes	Linked lists, large datasets
Radix Sort	O(nk)	O(n + k)	No	Yes	Sorting large numbers

6. Basic Implementation

6.1 Basic Implementation in Python

The following Python code implements Counting Sort:


def counting_sort(arr):
    # Step 1: Find the maximum value in the array
    max_val = max(arr)
    
    # Step 2: Create a count array of size (max_val + 1) initialized to zero
    count = [0] * (max_val + 1)

    # Step 3: Count occurrences of each element
    for num in arr:
        count[num] += 1

    # Step 4: Transform count array to store actual positions
    for i in range(1, len(count)):
        count[i] += count[i - 1]

    # Step 5: Build the sorted output array
    output = [0] * len(arr)
    for num in reversed(arr):  # Ensuring stability
        output[count[num] - 1] = num
        count[num] -= 1

    return output

# Example usage
arr = [4, 2, 2, 8, 3, 3, 1]
sorted_arr = counting_sort(arr)
print("Sorted array:", sorted_arr)

7. Dry Run of Counting Sort

We will manually track how variables change during execution for the input:


arr = [4, 2, 2, 8, 3, 3, 1]

7.1 Step-by-Step Execution

Step	Operation	count[]	output[]
1	Find max value (max_val = 8)	-	-
2	Initialize count array (size 9)	[0, 0, 0, 0, 0, 0, 0, 0, 0]	-
3	Count occurrences	[0, 1, 2, 2, 1, 0, 0, 0, 1]	-
4	Convert count to cumulative positions	[0, 1, 3, 5, 6, 6, 6, 6, 7]	-
5	Place elements into output array (traverse in reverse for stability)	Updating...	[1, 2, 2, 3, 3, 4, 8]

7.2 Final Output

After all steps, the sorted array is:


[1, 2, 2, 3, 3, 4, 8]

8. Time & Space Complexity Analysis

8.1 Time Complexity Analysis

Counting Sort operates in three main steps:

Counting occurrences: Iterates through the input array → O(n)
Building cumulative sum: Iterates through the count array (size k) → O(k)
Sorting elements: Iterates through the input array again → O(n)

Thus, the overall time complexity is:

$$ O(n + k) $$

8.1.1 Worst-Case Complexity

Occurs when the maximum value (k) is large relative to n.

$$ O(n + k) $$

8.1.2 Best-Case Complexity

When elements are already sorted, it still needs to count and place elements.

$$ O(n + k) $$

8.1.3 Average-Case Complexity

Since Counting Sort does not rely on element comparisons, its average-case remains:

$$ O(n + k) $$

9. Space Complexity Analysis

9.1 Space Consumption Breakdown

Input array: O(n)
Count array: O(k)
Output array: O(n)

Total space complexity:

$$ O(n + k) $$

9.2 Space Growth with Input Size

If k (range of values) is small, space is close to O(n).
If k is large, space grows significantly.

Counting Sort is not in-place since it requires additional arrays.

10. Trade-offs of Counting Sort

10.1 Strengths

Linear Time Complexity: O(n + k) is better than O(n log n) for small k.
Stable Sorting: Preserves the order of duplicate values.
Efficient for Small Ranges: Works well when k is not significantly larger than n.

10.2 Weaknesses

High Space Usage: Uses extra space proportional to k.
Limited Applicability: Only works for integers within a known range.
Not In-Place: Requires additional storage.

10.3 When to Use and When to Avoid

Use Counting Sort When	Avoid Counting Sort When
Sorting integers with a small range (e.g., 0-1000).	Sorting large numbers or floating-point values.
When stability is important.	When memory is a constraint.
Preprocessing step for Radix Sort.	Sorting a very large range of numbers.

11. Optimizations & Variants

11.1 Common Optimizations

Range Reduction: If values are within a specific range (e.g., from 100 to 200), adjust indices to minimize memory usage.
In-Place Modification: Modifying the input array directly can reduce auxiliary space, though it sacrifices stability.
Sparse Data Handling: Instead of allocating an array of size max value (k), use hash maps for non-continuous data.

11.2 Variants of Counting Sort

11.2.1 Stable Counting Sort

Ensures equal elements retain their original order.

Uses an extra output array.
Processes elements in reverse order.

11.2.2 In-Place Counting Sort

Avoids extra memory by modifying the input array directly, but is unstable.

11.2.3 Bucketed Counting Sort

Used when input is spread unevenly across a large range:

Divides input into smaller buckets.
Each bucket is sorted separately.
Useful when input has gaps.

11.2.4 Counting Sort as a Subroutine in Radix Sort

Since Counting Sort works well on fixed-length integer ranges, it is used in Radix Sort to sort digits place by place.

12. Iterative vs. Recursive Counting Sort

12.1 Iterative Implementation

Iterative Counting Sort follows a step-by-step loop-based approach:


def counting_sort(arr):
    max_val = max(arr)
    count = [0] * (max_val + 1)

    for num in arr:
        count[num] += 1

    output = []
    for i in range(len(count)):
        output.extend([i] * count[i])

    return output

12.2 Recursive Implementation

A recursive approach can be implemented by recursively processing smaller subarrays:


def counting_sort_recursive(arr, count=None, index=0):
    if count is None:
        max_val = max(arr)
        count = [0] * (max_val + 1)
        for num in arr:
            count[num] += 1

    if index >= len(count):
        return []

    return [index] * count[index] + counting_sort_recursive(arr, count, index + 1)

arr = [4, 2, 2, 8, 3, 3, 1]
print(counting_sort_recursive(arr))

12.3 Efficiency Comparison

Approach	Time Complexity	Space Complexity	Pros	Cons
Iterative	O(n + k)	O(k)	Efficient, straightforward.	Requires explicit loops.
Recursive	O(n + k)	O(k) + O(k) (recursion stack)	More readable.	Consumes extra memory for recursive calls.

Conclusion: The iterative version is preferred due to its lower memory overhead.

13. Edge Cases & Failure Handling

13.1 Common Edge Cases

Empty Array: An empty input should return an empty output.
Single Element: If the array has only one element, it should return the same array.
All Identical Elements: The algorithm should return the same array.
Negative Numbers: Counting Sort does not naturally support negative numbers since it relies on array indices. A transformation is needed.
Large Range of Values: If the max value is much larger than n, memory usage becomes inefficient.
Floating Point or String Inputs: Counting Sort is designed for integers, so it fails when sorting floats or non-integer data.

13.2 Handling Failures

For negative numbers, shift values by adding the absolute minimum to make them non-negative.
For floating point numbers, use an alternative sorting algorithm like Radix Sort for fixed precision decimals.
For large gaps in data, use a sparse data structure like a dictionary instead of an array.

14. Test Cases for Counting Sort

14.1 Basic Test Cases


def test_counting_sort():
    assert counting_sort([]) == [], "Failed: Empty array"
    assert counting_sort([5]) == [5], "Failed: Single element"
    assert counting_sort([1, 1, 1]) == [1, 1, 1], "Failed: All identical elements"
    assert counting_sort([4, 2, 2, 8, 3, 3, 1]) == [1, 2, 2, 3, 3, 4, 8], "Failed: Unsorted input"
    print("All test cases passed!")

# Run tests
test_counting_sort()

14.2 Edge Case Test Cases


def test_edge_cases():
    assert counting_sort([100, 99, 98]) == [98, 99, 100], "Failed: Reverse sorted"
    assert counting_sort([10, 20, 30, 40, 50]) == [10, 20, 30, 40, 50], "Failed: Already sorted"
    assert counting_sort([-5, -10, -3, -1]) == [-10, -5, -3, -1], "Failed: Negative numbers"
    assert counting_sort([1000000, 500000, 0]) == [0, 500000, 1000000], "Failed: Large numbers"
    print("All edge cases passed!")

# Run edge case tests
test_edge_cases()

15. Real-World Failure Scenarios

15.1 When Counting Sort Fails

Large Integer Ranges: Sorting ages (0-120) is fine, but sorting credit card numbers (16 digits) is inefficient.
Negative Numbers Without Preprocessing: Since array indices cannot be negative, the algorithm breaks.
Floating-Point Data: Counting Sort assumes discrete values, failing for continuous data like weights (e.g., 65.5 kg).
Large Sparse Data: If input has values like [1, 1000000, 5000000], using an array of size 5000000 is infeasible.

15.2 How to Handle Failures

Use Radix Sort for large integer values.
Apply offset transformation for negative numbers.
Use a bucket sort approach for floating points.
Opt for comparison-based sorting algorithms like Merge Sort for large, sparse, or unknown ranges.

16. Real-World Applications & Industry Use Cases

16.1 Real-World Uses of Counting Sort

Sorting Exam Scores: Universities often sort student marks within a small range (0-100) efficiently.
Data Analysis & Histogram Counting: Counting frequency of occurrences in datasets for statistical analysis.
Election Vote Counting: Used to tally votes when candidates are limited.
DNS Packet Sorting: Sorting IP packet headers efficiently in network traffic analysis.
Gene Sequencing: Used in bioinformatics to sort DNA sequences when nucleotide values are limited.

16.2 Industry Use Cases

Cybersecurity: Analyzing packet data and sorting threat levels within a fixed range.
Game Development: Sorting player scores in a leaderboard when values are constrained.
Inventory Management: Counting and sorting warehouse stock based on SKU numbers.

17. Open-Source Implementations

17.1 Python Open-Source Implementation

Scipy provides a variation of Counting Sort for frequency-based sorting:


from scipy.stats import rankdata

arr = [4, 2, 2, 8, 3, 3, 1]
sorted_arr = rankdata(arr, method='dense')  # Dense ranking mimics counting sort
print(sorted_arr)

17.2 C++ Open-Source Implementation


#include <iostream>
#include <vector>

void countingSort(std::vector& arr) {
    int max_val = *max_element(arr.begin(), arr.end());
    std::vector count(max_val + 1, 0);
    
    for (int num : arr)
        count[num]++;
    
    int index = 0;
    for (int i = 0; i <= max_val; i++) {
        while (count[i]-- > 0) {
            arr[index++] = i;
        }
    }
}

int main() {
    std::vector arr = {4, 2, 2, 8, 3, 3, 1};
    countingSort(arr);
    
    for (int num : arr)
        std::cout << num << " ";
    return 0;
}

17.3 Java Open-Source Implementation


import java.util.Arrays;

public class CountingSort {
    public static void sort(int[] arr) {
        int max = Arrays.stream(arr).max().getAsInt();
        int[] count = new int[max + 1];

        for (int num : arr) count[num]++;
        
        int index = 0;
        for (int i = 0; i < count.length; i++)
            while (count[i]-- > 0)
                arr[index++] = i;
    }

    public static void main(String[] args) {
        int[] arr = {4, 2, 2, 8, 3, 3, 1};
        sort(arr);
        System.out.println(Arrays.toString(arr));
    }
}

18. Practical Script Using Counting Sort

18.1 Scenario: Sorting Student Scores & Assigning Grades

This script sorts student scores and assigns letter grades based on percentile ranks.


def counting_sort(arr):
    max_val = max(arr)
    count = [0] * (max_val + 1)

    for num in arr:
        count[num] += 1

    output = []
    for i in range(len(count)):
        output.extend([i] * count[i])

    return output

def assign_grades(scores):
    sorted_scores = counting_sort(scores)
    total = len(scores)
    
    grade_map = {
        "A+": 90, "A": 80, "B+": 70, "B": 60, 
        "C+": 50, "C": 40, "D": 30, "F": 0
    }
    
    percentile_ranks = {}
    for i, score in enumerate(sorted_scores):
        percentile = (i / total) * 100
        for grade, threshold in grade_map.items():
            if percentile >= threshold:
                percentile_ranks[score] = grade
                break

    return {score: percentile_ranks[score] for score in scores}

# Example Usage
scores = [85, 92, 75, 60, 45, 88, 70, 90, 50]
grades = assign_grades(scores)
print(grades)

18.2 How This Works

Step 1: Sorts scores using Counting Sort.
Step 2: Computes percentiles based on rank.
Step 3: Assigns grades based on percentile thresholds.
Step 4: Outputs a dictionary mapping scores to grades.

18.3 Example Output


{85: 'A', 92: 'A+', 75: 'B+', 60: 'B', 45: 'C+', 88: 'A', 70: 'B+', 90: 'A+', 50: 'C'}

19. Competitive Programming & System Design Integration

19.1 Counting Sort in Competitive Programming

Counting Sort is commonly used in competitive programming when:

Sorting constrained integer ranges: Problems with limited value ranges (e.g., 0 to 10⁶).
Frequency-based problems: Counting occurrences efficiently.
Handling large input sizes: Sorting in O(n) rather than O(n log n).

19.1.1 Example Problem: Frequency Sorting

Given an array, sort elements by frequency, breaking ties by value.


from collections import Counter

def frequency_sort(arr):
    freq = Counter(arr)
    max_val = max(arr)
    
    count = [0] * (max_val + 1)
    for num in arr:
        count[num] += 1

    return sorted(arr, key=lambda x: (-freq[x], x))

arr = [4, 2, 2, 8, 3, 3, 1]
print(frequency_sort(arr))  # Output: [2, 2, 3, 3, 1, 4, 8]

19.2 Counting Sort in System Design

While Counting Sort is primarily used in small-range sorting, it has system design applications:

Load Balancing: Sorting server response times for efficiency analysis.
Analytics & Data Processing: Counting occurrences of user behavior in logs.
Cache Optimization: Sorting cache accesses to improve retrieval efficiency.

19.2.1 Example: Log Processing

Suppose a system tracks HTTP response times. Counting Sort can be used to sort logs and identify slowest response categories.


def log_sort(response_times):
    max_time = max(response_times)
    count = [0] * (max_time + 1)

    for time in response_times:
        count[time] += 1

    sorted_times = []
    for i in range(len(count)):
        sorted_times.extend([i] * count[i])

    return sorted_times

logs = [200, 500, 200, 404, 500, 200, 302, 404]
print(log_sort(logs))  # Sorted response times

20. Assignments

20.1 Problem Solving Tasks

Practice solving the following problems using Counting Sort:

Sort an array of integers where the maximum value is ≤ 10⁶.
Sort elements by frequency (most frequent elements appear first).
Find the kth smallest element in an array using Counting Sort.
Sort an array containing only 0s, 1s, and 2s (Dutch National Flag problem).
Given a range of ages (0-120), sort them efficiently.
Sort a list of strings where each string has a known maximum length (apply Counting Sort per character).
Sort timestamps in a web server log file (assuming fixed time granularity).
Find duplicate numbers in a large dataset efficiently.
Group anagrams using Counting Sort on character frequencies.
Implement Counting Sort without using extra space (in-place version).

20.2 System Design Task

Design a leaderboard system that ranks users based on scores ranging from 0 to 1000.

Implement efficient insertion, sorting, and retrieval using Counting Sort.
Optimize memory usage when dealing with a large number of users.
Extend functionality to track top K users dynamically.

20.3 Timed Implementation Practice

Test your implementation skills under a time constraint:

Write a basic Counting Sort function in under 10 minutes.
Modify it to be stable while maintaining O(n + k) complexity.
Optimize space usage and implement it in different languages.
Debug an implementation with an unknown bug within 5 minutes.