Tries in Data Structures - CSU083 | Shoolini University

Heaps in Data Structures

1. Prerequisites

Before learning Heaps, one should be familiar with:

2. What Is a Heap?

A Heap is a specialized tree-based data structure that satisfies the heap property:

Heaps are always kept complete: all levels are fully filled except possibly the last level, which is filled from left to right.

2.1 Heap Representation Using Arrays

We often store a heap in a single array. For a node at index i:

  • Left child: index 2i + 1
  • Right child: index 2i + 2
  • Parent: index ⌊(i - 1)/2⌋

2.2 Core Operations

  • Insert: Place the new element at the end, then “bubble up” until the heap property is restored.
  • Extract (Max or Min): Remove the root, replace it with the last element, then “sift down” to restore the property.
// Example: Insert into a Max-Heap (array-based)
void heapInsert(vector<int>& heap, int value) {
    heap.push_back(value);
    int i = heap.size() - 1;
    // Bubble up
    while (i > 0 && heap[(i - 1) / 2] < heap[i]) {
        swap(heap[i], heap[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

3. Why Does This Data Structure Exist?

A Heap excels at maintaining a priority element accessible in O(1) time, with insertion and removal in O(log n). This provides:

4. How Does It Compare to Alternatives? (Strengths & Weaknesses)

4.1 Strengths

  • Guaranteed Logarithmic Insertion/Extraction: Predictable performance.
  • Easy to Implement: A simple array can represent the structure.

4.2 Weaknesses

  • Limited Search Efficiency: Finding an arbitrary value is O(n) because the heap only guarantees the min/max at the root.
  • Strict Partial Ordering: Only the root is guaranteed to be the extreme; siblings are not in sorted order.

5. When Should You Use It?

6. Basic Max-Heap Implementation

Below is a simple C++ example. We'll maintain a vector to store heap elements. A heapifyUp function ensures the max-heap property after insertion, and heapifyDown restores order after removing the root.

6.1 Core Functions


#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

class MaxHeap {
private:
    vector<int> heap;

    void heapifyUp(int index) {
        // Move the element at 'index' up the tree as needed
        while (index > 0) {
            int parent = (index - 1) / 2;
            if (heap[index] > heap[parent]) {
                swap(heap[index], heap[parent]);
                index = parent;
            } else {
                break;
            }
        }
    }

    void heapifyDown(int index) {
        // Move the element at 'index' down the tree as needed
        int size = heap.size();
        while (true) {
            int left = 2 * index + 1;
            int right = 2 * index + 2;
            int largest = index;

            // Check left child
            if (left < size && heap[left] > heap[largest]) {
                largest = left;
            }
            // Check right child
            if (right < size && heap[right] > heap[largest]) {
                largest = right;
            }
            // If no swap needed, heap property is restored
            if (largest == index) break;
            // Swap and continue
            swap(heap[index], heap[largest]);
            index = largest;
        }
    }

public:
    void insert(int value) {
        heap.push_back(value);
        heapifyUp(heap.size() - 1);
    }

    int extractMax() {
        if (heap.empty()) {
            throw runtime_error("Heap is empty!");
        }
        int maxValue = heap[0];
        heap[0] = heap.back();
        heap.pop_back();
        if (!heap.empty()) {
            heapifyDown(0);
        }
        return maxValue;
    }

    void printHeap() {
        for (int val : heap) {
            cout << val << " ";
        }
        cout << endl;
    }
};

int main() {
    MaxHeap mh;
    mh.insert(4);
    mh.insert(7);
    mh.insert(1);
    mh.insert(3);

    cout << "Heap after insertions: ";
    mh.printHeap();

    int maxElement = mh.extractMax();
    cout << "Extracted max: " << maxElement << endl;
    cout << "Heap after extracting max: ";
    mh.printHeap();

    return 0;
}

7. Dry Run (Step-by-Step)

Consider inserting the elements 4, 7, 1, 3 in that order.

7.1 Initial State

heap = [] (empty)

7.2 Insert 4

  • Push 4 into heap: heap = [4]
  • heapifyUp:
    • Index = 0 → Parent index = (0 - 1) / 2 = -1 (invalid), stop.

heap = [4]

7.3 Insert 7

  • Push 7: heap = [4, 7]
  • heapifyUp(index = 1):
    • Parent of 1 = 0
    • heap[1] = 7 > heap[0] = 4, swap ⇒ [7, 4]
    • Now index = 0, no parent to check, stop.

heap = [7, 4]

7.4 Insert 1

  • Push 1: heap = [7, 4, 1]
  • heapifyUp(index = 2):
    • Parent of 2 = 0
    • heap[2] = 1 < heap[0] = 7, no swap, stop.

heap = [7, 4, 1]

7.5 Insert 3

  • Push 3: heap = [7, 4, 1, 3]
  • heapifyUp(index = 3):
    • Parent of 3 = 1 (integer division)
    • heap[3] = 3 < heap[1] = 4, no swap needed, stop.

heap = [7, 4, 1, 3]

7.6 Extract Max

  • Max = heap[0] = 7
  • Move last element heap[3] = 3 to root ⇒ heap = [3, 4, 1]
  • heapifyDown(index = 0):
    • left = 1 → heap[1] = 4, right = 2 → heap[2] = 1
    • largest = 1 since 4 > 3
    • Swap heap[0] and heap[1] ⇒ [4, 3, 1]
    • Now index = 1, new children: left = 3, right = 4 (both invalid indices), stop.

Extracted 7. Now heap = [4, 3, 1].

8. Time Complexity Analysis (Big-O Mastery)

Let's analyze the time complexity for various heap operations.

8.1 Insertion (heapify-up)

  • Inserting a new element at the end takes O(1).
  • It may need to move up the tree (heapify-up).
  • Since a heap is a complete binary tree, its height is O(log n).
  • Worst-case: The new element bubbles up to the root, taking O(log n).
  • Best-case: No swaps needed, takes O(1).
  • Average-case: Element moves up halfway on average → O(log n).

8.2 Extract Max/Min (heapify-down)

  • Removing the root and replacing it with the last element takes O(1).
  • Heapify-down process moves the root to its correct position.
  • Worst-case: Swaps down to the last level → O(log n).
  • Best-case: No swaps needed, already in place → O(1).
  • Average-case: Moves halfway down on average → O(log n).

8.3 Heap Construction (Building a Heap)

  • Naïve approach: Insert one-by-one → O(n log n).
  • Optimized approach: Heapify from bottom-up → O(n).

The proof for O(n) comes from the summation:

$$ \sum_{i=0}^{\log n} \frac{n}{2^i} O(i) = O(n) $$

8.4 Heap Sort

  • Building the heap: O(n) (optimized heapify).
  • Extracting elements one-by-one: O(n log n).
  • Overall complexity: O(n log n).

8.5 Searching for an Element

  • Since the heap is not sorted, searching requires scanning all elements.
  • Worst-case: O(n).
  • Best-case: O(1) if found at the root.
Operation Best Case Average Case Worst Case
Insert O(1) O(log n) O(log n)
Extract Max/Min O(1) O(log n) O(log n)
Build Heap O(n) O(n) O(n)
Heap Sort O(n log n) O(n log n) O(n log n)
Search O(1) O(n) O(n)

9. Space Complexity Analysis

10. Trade-offs & Efficiency

11. Common Optimizations for Heaps

Optimizing heaps can significantly enhance performance, particularly in large-scale applications.

11.1 Optimized Heap Construction (Bottom-Up Heapify)

The naive method inserts elements one by one, taking O(n log n). The optimized method, known as Floyd’s heap construction, builds the heap in O(n) using bottom-up heapification.


void buildHeap(vector<int>& heap) {
    int n = heap.size();
    for (int i = (n / 2) - 1; i >= 0; i--) {
        heapifyDown(heap, i, n);
    }
}

Why O(n) instead of O(n log n)? Instead of inserting each element individually, we heapify non-leaf nodes from bottom-up, reducing redundant swaps.

11.2 Lazy Deletion for Faster Remove Operations

Instead of physically removing an element and restructuring the heap, we mark elements as deleted and rebuild only when necessary, reducing unnecessary swaps.

11.3 Using Binary Heap Variants for Faster Operations

Several optimized heap structures exist, each tailored for specific operations.

  • d-ary Heap: Generalizes binary heaps with d children per node, reducing depth and improving performance in priority queues.
  • Fibonacci Heap: Provides amortized O(1) insertion and O(log n) extract-min, improving Dijkstra’s algorithm.
  • Pairing Heap: A self-adjusting heap with amortized O(1) insert and O(log n) extract-min, often outperforming Fibonacci heaps in practice.

12. Heap Variants (Different Versions of the Algorithm)

Different types of heaps exist to optimize specific use cases:

12.1 Binomial Heap

A collection of binomial trees that supports fast union operations in O(log n), useful in dynamic priority queues.

12.2 Fibonacci Heap

Uses a more relaxed structure, allowing for O(1) insertion and amortized O(log n) extract-min. It’s widely used in Dijkstra’s algorithm.

12.3 Min-Max Heap

A double-ended heap that allows both minimum and maximum extraction in O(log n), useful for range-based priority queues.

12.4 Soft Heap

A heap with constant-time insertion but allows some elements to be corrupted, making it useful in approximation algorithms.

13. Iterative vs. Recursive Implementations (Efficiency Comparison)

13.1 Heapify-Up (Insertion) - Iterative vs. Recursive

13.1.1 Iterative Version

void heapifyUp(vector<int>& heap, int index) {
    while (index > 0) {
        int parent = (index - 1) / 2;
        if (heap[parent] < heap[index]) {
            swap(heap[parent], heap[index]);
            index = parent;
        } else break;
    }
}

Advantages: ✅ No recursive overhead (stack usage). ✅ Better suited for large heaps (avoids recursion depth limits).

13.1.2 Recursive Version

void heapifyUp(vector<int>& heap, int index) {
    if (index == 0) return;
    int parent = (index - 1) / 2;
    if (heap[parent] < heap[index]) {
        swap(heap[parent], heap[index]);
        heapifyUp(heap, parent);
    }
}

Advantages: ✅ Code is more readable. 🚫 Recursion may cause stack overflow in deep heaps.

13.2 Heapify-Down (Extract Max) - Iterative vs. Recursive

13.2.1 Iterative Version

void heapifyDown(vector<int>& heap, int index, int size) {
    while (true) {
        int left = 2 * index + 1;
        int right = 2 * index + 2;
        int largest = index;

        if (left < size && heap[left] > heap[largest]) largest = left;
        if (right < size && heap[right] > heap[largest]) largest = right;

        if (largest != index) {
            swap(heap[index], heap[largest]);
            index = largest;
        } else break;
    }
}
13.2.2 Recursive Version

void heapifyDown(vector<int>& heap, int index, int size) {
    int left = 2 * index + 1;
    int right = 2 * index + 2;
    int largest = index;

    if (left < size && heap[left] > heap[largest]) largest = left;
    if (right < size && heap[right] > heap[largest]) largest = right;

    if (largest != index) {
        swap(heap[index], heap[largest]);
        heapifyDown(heap, largest, size);
    }
}

Comparison:

  • Iterative: More efficient, avoids recursion overhead.
  • Recursive: More readable but limited by recursion depth.

14. Summary of Optimizations

15. Common Pitfalls & Edge Cases

Understanding and handling edge cases ensures a robust heap implementation. Below are the most common pitfalls:

15.1 Handling an Empty Heap

  • Edge Case: Extracting from an empty heap.
  • Issue: Accessing an empty array causes runtime errors.
  • Solution: Before extracting, check if the heap is empty.

int extractMax(vector<int>& heap) {
    if (heap.empty()) {
        throw runtime_error("Heap is empty!");
    }
    int maxValue = heap[0];
    heap[0] = heap.back();
    heap.pop_back();
    heapifyDown(heap, 0, heap.size());
    return maxValue;
}

15.2 Handling Single Element Heap

  • Edge Case: Inserting and then extracting when only one element exists.
  • Issue: The heapify-down operation should not run unnecessarily.
  • Solution: If size becomes 0 after extraction, return early.

15.3 Duplicates in the Heap

  • Edge Case: Handling duplicate values in insertion and extraction.
  • Issue: While inserting, duplicates should not break the heap property.
  • Solution: Heap operations should work fine as long as comparisons are consistent.

15.4 Large Inputs (Stress Testing)

  • Edge Case: Inserting and extracting millions of elements.
  • Issue: Inefficient heapify implementations might cause timeouts.
  • Solution: Use bottom-up heapify for construction in O(n).

15.5 Handling Negative Values

  • Edge Case: Working with negative numbers.
  • Issue: Comparison operations must handle negative values correctly.
  • Solution: The heap property is maintained as long as comparison logic is intact.

15.6 Floating Point Values

  • Edge Case: Using heaps with floating-point values.
  • Issue: Precision errors may cause incorrect ordering.
  • Solution: Consider rounding or using integer scaling for accuracy.

16. Test Cases to Verify Correctness

Below are various test cases to ensure correctness and robustness.

16.1 Insertion and Extraction


void testInsertionAndExtraction() {
    MaxHeap heap;
    heap.insert(10);
    heap.insert(5);
    heap.insert(15);
    heap.insert(20);

    assert(heap.extractMax() == 20);
    assert(heap.extractMax() == 15);
    assert(heap.extractMax() == 10);
    assert(heap.extractMax() == 5);
}

16.2 Extracting from an Empty Heap


void testEmptyHeapExtraction() {
    MaxHeap heap;
    try {
        heap.extractMax();
        assert(false); // Should not reach this
    } catch (runtime_error& e) {
        assert(true); // Expected exception
    }
}

16.3 Handling Duplicates


void testDuplicateValues() {
    MaxHeap heap;
    heap.insert(10);
    heap.insert(10);
    heap.insert(10);

    assert(heap.extractMax() == 10);
    assert(heap.extractMax() == 10);
    assert(heap.extractMax() == 10);
}

16.4 Large Input Performance


void testLargeHeap() {
    MaxHeap heap;
    for (int i = 1; i <= 1000000; i++) {
        heap.insert(i);
    }

    for (int i = 1000000; i >= 1; i--) {
        assert(heap.extractMax() == i);
    }
}

16.5 Handling Negative Numbers


void testNegativeValues() {
    MaxHeap heap;
    heap.insert(-10);
    heap.insert(-5);
    heap.insert(-20);
    heap.insert(-1);

    assert(heap.extractMax() == -1);
    assert(heap.extractMax() == -5);
    assert(heap.extractMax() == -10);
    assert(heap.extractMax() == -20);
}

17. Real-World Failure Scenarios

17.1 Memory Overflows

  • Inserting too many elements in a system with limited memory.
  • Heap growth must be handled with memory-efficient implementations.

17.2 Corrupted Heap Order

  • Manual modifications of the heap array (e.g., unintended swaps).
  • Debugging requires reapplying heapify to restore order.

17.3 Thread Safety Issues

  • Race conditions in concurrent priority queue implementations.
  • Use locks or thread-safe priority queue implementations.

18. Quick Recap

19. Real-World Applications of Heaps

Heaps play a crucial role in various domains due to their efficient priority retrieval capabilities. Below are key industry applications:

19.1 Priority Queues (Operating Systems & Task Scheduling)

  • OS Process Scheduling: The CPU scheduler maintains a queue of tasks where higher-priority processes execute first.
  • Thread Scheduling: Multi-threaded applications prioritize execution using heaps.

19.2 Graph Algorithms (Dijkstra's & Prim's Algorithm)

  • Shortest Path (Dijkstra’s Algorithm): Uses a min-heap to efficiently retrieve the node with the smallest distance.
  • Minimum Spanning Tree (Prim’s Algorithm): Uses heaps to find the minimum edge efficiently.

19.3 Data Compression (Huffman Coding)

  • Huffman coding, used in file compression formats like ZIP, PNG, and MP3, builds an optimal prefix tree using a min-heap.

19.4 Search Engines (Google & Bing Ranking Systems)

  • Search engines store the top-k search results using a max-heap, ensuring the most relevant results appear first.

19.5 Load Balancing in Web Servers

  • Many web services (e.g., AWS Load Balancer) use heaps to efficiently assign tasks to the least-loaded server.

19.6 Real-Time Median Finding (Finance & Stock Markets)

  • Maintaining a real-time median for stock price fluctuations requires two heaps (min-heap and max-heap).

19.7 AI & Machine Learning (A* Algorithm for Pathfinding)

  • Used in AI and robotics for shortest pathfinding in games, autonomous cars, and GPS navigation systems.

19.8 Social Media Feeds (Top-K Trending Topics)

  • Social networks like Twitter, Facebook, and Instagram use heaps to maintain a list of trending topics.

20. Open-Source Implementations of Heaps

Several open-source libraries provide efficient heap implementations:

21. Practical Heap Implementation (Top-K Trending Words in a File)

We’ll use a min-heap to efficiently find the top K most frequently occurring words in a file.

21.1 Python Implementation


import heapq
from collections import Counter

def top_k_words(filename, k):
    # Read file and count word occurrences
    with open(filename, 'r') as file:
        words = file.read().split()
    
    word_count = Counter(words)  # Count word frequencies
    
    # Use a min-heap to store the top K words
    min_heap = []
    
    for word, freq in word_count.items():
        heapq.heappush(min_heap, (freq, word))
        if len(min_heap) > k:
            heapq.heappop(min_heap)  # Remove the least frequent word
    
    # Extract results from heap
    top_words = sorted(min_heap, reverse=True)  # Sort in descending order
    return top_words

# Example usage
filename = "sample.txt"
k = 5
print(top_k_words(filename, k))

21.2 Sample Dry Run

Given an input file sample.txt containing:


apple banana apple orange banana banana apple grape grape

The output for top_k_words("sample.txt", 3) would be:


[(3, 'apple'), (3, 'banana'), (2, 'grape')]

21.3 Explanation

  • Reads the file and counts occurrences of each word.
  • Uses a min-heap to track the top K words.
  • If heap size exceeds K, the least frequent word is removed.
  • Final heap contains the most frequent K words, sorted in descending order.

22. Quick Recap

23. Competitive Programming: Heap-Based Problems

To master heaps in competitive programming, solve the following 10 problems. These problems are categorized by difficulty.

23.1 Beginner (Basic Heap Operations)

  • 🟢 Find K largest elements *Given an array of size N, find the K largest elements using a heap.* (Hint: Use a min-heap of size K.)
  • 🟢 Kth smallest/largest element *Find the Kth smallest or largest element in an unsorted array.* (Hint: Use a heap with O(log K) insertion.)
  • 🟢 Merge K sorted lists *You are given K sorted linked lists. Merge them into a single sorted list.* (Hint: Use a min-heap to efficiently extract the smallest element from each list.)

23.2 Intermediate (Priority Queues & Optimization)

  • 🟠 Find Median in a Running Stream *Given a stream of numbers, find the median after each insertion.* (Hint: Maintain two heaps: a max-heap for the left half and a min-heap for the right half.)
  • 🟠 Sort Nearly Sorted Array (K-Sorted Array) *Given an array where every element is at most K positions away from its sorted order, sort it efficiently.* (Hint: Use a min-heap of size K.)
  • 🟠 Task Scheduler (Leetcode 621) *Given a list of tasks and a cooldown period, schedule tasks with minimum idle time.* (Hint: Use a max-heap to always schedule the most frequent task first.)

23.3 Advanced (Graph Algorithms & Complex Heaps)

  • 🔴 Dijkstra’s Algorithm *Find the shortest path from a source node to all other nodes in a weighted graph.* (Hint: Use a min-heap to extract the node with the smallest distance.)
  • 🔴 Prim’s Minimum Spanning Tree *Given a weighted graph, find the Minimum Spanning Tree using Prim’s algorithm.* (Hint: Use a priority queue (min-heap) to always select the minimum edge.)
  • 🔴 Top K Frequent Elements (Leetcode 347) *Find the K most frequent elements in an array.* (Hint: Use a min-heap to store frequency counts.)
  • 🔴 Maximum Capital (Investment Problem) *Given capital C and a set of projects (each with a profit & capital requirement), maximize profit by selecting up to K projects.* (Hint: Use two heaps: one to track available projects and another for profit maximization.)

24. System Design Integration: Using Heaps in Real-World Architectures

24.1 System Design Problem: Real-Time Leaderboard

Design a real-time leaderboard system where users' scores update frequently, and you need to retrieve the top K players efficiently.

Requirements:

  • Supports millions of players.
  • Each player’s score updates frequently.
  • Retrieving the top K players should be fast.

Heap-Based Solution:

  • Use a max-heap to maintain the top K scores.
  • For each update, insert the new score into the heap.
  • When the heap size exceeds K, remove the smallest score.

import heapq

class Leaderboard:
    def __init__(self, k):
        self.k = k
        self.heap = []  # Min-heap to track top K scores

    def add_score(self, player, score):
        heapq.heappush(self.heap, (score, player))
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # Remove lowest score

    def get_top_players(self):
        return sorted(self.heap, reverse=True)

# Example Usage:
lb = Leaderboard(3)
lb.add_score("Alice", 50)
lb.add_score("Bob", 70)
lb.add_score("Charlie", 65)
lb.add_score("David", 80)

print(lb.get_top_players())  # Top 3 players

Scalability Considerations:

  • For large-scale leaderboards, store scores in Redis Sorted Sets (`ZADD` & `ZRANGE`).
  • For frequent updates, maintain a lazy deletion heap to reduce expensive operations.

24. Practicing Heaps Under Time Constraints

Competitive programmers must implement heaps quickly & efficiently during contests.

24.1 Time-Based Coding Challenges

  • 🕒 Implement a Max-Heap in 10 minutes
  • 🕒 Solve Kth Largest Element in 5 minutes
  • 🕒 Write Heap Sort in 15 minutes

24.2 Speed Practice Techniques

  • Master STL (C++), heapq (Python), PriorityQueue (Java).
  • Use template functions for quick heap initialization.
  • Understand heap optimizations to reduce unnecessary swaps.

25. Quick Recap