1. Introduction to Heap Tree
A heap is a specialized tree-based data structure that satisfies the heap property. It is primarily used to implement priority queues and is crucial in algorithms like Heap Sort.
1.1 Properties of Heap Tree
A heap tree has the following properties:
- It is a complete binary tree, meaning all levels, except possibly the last, are completely filled, and all nodes are as left as possible.
- In a max heap, the key of each node is greater than or equal to the keys of its children. In a min heap, it is less than or equal to the keys of its children.
- The height of the heap is $O(\log n)$ where n is the number of elements in the heap.
The heap property ensures that the root of the tree is the largest (max heap) or smallest (min heap) element, and the same property applies recursively to all subtrees.
1.2 Operations on Heap Tree
Common operations performed on a heap include:
- Insertion: Adding a new element to the heap while maintaining the heap property.
- Deletion: Removing the root element (max or min) from the heap and adjusting the tree to maintain the heap property.
- Heapify: A process to rearrange the elements of the heap to maintain the heap property after insertion or deletion.
- Extract Max/Min: Removing and returning the maximum (in max heap) or minimum (in min heap) element of the heap.
These operations are fundamental in manipulating heap data structures and are used in various algorithms including Heap Sort.
1.3 Heap Representation
Heaps can be efficiently represented using an array. For a given element at index i:
- Its left child is located at index $2i + 1$
- Its right child is located at index $2i + 2$
- Its parent is located at index $\lfloor (i-1)/2 \rfloor$
This representation minimizes space requirements and simplifies navigation within the heap.
2. Heap Sort Algorithm
Heap Sort is a comparison-based sorting algorithm that uses a binary heap data structure. It works by building a max heap from the input data, then repeatedly extracting the maximum element from the heap and placing it at the end of the sorted array.
2.1 Algorithm
The Heap Sort algorithm consists of two main phases:
- Build Max Heap: Transform the unsorted input array into a max heap. All elements are rearranged to satisfy the heapproperty.
- Extract Maximum and Heapify: Repeatedly remove the maximum element from the heap and place it at the end of the array. After each extraction, call the heapify function to maintain the max heap property.
Heap Sort is an in-place algorithm, meaning it doesn't require additional storage space. It sorts the elements by manipulating the input array.
2.2 Time Complexity
The time complexity of building a max heap is $O(n)$ and the time complexity of heapifying is $O(\log n)$. Since heapify is called $n$ times, the overall time complexity of Heap Sort is $O(n \log n)$ for the worst, average, and best cases.
Compared to Quick Sort and Merge Sort, Heap Sort has a consistent time complexity of O(n log n) in the best, average, and worst-case scenarios. Unlike Quick Sort, it doesn't suffer from a worst-case time complexity of O(n²). However, Heap Sort lags behind in terms of space efficiency compared to these two as it's not an in-place sort and also doesn't have stable sorting. Furthermore, Quick Sort and Merge Sort have better cache performance and can be parallelized more efficiently, making them more favourable for large datasets.
2.3 Implementation in C++
void heapify(int arr[], int n, int i) {
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && arr[left] > arr[largest])
largest = left;
if (right < n && arr[right] > arr[largest])
largest = right;
if (largest != i) {
swap(arr[i], arr[largest]);
heapify(arr, n, largest);
}
}
void heapSort(int arr[], int n) {
for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);
for (int i = n - 1; i >= 0; i--) {
swap(arr[0], arr[i]);
heapify(arr, i, 0);
}
}
This C++ code snippet demonstrates how to implement the Heap Sort algorithm. The heapify function ensures the heap property is maintained, and the heapSort function uses this to sort an array in-place.
Implementing Heap Sort in C++ involves encapsulating the operations of building a max heap, swapping, and heapifying within a function and calling it recursively. Let's inspect the following implementation:
// C++ program for implementation of Heap Sort
#include <iostream>
using namespace std;
// function to heapify a subtree rooted with node i
void heapify(int arr[], int n, int i) {
int largest = i;
int left = 2*i + 1;
int right = 2*i + 2;
// if left child is larger than root
if (left < n && arr[left] > arr[largest])
largest = left;
// if right child is larger than largest so far
if (right < n && arr[right] > arr[largest])
largest = right;
// if largest is not root
if (largest != i) {
swap(arr[i], arr[largest]);
// Recursively heapify the affected sub-tree
heapify(arr, n, largest);
}
}
// main function to do heap sort
void heapSort(int arr[], int n) {
// Build heap
for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);
// One by one extract an element from heap
for (int i=n-1; i>=0; i--) {
// Move current root to end
swap(arr[0], arr[i]);
// call max heapify on the reduced heap
heapify(arr, i, 0);
}
}
// A utility function to print an array of size n
void printArray(int arr[], int n) {
for (int i=0; i < n; ++i)
cout << arr[i] << " ";
cout << "\n";
}
// Driver program
int main() {
int arr[] = {12, 11, 13, 5, 6, 7};
int n = sizeof(arr)/sizeof(arr[0]);
heapSort(arr, n);
cout << "Sorted array is \n";
printArray(arr, n);
}
This code forms a max heap from the array and then swaps the root with the last node, reducing the heap size by 1 each time. This process continues until the heap size becomes 1, resulting in a sorted array. Heapify is a recursive function that ensures the array satisfies the max heap property.
3. Linear-Time Selection Algorithm
As discussed in the Stanford document, the linear-time selection algorithm is an efficient algorithm for solving the selection problem, which is finding the kth smallest number in an array. This algorithm uses a divide-and-conquer approach and is based on the partition method similar to Quick Sort. It has a worst-case time complexity of O(n).
3.1 The Selection Problem and Partitioning
The selection problem is to find the kth smallest element in an unsorted array. The linear-time selection algorithm uses a partitioning approach where it selects a 'pivot' element and partitions the array around the pivot. The elements less than the pivot are moved to its left, and elements greater than the pivot are moved to its right.
3.2 Median-of-Medians Approach
The performance of the linear-time selection algorithm depends on the choice of the pivot. The median-of-medians approach is used to choose a good pivot. This approach divides the input into groups of five, finds the median of each group, and then recursively finds the median of these medians to be used as the pivot. This ensures a better balance in the partition sizes, leading to a linear-time complexity.
3.3 Implementation in C++
void linearTimeSelection(int arr[], int left, int right, int k) {
if (left == right) {
return arr[left];
}
int pivotIndex = medianOfMedians(arr, left, right);
pivotIndex = partition(arr, left, right, pivotIndex);
if (k == pivotIndex) {
return arr[k];
} else if (k < pivotIndex) {
return linearTimeSelection(arr, left, pivotIndex - 1, k);
} else {
return linearTimeSelection(arr, pivotIndex + 1, right, k);
}
}
This C++ function takes an array, its left and right indices, and an integer k, and returns the kth smallest element in the array. It uses the median-of-medians approach to select a pivot and partitions the array around the pivot.
4. Applications and Use Cases
Heap trees and the Heap Sort algorithm have various applications in computer science:
4.1 Priority Queues
Heaps are the underlying data structure for implementing priority queues, which are used in scheduling processes in operating systems, network traffic management, and discrete event simulation.
4.2 Sorting
Heap Sort is used for sorting large datasets. Though its average-case performance is comparable to Quick Sort, it has better worst-case time complexity.
4.3 Selection Algorithms
As seen with the linear-time selection algorithm, heaps can be used to efficiently find the kth smallest or largest element in an array.
4.4 Graph Algorithms
Heaps are used in graph algorithms like Dijkstra's algorithm for finding the shortest path and the Prim's algorithm for finding the minimum spanning tree.
5. Conclusion
Heap Tree is a powerful data structure that is used in various algorithms including Heap Sort. Heap Sort is an efficient comparison-based sorting algorithm with a time complexity of O(n log n). The linear-time selection algorithm is an advanced algorithm that solves the selection problem in linear time using a divide-and-conquer approach. The choice of pivot is crucial for the performance of this algorithm, and the median-of-medians approach is used to find a good pivot. Heaps are versatile and find applications in priority queues, sorting, selection algorithms, and graph algorithms.