Data Structures Revision: CSU1051P

Revision

Pre-Introduction and Historical Development

Data Structures and Algorithms are fundamental concepts in computer science that guide us to write efficient and optimized code. These concepts were developed over time, each with their own purposes and improvements over their predecessors. Let's take a look at the historical development and reasons behind the creation of the core data structures and algorithms in our syllabus:

1. Arrays

The concept of Arrays is as old as programming itself. When programming was done using Assembly language in the early days, Arrays were used as a contiguous block of memory used to store multiple values. They are simple and offer direct access to data.

2. Stacks

Stacks were also developed early in the history of computing, being used in the evaluation of arithmetic expressions and syntax parsing. They were used for reversing the order of things and providing simple memory management.

3. Queues

Queues were developed soon after Stacks, providing a mechanism for fairness, ensuring that items that arrived first were processed first (FIFO).

4. Binary Trees

Binary Trees were developed to organize data in a hierarchical and ordered manner. They were used in various applications like expression parsing and Huffman encoding.

5. Binary Search Trees

Binary Search Trees (BSTs) were developed as an improvement over Binary Trees, providing efficient searching, insertion, and deletion. In BSTs, the left child node is always less than the parent node, and the right child node is always greater.

6. AVL Trees

As the need for speed became more crucial, AVL Trees were developed from BSTs to maintain balance, ensuring faster operations. AVL Trees introduced the concept of self-balancing, where the difference of heights of left and right subtrees of any node is not more than one.

7. B-Trees

B-Trees were developed for efficient disk access in databases. Each node in a B-Tree contains multiple keys and can have more than two children, which increases efficiency for large data sets.

8. B+ Trees

B+ Trees were an improvement over B-Trees, primarily used in database systems for efficient disk access. In B+ Trees, all keys are stored in the leaf nodes, which makes range queries faster.

9. Dijkstra's Algorithm

With the increasing complexity and size of data, efficient path-finding algorithms became necessary. Dijkstra's algorithm was one of the first algorithms to find the shortest path in a graph with positive weights.

10. Bellman Ford Algorithm

The Bellman Ford Algorithm was developed as a more versatile but slower alternative to Dijkstra's Algorithm. It can handle graphs with negative edge weights and is used to detect negative cycles in a graph.

1. Introduction to Data Structures and Algorithms

Data Structures and Algorithms (DSA) form the foundational concepts in computer science, which help in organizing and processing data efficiently. Understanding DSA is key to designing efficient software systems or applications. This article provides an overview of several important data structures and algorithms, their advantages, disadvantages, and applications.

2. Array

An Array is a data structure that holds a fixed number of elements of a single type in a contiguous memory location.

2.1 Advantages

Random access is possible due to contiguous memory.
Efficient if the size of the collection is known.

2.2 Disadvantages

Fixed size.
Insertion and deletion operations are expensive.

2.3 Applications

Arrays are used to implement other data structures like heaps, hash tables, etc. They are also used in numeric computations, databases, and more.

2.4 Basic Structure in C++


//Declaration
int arr[10];

// Initialization
int arr[5] = {1, 2, 3, 4, 5};

3. Linked List

A Linked List is a linear data structure where each element is a separate object, known as a node, and each node contains a pointer to the next node in the list.

3.1 Advantages

Dynamic size.
Easy to insert or delete elements.

3.2 Disadvantages

No random access to elements.
Requires extra space for pointers.

3.3 Applications

Linked Lists are used in various applications where we need to frequently add or remove items, such as implementing stacks, queues, and graphs.

3.4 Basic Structure in C++


// A linked list node
class Node {
public:
   int data;
   Node* next;
};

4. Stack

A Stack is a linear data structure which follows a particular order in which the operations are performed. The order is Last In First Out (LIFO).

4.1 Advantages

Follows Last In First Out, which helps in certain use cases like function calling.
Easy to implement.

4.2 Disadvantages

No random access to elements.
The size needs to be defined at the start, which can lead to wastage of space.

4.3 Applications

Stacks are used in programming languages to implement function calling (function stack), used in algorithm problems like parenthesis checking, infix to postfix conversion, etc.

4.4 Basic Structure in C++


#include<stack> 
stack s;

5. Queue

A Queue is a linear structure which follows a particular order in which the operations are performed. The order is First In First Out (FIFO).

5.1 Advantages

Ensures fairness as it follows First In First Out.
Easy to implement.

5.2 Disadvantages

No random access to elements.
The size needs to be defined at the start, which can lead to wastage of space.

5.3 Applications

Queues are used whenever we need to manage objects in order starting with the first one in. Examples include CPU scheduling, Disk Scheduling.

5.4 Basic Structure in C++


#include<queue> 
queue q;

6. Binary Tree

A Binary Tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child.

6.1 Advantages

Simpler to implement and understand.
Used in various applications like expression parsing and Huffman encoding.

6.2 Disadvantages

If not balanced, operations can take O(n) time in the worst case.
No direct support for efficient search, insert, and delete operations.

6.3 Applications

Binary Trees are used in many areas including expression parsing, Huffman encoding for data compression, and many more.

6.4 Basic Structure in C++


class Node {
public:
   int data;
   Node* left;
   Node* right;
};

7. Binary Search Tree

A Binary Search Tree is a node-based binary tree data structure which has the following properties: The left subtree of a node contains only nodes with keys lesser than the node’s key. The right subtree of a node contains only nodes with keys greater than the node’s key. The left and right subtree each must also be a binary search tree.

7.1 Advantages

Search, insert, and delete operations take O(log n) in an average case scenario.
Easy to implement.

7.2 Disadvantages

If not balanced, operations can take O(n) time in the worst case.
Requires additional memory for pointers.

7.3 Applications

Binary Search Trees are used in search applications where data is constantly entering/leaving, such as the map and set objects in many languages' standard libraries.

7.4 Basic Structure in C++


class Node {
public:
   int data;
   Node* left;
   Node* right;
};

8. AVL Tree

AVL Tree or Height-Balanced Binary Search Tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one.

8.1 Advantages

Maintains the tree balanced, ensuring O(log n) time complexity for search, insert, and delete operations.
Can be used in situations where frequent data insertions and deletions are involved.

8.2 Disadvantages

More complex to implement as compared to BST.
Overhead of balancing the trees.

8.3 Applications

AVL Trees are used in systems where we can't afford any operation to be too slow, such as databases and filesystems.

8.4 Basic Structure in C++


class Node {
public:
   int key;
   Node *left;
   Node *right;
   int height;
};

9. B-Tree

A B-Tree is a self-balancing search tree, in which every node contains multiple keys and has more than two children. Each internal node's keys act as separator values which divide its subtrees.

9.1 Advantages

Keeps keys sorted and allows for efficient insertion, deletion, and search operations.
Designed for systems with large amounts of data and where read and write operations are costly (like disk or database operations).

9.2 Disadvantages

Complex to implement.
Overhead of maintaining balance and multiple keys.

9.3 Applications

B-Trees are widely used in filesystems and databases to allow for efficient access to large amounts of data.

10. B+Tree

A B+ tree is an n-ary tree with a variable but often large number of children per node. It is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in the nodes.

10.1 Advantages

More efficient than B-Trees for disk access in databases.
Great for range queries since all keys are stored in the leaves.

10.2 Disadvantages

Complex to implement.
Additional overhead of maintaining linked list at the leaf level.

10.3 Applications

B+Trees are often used in databases and file systems for efficient data access and range queries.

11. Dijkstra's Algorithm

Dijkstra's Algorithm is a popular graph algorithm that finds the shortest path from a starting node to all nodes in a weighted graph (the weights must be positive).

11.1 Advantages

It provides the shortest path from one particular source node to all other nodes in the graph.
It is widely used in network routing protocols.

11.2 Disadvantages

Does not work with graphs with negative edge weights.
May not work correctly with dynamic graphs where edge weights change over time.

11.3 Applications

Used in network routing protocols, telephone network, traffic engineering, etc.

11.4 Basic Structure in C++


// Dijkstra's algorithm is complex to write. It requires the use of
// a priority queue and additional classes for the graph and edges.

12. Bellman Ford Algorithm

The Bellman–Ford algorithm is an algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra's algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers.

12.1 Advantages

Can handle negative edge weights.
Can be used to detect negative cycles in a graph.

12.2 Disadvantages

Slower than Dijkstra's Algorithm.
Does not work with graphs where negative cycles are reachable from the source vertex.

12.3 Applications

Used in routing protocols in networks, arbitrage opportunities in currency exchange rates, etc.

12.4 Basic Structure in C++


// Bellman Ford's algorithm is complex to write. It requires 
// additional classes for the graph and edges.

Differences

1. Difference Between Array, Stack, and Queue

An array is a basic data structure that stores elements of the same type, while a stack and a queue are abstract data types that can be implemented using arrays or linked lists. A stack follows the LIFO (Last In First Out) principle, meaning that the last element inserted is the first one to get out. On the other hand, a queue follows the FIFO (First In First Out) principle, meaning the first element inserted is the first one to get out.

2. Difference Between Binary Tree and Binary Search Tree

A Binary Tree is a tree data structure where each node has at most two children, referred to as the left child and the right child. A Binary Search Tree (BST) is a special case of a Binary Tree where the data elements of each node are in order, so that for each node, all elements in the left subtree are less to the node and all elements in the right subtree are greater than the node.

3. Difference Between BST and AVL Tree

A Binary Search Tree (BST) is a tree in which all the nodes follow the property that they are larger than all the nodes in their left sub-tree and smaller than all the nodes in their right sub-tree. However, BSTs can become unbalanced with multiple insertions and deletions, which can make search operations slower. An AVL Tree, named after its inventors Adelson-Velsky and Landis, is a self-balancing binary search tree. In an AVL tree, the difference of heights of left and right subtrees for any node is less than or equal to one, ensuring it remains balanced and search operations remain faster.

4. Difference Between B-Tree and B+Tree

B-Tree and B+Tree are both self-balancing tree data structures that maintain sorted data and are commonly used for database indexes and filesystems. B-Tree store data at both leaf and internal nodes, whereas B+Tree stores data only at leaf nodes, with internal nodes only used for distribution purposes. This makes B+Tree more efficient for large data and range queries.

5. Difference Between Dijkstra's Algorithm and Bellman Ford Algorithm

Both Dijkstra's Algorithm and the Bellman Ford Algorithm are used to find the shortest path in a graph, but they differ in how they handle edge weights. Dijkstra's Algorithm performs better with positive edge weights but fails in the case of negative edge weights. On the other hand, Bellman Ford's Algorithm can handle negative edge weights and is capable of detecting negative cycles in the graph.