
Concurrency Control in Distributed Systems

1. Concurrency Control

What: Concurrency control is a critical mechanism in database management systems that ensures multiple transactions execute reliably and correctly, even when processed concurrently. It prevents data inconsistencies and conflicts.

Why: In multi-user environments, concurrent execution is common to maximize system throughput and minimize latency. Without concurrency control, simultaneous transactions can lead to issues such as data corruption, lost updates, or inconsistent states, violating the ACID properties.

How: Concurrency control achieves its goals through techniques and protocols designed to preserve the ACID properties under concurrent execution.

Techniques such as locking (pessimistic), timestamp ordering, and multi-version concurrency control (optimistic) are commonly used to implement concurrency control while balancing correctness and performance.

1.1 Remote Procedure Calls (RPCs)

What: RPCs enable one process to invoke functions in another process as if it were calling a local function. This abstraction hides the complexity of inter-process communication and makes distributed system design more intuitive.

Why: Distributed systems require interaction between processes running on different nodes. RPCs provide a seamless interface for this, reducing the need for developers to manage low-level communication protocols. They support modularity and code reuse by allowing processes to share functionality across node boundaries.

How: An RPC framework typically involves the following components:

  1. Client stub: marshals the procedure name and arguments into a request message and unmarshals the reply.
  2. Server stub (skeleton): unmarshals the request, invokes the local implementation, and marshals the result back.
  3. Runtime/transport layer: delivers request and reply messages between the two processes.

Key Characteristics: location transparency (the caller does not need to know where the procedure runs), well-defined call semantics (e.g., at-most-once or at-least-once), and, typically, synchronous blocking invocation.

Challenges:

RPCs face challenges such as ensuring fault tolerance, handling network latencies, and distinguishing between failures (e.g., network drops vs. server crashes). These are mitigated through retransmissions, filtering duplicates, and using idempotent operations wherever possible.
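
To make the stub and marshalling roles concrete, the sketch below collapses an RPC into a single Java program (all names are illustrative; a real framework adds sockets, timeouts, and retransmission). The client stub marshals the call into a message, the server skeleton unmarshals it and invokes the local procedure, and the caller only ever sees an ordinary method call.

// Minimal RPC sketch: client stub, server skeleton, and marshalling
// (illustrative names only; not a real framework such as gRPC or Java RMI)
public class RpcSketch {
    // Server-side skeleton: unmarshals the request and invokes the local procedure.
    static String serverDispatch(String request) {
        String[] parts = request.split(",");
        if (parts[0].equals("add")) {
            int result = Integer.parseInt(parts[1]) + Integer.parseInt(parts[2]);
            return Integer.toString(result);                  // marshal the result
        }
        throw new IllegalArgumentException("unknown procedure: " + parts[0]);
    }

    // Client-side stub: looks like a local call and hides marshalling and transport.
    static int addRemote(int a, int b) {
        String request = "add," + a + "," + b;                // marshal procedure name + arguments
        String reply = serverDispatch(request);               // stand-in for the network hop
        return Integer.parseInt(reply);                       // unmarshal the reply
    }

    public static void main(String[] args) {
        System.out.println(addRemote(2, 3));                  // prints 5, as if computed remotely
    }
}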

1.2 Transactions

What: A transaction is a sequence of operations performed on a database or system that acts as a single logical unit. It guarantees a consistent and reliable outcome by adhering to the ACID properties.

Why: Transactions are essential in environments with concurrent users or processes, ensuring data integrity and correctness even under failures, interruptions, or high contention. They provide a framework to handle complex operations safely and predictably.

How: Transactions enforce the ACID properties:

  1. Atomicity: either all operations in the transaction take effect, or none do.
  2. Consistency: the transaction moves the database from one valid state to another.
  3. Isolation: concurrent transactions do not observe each other's intermediate states.
  4. Durability: once committed, changes survive crashes and restarts.

Transaction Lifecycle:
  1. Begin: The transaction starts, reserving resources as needed.
  2. Execute: Operations are performed, including reads and writes.
  3. Validate: Checks are made to ensure the transaction meets all constraints and does not conflict with others.
  4. Commit: If all conditions are met, changes are made permanent.
  5. Rollback: If any issue arises, all changes are undone, leaving the system unaffected by the transaction.
Practical Example:

// Example: Airline Booking Transaction
int transactionId = openTransaction();
try {
    int availability = server.getFlightAvailability("ABC", 123, "2024-12-31");
    if (availability > 0) {
        int seat = server.bookTicket("ABC", 123, "2024-12-31");
        server.putSeatPreference(seat, "aisle");
    }
    closeTransaction(transactionId);     // commit: make the booking permanent
} catch (Exception e) {
    // abortTransaction is assumed here as the framework's rollback call
    abortTransaction(transactionId);     // roll back: undo any partial changes
}

This ensures the ticket booking occurs atomically: either the seat is booked and the preference recorded, or neither takes effect. If any step fails (e.g., a server call throws because no seat can be reserved), the transaction rolls back, leaving the database unaffected.

1.3 Serial Equivalence

What: Serial equivalence ensures that the outcome of executing multiple concurrent transactions is equivalent to executing the same transactions in some sequential (serial) order.

Why: Concurrent execution of transactions is crucial for system performance. However, it introduces the risk of conflicts. Serial equivalence provides a framework to validate that concurrency does not compromise the correctness or consistency of the database.

How: The concept involves checking whether the interleaving of operations (actual execution order) can be transformed into a valid serial order. If such an order exists, the interleaving is deemed serially equivalent.

1.3.1 Conflicting Operations

Conflicts arise when the order of two operations affects the outcome. Two operations conflict when they belong to different transactions, access the same object, and at least one of them is a write:

  1. Read-Write: one transaction reads an object that another transaction writes.
  2. Write-Read: one transaction writes an object that another transaction then reads.
  3. Write-Write: two transactions write the same object.

Swapping the order of such a pair can change the result, hence these operations are considered conflicting.

Example: Checking for Serial Equivalence

// Example: Serial Equivalence
Transaction T1: write(x)
Transaction T2: read(x)

// Scenario 1 (Order: T1 -> T2):
// T1 writes x, T2 reads the updated value of x.
// Serial equivalence is maintained as (T1, T2).

// Scenario 2 (Order: T2 -> T1):
// T2 reads x, T1 writes to x afterward.
// If the system must behave as though T1 executed first,
// this order violates serial equivalence.
Validation Mechanism: for every pair of conflicting operations, the transaction that executed first must also come first in the equivalent serial order. A common check builds a precedence (conflict) graph with an edge Ti -> Tj for each such pair; the interleaving is serially equivalent if and only if this graph has no cycle.
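
One way to implement this check is sketched below in Java (illustrative code, not any particular database's validator): each operation is a (transaction, read/write, object) triple, every conflicting pair adds a precedence edge, and a depth-first search tests the graph for cycles.

import java.util.*;

// Sketch of conflict-graph validation: add an edge Ti -> Tj when an operation of Ti
// conflicts with a later operation of Tj on the same object (at least one is a write).
// The schedule is serially equivalent iff the graph has no cycle.
public class ConflictGraph {
    record Op(String txn, char kind, String obj) {}   // kind: 'r' (read) or 'w' (write)

    static boolean isSeriallyEquivalent(List<Op> schedule) {
        Map<String, Set<String>> edges = new HashMap<>();
        for (int i = 0; i < schedule.size(); i++)
            for (int j = i + 1; j < schedule.size(); j++) {
                Op a = schedule.get(i), b = schedule.get(j);
                boolean conflict = a.obj().equals(b.obj())
                        && !a.txn().equals(b.txn())
                        && (a.kind() == 'w' || b.kind() == 'w');
                if (conflict) edges.computeIfAbsent(a.txn(), k -> new HashSet<>()).add(b.txn());
            }
        return !hasCycle(edges);
    }

    static boolean hasCycle(Map<String, Set<String>> edges) {
        Map<String, Integer> state = new HashMap<>();   // 0 = unvisited, 1 = on stack, 2 = done
        for (String node : edges.keySet())
            if (dfs(node, edges, state)) return true;
        return false;
    }

    static boolean dfs(String node, Map<String, Set<String>> edges, Map<String, Integer> state) {
        Integer s = state.get(node);
        if (s != null) return s == 1;                   // back edge => cycle
        state.put(node, 1);
        for (String next : edges.getOrDefault(node, Set.of()))
            if (dfs(next, edges, state)) return true;
        state.put(node, 2);
        return false;
    }

    public static void main(String[] args) {
        // T1: write(x); T2: read(x) -- equivalent to the serial order (T1, T2)
        System.out.println(isSeriallyEquivalent(List.of(
                new Op("T1", 'w', "x"), new Op("T2", 'r', "x"))));   // true
        // Interleaving where each transaction must precede the other: no serial order exists
        System.out.println(isSeriallyEquivalent(List.of(
                new Op("T1", 'r', "x"), new Op("T2", 'w', "x"),
                new Op("T2", 'r', "y"), new Op("T1", 'w', "y"))));   // false
    }
}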

1.4 Pessimistic Concurrency Control

What: Pessimistic concurrency control assumes conflicts are likely and proactively prevents them by restricting access to data objects using locks. It ensures correctness at the cost of potential delays due to waiting for locks.

Why: In systems where transaction conflicts are frequent (e.g., high contention scenarios), pessimistic approaches provide a reliable way to maintain data integrity by avoiding conflicting operations entirely.

How: Pessimistic control uses locking mechanisms to serialize conflicting operations.

1.4.1 Exclusive Locking

What: Each object has a lock, and only one transaction can hold the lock at a time, ensuring exclusive access.

Why: Prevents all conflicts (read-write, write-write) by allowing only one transaction to operate on the object at a time.

How: Transactions call `lock(O)` to acquire a lock on object `O`. If another transaction holds the lock, the current transaction waits until the lock is released by calling `unlock(O)`. Locks are typically held until the transaction commits or aborts.
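
A minimal sketch of such a lock manager in Java follows; it is illustrative only and assumes, for simplicity, that each transaction runs on its own thread so that java.util.concurrent.locks.ReentrantLock can stand in for a transaction-level lock.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of exclusive locking: lock(O) blocks until the caller holds the only lock on O;
// unlock(O) releases it (called at commit/abort in strict schemes).
public class ExclusiveLockManager {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void lock(String objectId) {
        locks.computeIfAbsent(objectId, k -> new ReentrantLock()).lock();  // waits if another txn holds it
    }

    public void unlock(String objectId) {
        locks.get(objectId).unlock();
    }

    public static void main(String[] args) {
        ExclusiveLockManager mgr = new ExclusiveLockManager();
        mgr.lock("x");      // transaction T1 acquires x
        // ... T1 reads and writes x; any other transaction calling lock("x") now blocks ...
        mgr.unlock("x");    // released when T1 commits or aborts
    }
}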

1.4.2 Read-Write Locks

What: Implements a two-mode locking mechanism, allowing either shared access (read) or exclusive access (write).

Why: Improves concurrency by allowing multiple transactions to read the same object simultaneously while ensuring exclusivity for writes.

How:

  1. Read lock (shared): multiple transactions may hold read locks on the same object simultaneously.
  2. Write lock (exclusive): granted only when no other transaction holds any lock on the object; it blocks both readers and writers.
  3. Promotion: a transaction holding a read lock must upgrade it to a write lock before writing, which succeeds only once it is the sole reader.
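
The sketch below shows the two modes on a single shared object using Java's ReentrantReadWriteLock; it is a simplified illustration (one object, operations as plain method calls), not a full lock manager.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of a read-write lock on one object: many readers may hold the read lock
// at once, but the write lock is exclusive (it waits for all readers to finish).
public class ReadWriteLockDemo {
    private static final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private static int balance = 100;   // the shared object

    static int read() {
        rw.readLock().lock();           // shared mode: other readers are not blocked
        try { return balance; } finally { rw.readLock().unlock(); }
    }

    static void write(int newValue) {
        rw.writeLock().lock();          // exclusive mode: blocks readers and writers
        try { balance = newValue; } finally { rw.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        System.out.println(read());     // several transactions could do this concurrently
        write(150);                     // only one writer at a time
        System.out.println(read());
    }
}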

1.4.3 Two-Phase Locking

What: A protocol to enforce serial equivalence by managing the order of lock acquisition and release.

Why: Guarantees serial equivalence: because no transaction acquires a new lock after releasing any lock, conflicting transactions cannot overtake one another, so no cycles arise in the precedence graph.

How:

  1. Growing Phase: A transaction acquires all the locks it needs but cannot release any lock.
  2. Shrinking Phase: Once a transaction releases its first lock, it cannot acquire new locks.

Variant: Strict two-phase locking ensures all locks are released only at the commit point, providing stricter isolation.
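
The sketch below illustrates the strict variant in Java (illustrative names; transactions are again modeled as threads): locks are acquired as the transaction needs them, and all of them are released together only at commit.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of strict two-phase locking: locks are only acquired while the transaction runs
// (growing phase) and all of them are released together at commit/abort (shrinking phase).
public class StrictTwoPhaseLocking {
    private static final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Deque<String> held = new ArrayDeque<>();   // locks held by this transaction

    void lock(String objectId) {                              // growing phase
        locks.computeIfAbsent(objectId, k -> new ReentrantLock()).lock();
        held.push(objectId);
    }

    void commit() {                                           // shrinking phase, all at once
        while (!held.isEmpty()) locks.get(held.pop()).unlock();
    }

    public static void main(String[] args) {
        StrictTwoPhaseLocking t1 = new StrictTwoPhaseLocking();
        t1.lock("A");        // growing: acquire every lock the transaction needs
        t1.lock("B");
        // ... read/write A and B ...
        t1.commit();         // shrinking: release A and B only at the commit point
    }
}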

Challenges: Deadlocks

Deadlocks occur when two or more transactions wait indefinitely for each other to release locks.


// Deadlock Example
Transaction T1: lock(A) -> lock(B)
Transaction T2: lock(B) -> lock(A)
// Both wait for each other indefinitely.

Handling Deadlocks:

  1. Timeouts: abort a transaction whose lock request has been waiting longer than a threshold.
  2. Detection: maintain a wait-for graph recording which transaction waits for which; a cycle indicates a deadlock, and one transaction in the cycle is chosen as the victim and aborted.
  3. Prevention: impose a global order on lock acquisition (or use schemes such as wait-die or wound-wait) so that cycles cannot form.

1.5 Optimistic Concurrency Control

What: Optimistic concurrency control (OCC) allows transactions to execute without restrictions on access to shared resources. Validation occurs at the commit stage to ensure serial equivalence and conflict resolution.

Why: OCC is ideal in environments with low conflict rates. It maximizes concurrency by reducing the overhead of locks and improves system throughput.

How: OCC operates in three phases:

  1. Read Phase: Transactions read data and perform local computations without locking any resources.
  2. Validation Phase: At commit time, the transaction checks if its operations conflict with others. If conflicts exist, the transaction aborts.
  3. Write Phase: If validation succeeds, changes are written to the database.
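
The following simplified backward-validation sketch in Java illustrates the three phases (it is illustrative, not any particular system's algorithm): each transaction remembers the version of every object it read, and at commit time it aborts if any of those versions has changed; otherwise it installs its writes and bumps the versions.

import java.util.HashMap;
import java.util.Map;

// Sketch of optimistic concurrency control with per-object version numbers.
public class OccSketch {
    static final Map<String, Integer> values = new HashMap<>();    // committed values
    static final Map<String, Integer> versions = new HashMap<>();  // committed versions

    static class Transaction {
        final Map<String, Integer> readVersions = new HashMap<>(); // version seen at read time
        final Map<String, Integer> writes = new HashMap<>();       // local, uncommitted writes

        // Read phase: no locks, just remember which version we saw.
        int read(String obj) {
            readVersions.put(obj, versions.getOrDefault(obj, 0));
            return writes.getOrDefault(obj, values.getOrDefault(obj, 0));
        }

        void write(String obj, int value) { writes.put(obj, value); }

        // Validation + write phase: commit only if nothing we read has changed since.
        boolean commit() {
            synchronized (OccSketch.class) {                        // validate and write atomically
                for (var e : readVersions.entrySet())
                    if (!versions.getOrDefault(e.getKey(), 0).equals(e.getValue()))
                        return false;                               // conflict detected: abort
                for (var e : writes.entrySet()) {
                    values.put(e.getKey(), e.getValue());
                    versions.merge(e.getKey(), 1, Integer::sum);    // install new version
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Transaction t1 = new Transaction();
        Transaction t2 = new Transaction();
        int a = t1.read("x");               // both read x = 0 (version 0)
        int b = t2.read("x");
        t1.write("x", a + 10);
        t2.write("x", b + 20);
        System.out.println(t1.commit() ? "T1 committed" : "T1 aborted");  // commits first
        System.out.println(t2.commit() ? "T2 committed" : "T2 aborted");  // aborts: x changed
    }
}
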
1.5.1 Timestamp Ordering

What: Each transaction is assigned a unique timestamp that determines its position in the serialization order.

Why: Timestamp ordering ensures that the sequence of transaction operations adheres to a global order, maintaining consistency.

How:

  1. Each object records the largest timestamp of any transaction that has read it (read timestamp) and written it (write timestamp).
  2. A read by transaction T is allowed only if T's timestamp is not older than the object's write timestamp; the object's read timestamp is then advanced if needed.
  3. A write by T is allowed only if T's timestamp is not older than the object's read timestamp or write timestamp; the object's write timestamp is then set to T's timestamp.

Transactions violating these conditions are aborted to preserve serial equivalence.
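
The sketch below implements these rules for a single-node store in Java (illustrative names; an exception stands in for the abort signal): each object tracks its largest read and write timestamps, and operations that arrive too late in timestamp order are rejected.

import java.util.HashMap;
import java.util.Map;

// Sketch of basic timestamp ordering: every object remembers the largest timestamp
// that has read it (readTs) and written it (writeTs); operations arriving "too late"
// in timestamp order cause the transaction to abort.
public class TimestampOrdering {
    static class ObjectState { int value; long readTs = 0; long writeTs = 0; }
    private final Map<String, ObjectState> objects = new HashMap<>();

    private ObjectState state(String obj) {
        return objects.computeIfAbsent(obj, k -> new ObjectState());
    }

    // Returns the value, or throws to signal that the transaction must abort.
    public synchronized int read(long txnTs, String obj) {
        ObjectState s = state(obj);
        if (txnTs < s.writeTs) throw new IllegalStateException("abort: read arrived too late");
        s.readTs = Math.max(s.readTs, txnTs);
        return s.value;
    }

    public synchronized void write(long txnTs, String obj, int value) {
        ObjectState s = state(obj);
        if (txnTs < s.readTs || txnTs < s.writeTs)
            throw new IllegalStateException("abort: write arrived too late");
        s.value = value;
        s.writeTs = txnTs;
    }

    public static void main(String[] args) {
        TimestampOrdering to = new TimestampOrdering();
        to.write(1, "x", 42);                 // T1 (timestamp 1) writes x
        System.out.println(to.read(2, "x"));  // T2 (timestamp 2) reads the newer value: 42
        try {
            to.write(1, "x", 7);              // T1 writes after T2's read: violates the order
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}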

1.5.2 Multi-Version Concurrency Control (MVCC)

What: MVCC maintains multiple versions of each data object, allowing transactions to access a version corresponding to their timestamp.

Why: It eliminates the need for locks, providing higher concurrency and avoiding delays caused by conflicts.

How:

  1. Every write creates a new version of the object, tagged with the writing transaction's timestamp, instead of overwriting in place.
  2. A read with timestamp T returns the latest version whose write timestamp is at most T, so readers never block writers.
  3. Old versions are retained until no active transaction can still need them, and are then garbage-collected.
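
A minimal Java sketch of the version lookup follows (illustrative only: single node, no garbage collection): each object keeps a timestamp-ordered map of versions, and a read returns the newest version no later than the reader's timestamp.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of multi-version reads: each write creates a new version tagged with the
// writer's timestamp, and a reader with timestamp T sees the latest version <= T.
public class MvccSketch {
    private final Map<String, TreeMap<Long, Integer>> versions = new HashMap<>();

    public synchronized void write(long txnTs, String obj, int value) {
        versions.computeIfAbsent(obj, k -> new TreeMap<>()).put(txnTs, value);
    }

    public synchronized Integer read(long txnTs, String obj) {
        TreeMap<Long, Integer> history = versions.get(obj);
        if (history == null) return null;
        Map.Entry<Long, Integer> visible = history.floorEntry(txnTs);  // newest version <= txnTs
        return visible == null ? null : visible.getValue();
    }

    public static void main(String[] args) {
        MvccSketch store = new MvccSketch();
        store.write(10, "x", 1);                   // version of x written at timestamp 10
        store.write(20, "x", 2);                   // newer version at timestamp 20
        System.out.println(store.read(15, "x"));   // transaction at ts 15 sees 1
        System.out.println(store.read(25, "x"));   // transaction at ts 25 sees 2
    }
}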

1.5.3 Eventual Consistency

What: A consistency model ensuring that, over time, all copies of data in a distributed system converge to the same value.

Why: Common in distributed systems like Cassandra and DynamoDB, eventual consistency prioritizes availability and scalability over strict consistency.

How:

  1. Writes are accepted by any replica and acknowledged immediately, then propagated to the other replicas asynchronously.
  2. Background mechanisms such as anti-entropy (gossip), hinted handoff, and read repair spread updates that a replica missed.
  3. Conflicting updates are reconciled, for example by last-writer-wins timestamps or by vector clocks that expose the conflict to the application.
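
The toy sketch below shows last-writer-wins convergence between two in-process "replicas" in Java; it is only an illustration of the idea and omits the networking, failure handling, and vector clocks that systems such as Cassandra and DynamoDB use.

import java.util.HashMap;
import java.util.Map;

// Sketch of eventual consistency with last-writer-wins reconciliation: each replica
// accepts writes locally and tags them with a timestamp; an anti-entropy exchange
// later merges replicas so both converge to the value with the newest timestamp.
public class EventualConsistencySketch {
    record Versioned(int value, long timestamp) {}
    private final Map<String, Versioned> data = new HashMap<>();

    public void writeLocal(String key, int value, long timestamp) {
        data.merge(key, new Versioned(value, timestamp),
                (old, nw) -> nw.timestamp() >= old.timestamp() ? nw : old);
    }

    // Anti-entropy: pull the other replica's entries and keep the newer of each pair.
    public void syncFrom(EventualConsistencySketch other) {
        other.data.forEach((k, v) -> writeLocal(k, v.value(), v.timestamp()));
    }

    public Integer read(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value();
    }

    public static void main(String[] args) {
        EventualConsistencySketch replicaA = new EventualConsistencySketch();
        EventualConsistencySketch replicaB = new EventualConsistencySketch();
        replicaA.writeLocal("x", 1, 100);   // client writes to replica A
        replicaB.writeLocal("x", 2, 200);   // a later write lands on replica B
        replicaA.syncFrom(replicaB);        // anti-entropy exchange
        replicaB.syncFrom(replicaA);
        System.out.println(replicaA.read("x") + " " + replicaB.read("x"));  // 2 2: converged
    }
}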