Transactions - CSU357 - Shoolini University

Transactions in DBMS

1. Transaction in DBMS

A transaction in a Database Management System (DBMS) is a sequence of operations performed as a single logical unit of work. Each transaction must adhere to the ACID properties to ensure data consistency and integrity.

1.1. Transaction Concepts

A transaction is defined as a group of tasks that are treated as a single unit. Each task is an indivisible unit of work, often referred to as an atomic action.

Example: Calculating the total marks of a student involves three steps:

Retrieve marks for each subject (3 subjects).
Calculate the total marks.
Display the total marks.

These steps together form a transaction and cannot be divided further.

1.2. ACID Properties of Transactions

Each transaction must adhere to the following ACID properties:

Atomicity: Ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of failure, and all previous operations are undone. In the example, either all three steps (retrieving, calculating, displaying) are completed, or none are.
Consistency: Ensures that a transaction brings the database from one valid state to another valid state. The total marks calculated must accurately reflect the sum of the marks retrieved.
Isolation: Ensures that the operations within a transaction are isolated from other transactions. Changes made during the transaction are not visible to other transactions until the transaction is committed. While calculating the total marks, other transactions should not interfere.
Durability: Ensures that once a transaction has been committed, it remains so, even in the event of a system failure. Once the total marks are displayed, they should be permanently recorded in the database.

Lets read about them in detail.

1.2.1. Atomicity

Atomicity ensures that either the entire transaction takes place at once or doesn’t happen at all. There is no midway, meaning transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves two operations:

Abort: If a transaction aborts, changes made to the database are not visible.
Commit: If a transaction commits, changes made are visible.

Atomicity is also known as the ‘All or nothing rule’.

Example: Consider a transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y. If the transaction fails after the completion of T1 but before the completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety to ensure the correctness of the database state.

1.2.2. Consistency

Consistency ensures that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of the database. Referring to the example above, the total amount before and after the transaction must be maintained.

Total before T occurs = 500 + 200 = 700.

Total after T occurs = 400 + 300 = 700.

Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a result, T is incomplete.

1.2.3. Isolation

Isolation ensures that multiple transactions can occur concurrently without leading to an inconsistent database state. Transactions occur independently without interference. Changes occurring in a particular transaction will not be visible to any other transaction until that particular change in that transaction is written to memory or has been committed. This property ensures that the execution of transactions concurrently will result in a state that is equivalent to a state achieved if these were executed serially in some order.

Example: Let X= 500, Y = 500. Consider two transactions T and T’. Suppose T has been executed till Read (Y) and then T’ starts. As a result, interleaving of operations takes place due to which T’ reads the correct value of X but an incorrect value of Y and the sum computed by T’ (X+Y = 50000+500=50500) is thus not consistent with the sum at the end of the transaction: T (X+Y = 50000 + 450 = 50450). This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in isolation and changes should be visible only after they have been made to the main memory.

1.2.4. Durability

Durability ensures that once the transaction has completed execution, the updates and modifications to the database are stored and written to disk, and they persist even if a system failure occurs. These updates now become permanent and are stored in a non-volatile memory. The effects of the transaction, thus, are never lost.

The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a database in a way such that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and updates that it makes are durably stored.

1.3. States of a Transaction

Transactions go through several states during their lifecycle:

Active: The initial state when the transaction is being executed.
Partially Committed: When the final operation is executed but before the transaction is fully committed.
Committed: When all operations are successfully executed, and the transaction's effects are permanently recorded in the database.
Failed: If any operation within the transaction fails, the transaction moves to the failed state.
Aborted: If the transaction fails, it is rolled back, and the database is restored to its previous state.
Terminated: The final state after the transaction has been committed or rolled back.

1.4. SQL Implementation of a Transaction

The following SQL code demonstrates the steps involved in a transaction to calculate and display the total marks of a student:


-- Begin the transaction
START TRANSACTION;

-- Step 1: Retrieve marks for each subject
SELECT marks INTO @subject1 FROM marks_table WHERE student_id = 1 AND subject = 'subject1';
SELECT marks INTO @subject2 FROM marks_table WHERE student_id = 1 AND subject = 'subject2';
SELECT marks INTO @subject3 FROM marks_table WHERE student_id = 1 AND subject = 'subject3';

-- Step 2: Calculate the total marks
SET @total_marks = @subject1 + @subject2 + @subject3;

-- Step 3: Display the total marks
SELECT @total_marks AS total_marks;

-- Commit the transaction
COMMIT;

This example ensures that the retrieval, calculation, and display of marks are treated as a single, atomic transaction, maintaining the ACID properties throughout.

1.5. Detailed Steps in a Transaction

Let's delve deeper into each step of the transaction process:

1.5.1. Step 1: Retrieve Marks

The first step involves retrieving the marks for each subject. This is done using SELECT statements that fetch the marks for a specific student and subject from the database.


-- Retrieve marks for subject1
SELECT marks INTO @subject1 FROM marks_table WHERE student_id = 1 AND subject = 'subject1';

-- Retrieve marks for subject2
SELECT marks INTO @subject2 FROM marks_table WHERE student_id = 1 AND subject = 'subject2';

-- Retrieve marks for subject3
SELECT marks INTO @subject3 FROM marks_table WHERE student_id = 1 AND subject = 'subject3';

Each SELECT statement fetches the marks for one subject and stores it in a variable.

1.5.2. Step 2: Calculate Total Marks

The next step involves calculating the total marks by summing the marks retrieved for each subject.


-- Calculate the total marks
SET @total_marks = @subject1 + @subject2 + @subject3;

This step ensures that the total marks reflect the sum of the individual subject marks.

1.5.3. Step 3: Display Total Marks

The final step is to display the total marks. This is done using a SELECT statement that outputs the calculated total marks.


-- Display the total marks
SELECT @total_marks AS total_marks;

This step outputs the total marks to the user or the application.

1.6. Rollback in Transactions

In case of any error or failure during the transaction, a rollback can be performed to undo all the operations performed within the transaction, ensuring that the database remains consistent.


-- Begin the transaction
START TRANSACTION;

-- Perform operations
-- ... (steps to retrieve, calculate, and display marks)

-- If an error occurs
ROLLBACK;

-- If no errors, commit the transaction
COMMIT;

The ROLLBACK statement undoes all operations performed since the BEGIN TRANSACTION statement, reverting the database to its previous consistent state.

1.7. Concurrency Control in Transactions

Concurrency control ensures that transactions are executed in a safe manner when multiple transactions are executed concurrently. This is achieved through locking mechanisms:

Locks: Locks prevent other transactions from accessing data that is being used by the current transaction. Types of locks include shared locks (allowing read-only access) and exclusive locks (preventing any other access).
Timestamp Ordering: This mechanism ensures that transactions are executed in the order of their timestamps, maintaining consistency.
Multiversion Concurrency Control (MVCC): MVCC allows multiple versions of data to exist, ensuring that read operations do not block write operations and vice versa.

1.8. Commit Protocols

Commit protocols ensure that transactions are committed safely and consistently across multiple databases or nodes. Examples include:

Two-Phase Commit (2PC): Involves a prepare phase where all participants agree to commit, followed by a commit phase where the transaction is committed.
Three-Phase Commit (3PC): Adds an additional phase to the 2PC to reduce the risk of blocking in case of a failure.

1.9. Schedules in DBMS

In a Database Management System (DBMS), a schedule refers to the sequence of operations from various transactions. It determines the order in which operations like read and write are executed.

1.10. Types of Schedules

Schedules can be categorized into different types based on their properties:

Serial Schedule: A schedule in which transactions are executed one after another without overlapping.
Non-Serial Schedule: A schedule where transactions are interleaved, allowing for concurrent execution.

1.11. Serializability of Schedules

Serializability is a concept used to ensure the correctness of a schedule by determining if its execution is equivalent to a serial schedule. There are two types of serializability:

1.12. Conflict Serializability

A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. Two operations conflict if they belong to different transactions and at least one of them is a write operation on the same data item.

Steps to check conflict serializability:

Construct a precedence graph (or serializability graph).
Nodes represent transactions, and directed edges represent conflicts between transactions.
If the graph is acyclic, the schedule is conflict serializable.

-- Example: Transaction T1 and T2
T1: R(A), W(A)
T2: R(A), W(A)
-- Precedence Graph:
-- T1 -> T2 (because T1's W(A) conflicts with T2's R(A))
-- If no cycles, the schedule is conflict serializable.

1.13. View Serializability

A schedule is view serializable if it is view-equivalent to a serial schedule. View equivalence considers three conditions:

Initial Read: Each transaction must read the same initial value of data items as in a serial schedule.
Updated Read: Each transaction must read the same value written by another transaction as in a serial schedule.
Final Write: The final write operation of each data item must be the same as in a serial schedule.

-- Example: Transaction T1, T2, T3
T1: R(A), W(A)
T2: R(A), W(A)
T3: R(A), W(A)
-- Check View Serializability:
-- Ensure the above three conditions are met.

1.14. Comparison of Conflict and View Serializability

While both conflict and view serializability aim to ensure the correctness of schedules, they differ in their approach:

Conflict Serializability is stricter and easier to check using precedence graphs.
View Serializability is more general but harder to check as it considers a broader set of equivalent schedules.