Recovery

1. Recovery in DBMS

Recovery in Database Management Systems (DBMS) refers to the process of restoring the database to a correct state in the event of a failure. This ensures data integrity, consistency, and availability.

1.1 Types of Failures

Database systems may encounter various types of failures:

Transaction Failure: Occurs when a transaction cannot complete successfully due to logical errors, system crashes, or deadlocks.
System Crash: Results from hardware failures, software bugs, or power outages, leading to a loss of volatile memory.
Disk Failure: Involves physical damage to storage media, causing data loss.

1.2 Recovery Techniques

Several techniques are employed to recover from failures:

Log-Based Recovery: Utilizes a log to record all database modifications. In the event of a failure, the log can be used to redo or undo transactions.
Shadow Paging: Maintains a shadow copy of the database pages. During recovery, the shadow copy is used to restore the database to a consistent state.
Checkpointing: Periodically saves the state of the database to reduce the amount of work needed during recovery.

1.2.1 Log-Based Recovery in short

Log-based recovery is a common method used to ensure data integrity. The log contains entries for every transaction, including the following information:

Transaction ID: A unique identifier for the transaction.
Operations: Details of the operations performed (e.g., insert, update, delete).
Old Values: The data values before the operation.
New Values: The data values after the operation.

-- Creating a log table
CREATE TABLE log (
    transaction_id INT,
    operation VARCHAR(50),
    old_value VARCHAR(255),
    new_value VARCHAR(255)
);

-- Inserting a log entry
INSERT INTO log (transaction_id, operation, old_value, new_value) VALUES (1, 'UPDATE', '100', '200');

1.2.2 Shadow Paging in short

Shadow paging is another recovery method where a shadow copy of the database is maintained. The process involves:

Creating a shadow copy before any transaction begins.
Updating the shadow copy during the transaction.
Committing the shadow copy to the main database upon successful transaction completion.

If a failure occurs, the shadow copy is used to restore the database.

-- Example process of shadow paging
BEGIN TRANSACTION;
CREATE SHADOW COPY;
-- Perform operations
COMMIT TRANSACTION;

1.2.3 Checkpointing in short

Checkpointing involves periodically saving the database state, which helps in minimizing the recovery time. The process includes:

Flushing all log records to disk.
Writing a checkpoint record to the log.
Saving the current state of the database to disk.

In the event of a failure, recovery can start from the last checkpoint, reducing the amount of log data to be processed.

-- Example of checkpointing
CHECKPOINT;
-- Continue with database operations

1.3 Log-Based Recovery in Detail

Log-based recovery is a fundamental technique in DBMS to ensure the durability and atomicity of transactions. This method relies on a log file, which records all changes made to the database. The log contains sufficient information to undo or redo transactions if a failure occurs.

1.3.1 Components of Log-Based Recovery

Log-based recovery involves several key components:

Log File: A sequential file that stores log records, documenting all modifications.
Log Record: An entry in the log file containing details about a transaction's operations.
Transaction Manager: Manages the execution of transactions and interacts with the log manager to record actions.
Log Manager: Responsible for writing log records to the log file and managing log operations.

1.3.2 Structure of a Log Record

A log record typically includes the following information:

Transaction ID: Unique identifier for the transaction.
Log Sequence Number (LSN): A unique identifier for each log record, ensuring the order of records.
Type of Operation: Indicates the operation performed, such as INSERT, UPDATE, DELETE, or COMMIT.
Data Before Image (BFIM): The data value before the operation (old value).
Data After Image (AFIM): The data value after the operation (new value).
Timestamp: The time when the operation was performed.

-- Example log record
INSERT INTO log (transaction_id, lsn, operation, bfim, afim, timestamp)
VALUES (1, 1001, 'UPDATE', '100', '200', '2024-05-10 10:00:00');

1.3.3 Logging Process

When a transaction modifies the database, the following steps occur:

Before Image Recording: The old value of the data is recorded in the log before the modification.
After Image Recording: The new value of the data is recorded in the log after the modification.
Write-Ahead Logging (WAL): Ensures that log records are written to the log file before the actual data is written to the database.

1.3.4 Recovery Process

In the event of a failure, the recovery process involves two main phases: redo and undo.

1.3.4.1 Redo Phase

The redo phase re-applies all changes from committed transactions to ensure all modifications are present in the database. The steps include:

Scanning the log from the beginning or the last checkpoint.
Applying all operations to the database to redo the transactions.

-- Redo operation example
-- Applying changes from log to database
UPDATE table_name SET column_name = '200' WHERE column_name = '100';

1.3.4.2 Undo Phase

The undo phase reverses changes from uncommitted transactions to maintain consistency. The steps include:

Scanning the log backward from the point of failure.
Undoing all operations of uncommitted transactions.

-- Undo operation example
-- Reversing changes from uncommitted transactions
UPDATE table_name SET column_name = '100' WHERE column_name = '200';

1.3.5 Write-Ahead Logging (WAL) Protocol

The WAL protocol is crucial for log-based recovery, ensuring data integrity. It mandates that:

Log records must be written to the log file before the corresponding data pages are written to disk.
Transaction logs must be written before a transaction is considered committed.

This protocol ensures that no data modifications are lost, as log records provide a reliable means to redo or undo operations.

-- Example of enforcing WAL protocol
BEGIN TRANSACTION;
INSERT INTO log (transaction_id, lsn, operation, bfim, afim, timestamp) VALUES (2, 1002, 'UPDATE', '300', '400', CURRENT_TIMESTAMP);
UPDATE table_name SET column_name = '400' WHERE column_name = '300';
COMMIT TRANSACTION;

1.4 Shadow Paging in Detail

Shadow paging is a recovery technique used to ensure database consistency without the need for logs. It maintains two copies of the database: the current page table and the shadow page table. Modifications are made to the current page table, and the shadow page table remains unchanged until a transaction is committed.

1.4.1 Components of Shadow Paging

Shadow paging involves the following components:

Current Page Table: A table that points to the current pages of the database where modifications occur.
Shadow Page Table: A table that points to the original pages of the database before any modifications.
Database Pages: The actual data pages in the database.

1.4.2 Process of Shadow Paging

The process of shadow paging includes several key steps:

Initialization: At the beginning of a transaction, the shadow page table is created as a copy of the current page table.
Modification: When a transaction modifies a page, a new copy of the page is created, and the current page table is updated to point to this new page. The shadow page table remains unchanged.
Commit: Once the transaction is successfully completed, the shadow page table is updated to match the current page table, and the changes become permanent.
Abort: If the transaction fails, the current page table is discarded, and the shadow page table is used to restore the database to its original state.

1.4.3 Advantages and Disadvantages of Shadow Paging

Understanding the pros and cons of shadow paging helps in evaluating its effectiveness:

Advantages

Simplicity: Shadow paging is simple to implement as it does not require logs or complex recovery mechanisms.
No Redo or Undo: There is no need for redo or undo operations, as the shadow page table provides a consistent state.

Disadvantages

Space Overhead: Maintaining two copies of the database pages requires additional storage space.
Performance Overhead: Copying pages can be resource-intensive, leading to performance degradation for large transactions.

1.4.4 Example of Shadow Paging

Consider a scenario where a transaction updates a page in the database:

Initialize the shadow page table as a copy of the current page table.
When a page is updated, create a new copy of the page and update the current page table to point to this new page.
Commit the transaction by updating the shadow page table to match the current page table.
If the transaction fails, discard the current page table and use the shadow page table to restore the database.

-- Example process of shadow paging
BEGIN TRANSACTION;
-- Assume page 1 is updated
UPDATE current_page_table SET page_pointer = new_page_pointer WHERE page_id = 1;
-- If transaction is successful
COMMIT;
-- Update shadow page table to match current page table
UPDATE shadow_page_table SET page_pointer = new_page_pointer WHERE page_id = 1;

1.4.5 Checkpointing in Shadow Paging

Checkpointing in shadow paging involves creating a consistent snapshot of the database at regular intervals. This reduces the recovery time by providing a known good state from which to start.

Flush all dirty pages from the buffer to disk.
Update the shadow page table to reflect the current state of the database.
Write a checkpoint record to indicate the completion of the checkpoint.

-- Checkpointing example
CHECKPOINT;
-- Flush dirty pages to disk
-- Update shadow page table
UPDATE shadow_page_table SET ...;
-- Write checkpoint record
INSERT INTO checkpoint_log (timestamp) VALUES (CURRENT_TIMESTAMP);

1.4.6 Implementation in SQL

Here is an example of implementing shadow paging in SQL:

-- Create shadow page table as a copy of the current page table
CREATE TABLE shadow_page_table AS SELECT * FROM current_page_table;

-- Begin transaction
BEGIN TRANSACTION;

-- Update current page table when a page is modified
UPDATE current_page_table SET page_pointer = new_page_pointer WHERE page_id = 1;

-- Commit transaction
COMMIT;

-- Update shadow page table upon successful commit
UPDATE shadow_page_table SET page_pointer = new_page_pointer WHERE page_id = 1;

-- If transaction fails, discard current page table changes
ROLLBACK;

1.5 Checkpointing in Detail

Checkpointing is a technique used in DBMS to reduce the time required for recovery by periodically saving a consistent state of the database to disk. This helps in minimizing the amount of log data that needs to be processed during recovery, thereby speeding up the process.

1.5.1 Components of Checkpointing

Checkpointing involves the following components:

Checkpoint Record: A log record indicating the point at which the checkpoint was taken.
Dirty Pages: Pages that have been modified in memory but not yet written to disk.
Checkpoint Log: A log that keeps track of all checkpoints taken.

1.5.2 Process of Checkpointing

The process of checkpointing involves several key steps:

Flushing Dirty Pages: All modified (dirty) pages in memory are written to disk.
Writing the Checkpoint Record: A checkpoint record is written to the log, indicating the completion of the checkpoint.
Updating the Checkpoint Log: The checkpoint log is updated to include the new checkpoint.

-- Example of writing a checkpoint
CHECKPOINT;
-- Flush dirty pages to disk
-- Write checkpoint record
INSERT INTO checkpoint_log (timestamp) VALUES (CURRENT_TIMESTAMP);

1.5.3 Types of Checkpoints

There are several types of checkpoints, each with its own use case:

1.5.3.1 Consistent Checkpoints

Ensures that the database is in a consistent state. All transactions are completed, and no transactions are in progress.

1.5.3.2 Fuzzy Checkpoints

Allows some transactions to be in progress while the checkpoint is taken. Only dirty pages are flushed to disk.

1.5.3.3 Incremental Checkpoints

Only the pages modified since the last checkpoint are written to disk, reducing the amount of data written during each checkpoint.

1.5.4 Advantages and Disadvantages of Checkpointing

Understanding the benefits and drawbacks of checkpointing helps in evaluating its effectiveness:

Advantages

Reduced Recovery Time: By minimizing the amount of log data that needs to be processed during recovery, checkpointing significantly reduces recovery time.
Improved Performance: Regular checkpointing can lead to improved system performance by spreading the I/O load over time.

Disadvantages

Performance Overhead: Checkpointing can introduce performance overhead, especially if taken too frequently.
Complexity: Implementing checkpointing requires additional complexity in the DBMS.

1.5.5 Implementation in SQL

Here is an example of implementing checkpointing in SQL:

-- Creating a checkpoint log table
CREATE TABLE checkpoint_log (
    checkpoint_id INT AUTO_INCREMENT PRIMARY KEY,
    timestamp TIMESTAMP
);

-- Function to perform checkpointing
DELIMITER //
CREATE PROCEDURE perform_checkpoint()
BEGIN
    -- Flush dirty pages to disk
    FLUSH TABLES;
    
    -- Write checkpoint record
    INSERT INTO checkpoint_log (timestamp) VALUES (CURRENT_TIMESTAMP);
    
    -- Optionally, update other metadata or perform additional tasks
END //
DELIMITER ;

-- Execute checkpoint procedure
CALL perform_checkpoint();

1.5.6 Checkpointing in Recovery

During recovery, checkpoints are used to reduce the amount of log data that needs to be processed:

Identify the last checkpoint from the checkpoint log.
Start the recovery process from this checkpoint, reducing the amount of data to be read and applied.

-- Example of recovery process using checkpoint
-- Identify the last checkpoint
SELECT MAX(timestamp) INTO @last_checkpoint FROM checkpoint_log;

-- Start recovery from the last checkpoint
-- Apply all transactions from this point onwards
SELECT * FROM transaction_log WHERE timestamp >= @last_checkpoint;