September 26, 2022
Shadow paging is a database recovery technique that keeps a database consistent and recoverable after system failures or crashes. It removes the need for a recovery log and makes transactions atomic, which makes it attractive for some data-heavy applications.
Shadow paging works with two sets of pages. Current pages serve read operations: they hold the data as of the last committed transaction. Shadow pages are set aside for write operations: they are where changes are made during a transaction.
Once a write operation has succeeded, the database system updates a specific page called the "pointer page." This page records where each logical page is located on the physical disk (a logical-to-physical page map). Alternatively, the pointer page can direct the system to a separate index that maintains this map. In tree-based structures such as B-trees or B+ trees, the pointer page can be the root page of the tree.
Once the pointer page is updated, it "swaps" the roles of current and shadow pages. The shadow pages, which contain the new changes, become the current pages for subsequent read operations, and the old current pages can be designated as shadow pages for future transactions.
In summary, shadow paging involves making changes to shadow pages while preserving the original data on current pages, ensuring that a stable version of the database can be restored if necessary. The pointer page plays a crucial role in this process, directing the system to the correct version of each logical page at any given time.
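The mechanism described above can be sketched in a few lines of Python. This is a minimal in-memory model, not a real storage engine: the class `ShadowPagedStore`, its method names, and the use of a dictionary as the "disk" are all assumptions made for illustration, and a single reference assignment stands in for the atomic on-disk update of the pointer page.

```python
class ShadowPagedStore:
    """Toy in-memory model of shadow paging (all names are hypothetical)."""

    def __init__(self, num_pages):
        # "Disk": physical page id -> page contents.
        self.disk = {i: b"" for i in range(num_pages)}
        # Pointer page: logical page number -> physical page id.
        self.pointer_page = {i: i for i in range(num_pages)}
        self.next_physical = num_pages
        self.working = None  # page map of the open transaction, if any

    def begin(self):
        # Copy only the page map; data pages are not copied up front.
        self.working = dict(self.pointer_page)

    def read(self, logical):
        # Readers always go through the committed pointer page.
        return self.disk[self.pointer_page[logical]]

    def write(self, logical, data):
        # Never overwrite a current page: store the new data in a fresh shadow page.
        physical = self.next_physical
        self.next_physical += 1
        self.disk[physical] = data
        self.working[logical] = physical

    def commit(self):
        # Assumption: this one assignment models the atomic pointer-page update.
        self.pointer_page = self.working
        self.working = None

    def abort(self):
        # Drop the working map; the current pages were never touched.
        self.working = None
```

In this sketch, a value written inside a transaction is invisible to `read` until `commit` runs, and `abort` leaves the committed data exactly as it was. The shadow pages written by an aborted transaction stay on the toy disk as unreferenced pages.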
Atomic Transactions: Shadow paging is inherently atomic, which means each transaction takes effect in its entirety or not at all. This rules out partially applied transactions, which could otherwise corrupt the database, and so protects data integrity.
Immediate Durability: Shadow paging writes changes directly to disk, so a transaction is durable as soon as the updated pointer page has been written. This removes the need for redo and undo logs, streamlining the recovery process.
No Log Overhead: Since changes are immediately written to the database, shadow paging eliminates the need for maintaining logs, reducing storage overhead and simplifying the recovery process.
Disk I/O Overhead: The most significant disadvantage of shadow paging is the considerable disk I/O overhead. Every transaction needs at least two writes: one for the updated data block and one for the page directory (the pointer page). This overhead increases with the database's size, because a larger database needs a larger page directory.
Garbage Collection: Shadow paging creates multiple versions of data and directories, which can take up significant disk space if not managed properly. A robust garbage collection mechanism is needed to ensure older, irrelevant shadows are discarded.
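One simple way to picture such a garbage collector is as a reachability check: any physical page that no live page map points to can be reclaimed. The sketch below assumes the same toy representation as a dictionary-backed disk; the function name `collect_garbage` and its parameters are hypothetical, and a real system would also have to account for page maps still held by running transactions.

```python
def collect_garbage(disk, live_page_maps):
    """Remove physical pages that no live page map refers to.

    disk: dict mapping physical page id -> bytes (modified in place).
    live_page_maps: iterable of dicts mapping logical page -> physical page id.
    Returns the sorted list of reclaimed physical page ids.
    """
    # Collect every physical page that is still referenced.
    reachable = set()
    for page_map in live_page_maps:
        reachable.update(page_map.values())

    # Everything else is an old or orphaned shadow and can be discarded.
    garbage = sorted(pid for pid in disk if pid not in reachable)
    for pid in garbage:
        del disk[pid]
    return garbage
```

Passing every page map that is still in use, not only the current one, keeps the collector from discarding a page that an open transaction or a long-running reader still depends on.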
Concurrent Access Issues: Shadow paging struggles with high concurrency, as it doesn't handle simultaneous transactions well. Synchronizing multiple transactions requires careful handling to avoid data inconsistencies.
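A common way to sidestep the problem, shown here only as a sketch under stated assumptions, is to allow a single write transaction at a time while readers keep using the page map that was current when they started. The class `SnapshotStore` and its methods are hypothetical, and the code assumes that replacing the `pointer_page` reference is atomic, which holds for this in-memory Python model but would need an atomic disk write in a real system.

```python
import threading


class SnapshotStore:
    """Toy single-writer, multi-reader model (all names are hypothetical)."""

    def __init__(self, num_pages):
        self.disk = {i: b"" for i in range(num_pages)}
        self.pointer_page = {i: i for i in range(num_pages)}
        self.next_physical = num_pages
        # At most one write transaction may run at a time.
        self.writer_lock = threading.Lock()

    def snapshot(self):
        # A reader pins the page map that is current when it starts.
        return self.pointer_page

    def read(self, page_map, logical):
        return self.disk[page_map[logical]]

    def update(self, changes):
        # Run one whole write transaction under the writer lock.
        with self.writer_lock:
            working = dict(self.pointer_page)
            for logical, data in changes.items():
                physical = self.next_physical
                self.next_physical += 1
                self.disk[physical] = data  # new shadow page
                working[logical] = physical
            # Commit: swap in the new page map; old snapshots stay valid.
            self.pointer_page = working
```

Because a commit installs a new page map instead of editing the old one, a reader holding an earlier snapshot continues to see a consistent view. The price is that writers are fully serialized, which matches the limited write concurrency described above.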