Timestamps and validation
Sections:
1. Timestamps
2. Validation
One way to ensure serializability is through locking, which works hard toward guaranteeing that each transaction is prevented from doing anything that might violate the ACID property of isolation. Now we'll turn to two alternative approaches, which are more optimistic: They permissively allow transactions to work, assuming that they will not conflict, and cancel them only at the end if they happen to have conflicted.
1. Timestamps
We assign each transaction T a unique, ascending timestamp TS(T) when the transaction begins. In fact, our algorithm will ensure that operations are performed in an order consistent with the serial schedule in which transactions are ordered by increasing timestamps.
With each data element X, we maintain three pieces of information (in addition to the information within the element itself):
- RT(X) is the highest timestamp of any transaction reading X.
- WT(X) is the highest timestamp of any transaction writing to X.
- C(X) is true when the most recent transaction to write to X has concluded.
For each read operation performed by a transaction T on an element X, we consider which of the following applies:
- If TS(T) ≥ WT(X), then we wait until C(X) is true and then proceed with the value found in X.
- If TS(T) < WT(X), then we abort T. After all, T is an older transaction than the one who wrote the current value into X, and so it should see a previous value of X, which is no longer available.
For each write operation performed by a transaction T on an element X, we consider which of the following applies:
- If TS(T) < WT(X), then in fact a newer transaction has already placed a value into X and the value that T wishes to write should actually not go into X. Thus, we wait until C(X) becomes true and then we proceed through T without performing any write.
- If TS(T) ≥ RT(X) (but TS(T) ≥ WT(X)), then we safely write T's value into X. We also set C(X) to false.
- If TS(T) < RT(X) (but TS(T) ≥ WT(X)), then some transaction newer than T has already read the current value of X though it should have read the value from T. Our solution is to abort T.
Finally, when a transaction T commits, we set C(X) to be true for each element that T has modified.
An enhanced version of this algorithm called multiversion timestamping reduces the possibility of aborted transactions when a transaction reads: In writing to a database element, we remember previous values of the element; then, when a transaction requests to read the element, we search for which element has the appropriate timestamp. A disadvantage of this system is that it stores significantly more data than otherwise and adds quite a bit of complexity. Nonetheless, sophisticated databases like Oracle and MySQL use multiversion timestamping (often mixed with two-phased locking).
2. Validation
Each transaction T goes through six phases:
Start: The DBMS stores the current time at which the transaction is starting, which we call START(T).
Execute: The transaction completes its actions. As it goes, we track its read set RS(T) and its write set WS(T) containing the elements that the transaction is reading or changing. However, each time the transaction requests to modify something on disk, the DBMS does not write the value to disk immediately: Instead, the DBMS simply stores the value that is to be written once the transaction completes.
Mark validation: Once T is ready to commit, the DBMS stores the current time at which the transaction begins validation, which we call VAL(T).
Validate: Then the DBMS ensures that the transaction is “legal” — that is, that the data it has accessed doesn't violate serializability. We can do this using the conditions described below. If it turns out that the transaction violates serializability, we'll cancel it; there's no harm done, since the transaction hasn't written anything to disk. This still leaves the question of how the DBMS determines that a transaction is legal; we'll study that soon.
Write: The DBMS finally writes the data that was cached in step 1 onto disk.
Complete: The DBMS stores the current time at which the transaction has completed, which we call FIN(T).
In validating a transaction, we'll ensure that the equivalent serial order of transactions is in the order in which they begin validation: That is, in order by VAL(T).
In validating T, one possible risk is that T has read the wrong value according to the presumed serial ordering. We can check for this by looking for any transaction U satisfying these three conditions:
- VAL(U) is defined: U has already validated, so it belongs before T in the serial order.
- Either FIN(U) is not yet defined or FIN(U) ≥ START(T): T starts before U completes, so that there is a risk that T might have read a value that U changed.
- An element exists in both RS(T) and WS(U), which further means that T might have read a value that U changed.
Note that in fact these possibilities don't guarantee that T has actually read the wrong value for an element: It could be that U happened to write the element after T retrieved it. But there is a risk, so we would cancel the transaction.
Another risk is that some transaction earlier in the serial ordering also changed an element that T changed but that transaction has not completed. We do not want to allow this, since during the Write phase of these two transactions, the element may be written by T before U, even though T's value is the one that should be available for other transactions to read. We can check for this by looking for any transaction U satisfying these three conditions:
- VAL(U) is defined: U has already validated, so it belongs before T in the serial order.
- Either FIN(U) is not yet defined or FIN(U) ≥ VAL(T): T enters validation before U completes, so that there is a risk that T might write an element before U writes that same element, though T's value should be the effective one.
- An element exists in both WS(T) and WS(U), which means that U might write that element after T does, though T's value should be the effective one.