Timestamps and validation

Sections:
1. Timestamps
2. Validation

One way to ensure serializability is through locking, which works hard toward guaranteeing that each transaction is prevented from doing anything that might violate the ACID property of isolation. Now we'll turn to two alternative approaches, which are more optimistic: They permissively allow transactions to work, assuming that they will not conflict, and cancel them only at the end if they happen to have conflicted.

1. Timestamps

We assign each transaction T a unique, ascending timestamp TS(T) when the transaction begins. In fact, our algorithm will ensure that operations are performed in an order consistent with the serial schedule in which transactions are ordered by increasing timestamps.

With each data element X, we maintain three pieces of information (in addition to the information within the element itself):

For each read operation performed by a transaction T on an element X, we consider which of the following applies:

For each write operation performed by a transaction T on an element X, we consider which of the following applies:

Finally, when a transaction T commits, we set C(X) to be true for each element that T has modified.

An enhanced version of this algorithm called multiversion timestamping reduces the possibility of aborted transactions when a transaction reads: In writing to a database element, we remember previous values of the element; then, when a transaction requests to read the element, we search for which element has the appropriate timestamp. A disadvantage of this system is that it stores significantly more data than otherwise and adds quite a bit of complexity. Nonetheless, sophisticated databases like Oracle and MySQL use multiversion timestamping (often mixed with two-phased locking).

2. Validation

Each transaction T goes through six phases:

  1. Start: The DBMS stores the current time at which the transaction is starting, which we call START(T).

  2. Execute: The transaction completes its actions. As it goes, we track its read set RS(T) and its write set WS(T) containing the elements that the transaction is reading or changing. However, each time the transaction requests to modify something on disk, the DBMS does not write the value to disk immediately: Instead, the DBMS simply stores the value that is to be written once the transaction completes.

  3. Mark validation: Once T is ready to commit, the DBMS stores the current time at which the transaction begins validation, which we call VAL(T).

  4. Validate: Then the DBMS ensures that the transaction is “legal” — that is, that the data it has accessed doesn't violate serializability. We can do this using the conditions described below. If it turns out that the transaction violates serializability, we'll cancel it; there's no harm done, since the transaction hasn't written anything to disk. This still leaves the question of how the DBMS determines that a transaction is legal; we'll study that soon.

  5. Write: The DBMS finally writes the data that was cached in step 1 onto disk.

  6. Complete: The DBMS stores the current time at which the transaction has completed, which we call FIN(T).

In validating a transaction, we'll ensure that the equivalent serial order of transactions is in the order in which they begin validation: That is, in order by VAL(T).

In validating T, one possible risk is that T has read the wrong value according to the presumed serial ordering. We can check for this by looking for any transaction U satisfying these three conditions:

Note that in fact these possibilities don't guarantee that T has actually read the wrong value for an element: It could be that U happened to write the element after T retrieved it. But there is a risk, so we would cancel the transaction.

Another risk is that some transaction earlier in the serial ordering also changed an element that T changed but that transaction has not completed. We do not want to allow this, since during the Write phase of these two transactions, the element may be written by T before U, even though T's value is the one that should be available for other transactions to read. We can check for this by looking for any transaction U satisfying these three conditions: