Final Review B: Questions

RFb.1.

In a distributed database, a table's data may be distributed among sides by fragmenting the data horizontally or vertically. Distinguish these two fragmenting approaches.

RFb.2.

Consider the following protocol for committing a transaction in a distributed database.

  1. The coordinator sends the message “commit-request T” to all sites involved.
  2. Each site commits T, logs that T is committed, and sends back the response “committed T.”
  3. The coordinator logs that T is now committed.

What is wrong with this protocol (which leads to the introduction of the two-phase commit protocol)?

RFb.3.

Describe the problem that leads to the definition of the two-phase commit protocol.

RFb.4.

Describe briefly the two phases of the two-phase commit protocol.

RFb.5.

In the two-phase commit protocol, what must a non-coordinating site do if it recovers from a crash and sees that the last log entry for a transaction is “READY T”?

RFb.6.

As we saw, a distributed database that has just one host maintaining the lock for a data element suffers from being unable to access the element when that host is down. Describe how a system might use three hosts to maintain a lock for an element so that it can tolerate any one host being unavailable.

RFb.7.

Describe at least two features that distinguish an object-relational DBMS from a relational DBMS.

RFb.8.

Explain the concept of reference in an object-relational DBMS, and give an example illustrating how references might be used in an SQL query.

Final Review B: Solutions

RFb.1.

In horizontal fragmentation, we place different rows of a table between different sites within the distributed database. In vertical fragmentation, different sites have their own selection of columns for a table.

RFb.2.

If after receiving the “commit-request” message, any site finds that it is not in a state to commit T — or if any site has become unavailable due to network or computer failure — then this protocol would end up where some sites have committed but others have not. This would violate atomicity in a rather ugly way.

RFb.3.

Since a transaction can involve changes to data at multiple sites, all sites involved with a transaction must agree to an identical decision regarding whether the transaction should be committed: If one site cannot commit while another thinks it should, the transaction's atomicity would be violated. One site may not be able to commit due to some local failure (two possibilities are deadlock or a hardware error), and in that case all sites will need to make it so that the transaction had no effect.

RFb.4.

In the first phase, the coordinator asks all sites whether they are ready to commit the transaction, and all sites respond. In the second phase, the coordinator decides whether the transaction will abort or commit, and it sends this decision to all involved sites.

RFb.5.

It must query another site to determine the fate of the transaction. Once it determines the answer for the other site, it logs the answer into its own log.

RFb.6.

Any transaction that requires a lock of the element will only be required to obtain the lock from any two of the three systems. If any is down, then of course it will request the lock from the remaining two systems. But it will never be that two transactions manage to obtain the same exclusive lock, since each transaction requires two sites to agree, and there are only three sites.

RFb.7.

Here are three:

  1. Support for aggregate types like sets, lists, and structs.
  2. Methods can be associated with a row/object.
  3. References allow for direct references between tuples — essentially pointers, except that the tuples are actually stored on disk.
RFb.8.

A reference essentially behaves as a memory address describing a row in a table. (It does not in fact reference memory, since the database is stored on disk.) It allows one to have a table that includes direct references to rows in other tables. By contrast, in relational databases, a reference to a row in another table is stored indirectly, by listing the primary key values for that other table.

References allow a query such as the following, in which the SELECT clause includes an implicit lookup into an accounts table to identify the name associated with each account. With a relational DBMS, we would need to join the accounts and deposits tables.

SELECT acct->nameSUM(amount)
FROM   deposits
GROUP BY acct