Distributed Systems: Workshop 7
Form groups of 2-4 people.
1. Quorums are somewhat elaborate, and there were good questions in the lecture
about what the point of them is, so let's look at them a bit more closely
here.
In a Mobile wireless Ad hoc NETwork (MANET for short), some group state
information is replicated to all nodes. Because of limited wireless range and
possibly limited power capacity in the nodes (they want to save power), not all
nodes are continuously reachable.
In order to maintain the consistency of the state information, a protocol based
on quorum consensus is used.
Discuss the following points in the group:
a) Explain the basic idea of a quorum-based protocol to each other.
When designing a quorum-based protocol, the sizes of the read and write quorums
must be set. (These remain constant while the protocol is in use.)
b) What would be sensible sizes for the read and write quorums in this case?
If a quorum cannot be found, from the system design point of view, would you
allow any read and/or write operations to take place?
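As a starting point for the discussion, the sketch below checks a pair of quorum
sizes against the two classic overlap constraints for N replicas: a read quorum
and a write quorum must always share a node, and so must any two write quorums.
The handout itself does not state these constraints, and the function names are
made up for illustration. The check says nothing about which valid pair is the
most sensible one for a MANET.

```python
def valid_quorum_sizes(n, read_q, write_q):
    """Check the two classic overlap constraints for n replicas."""
    # Every read quorum must intersect every write quorum,
    # so a read always reaches at least one replica with the latest write.
    read_write_overlap = read_q + write_q > n
    # Any two write quorums must intersect,
    # so two writes cannot proceed on disjoint sets of replicas.
    write_write_overlap = 2 * write_q > n
    return read_write_overlap and write_write_overlap


def all_valid_sizes(n):
    """List every (read_q, write_q) pair that satisfies both constraints."""
    return [
        (r, w)
        for r in range(1, n + 1)
        for w in range(1, n + 1)
        if valid_quorum_sizes(n, r, w)
    ]


if __name__ == "__main__":
    # Print the candidate size pairs for a group of five nodes.
    for read_q, write_q in all_valid_sizes(5):
        print(f"read quorum {read_q}, write quorum {write_q}")
```

Listing the valid pairs for your group size makes the trade-off visible: a
smaller read quorum forces a larger write quorum, and the other way round.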
In order to collect a quorum, the initiating node must first contact a
sufficient number of nodes that will "agree with it" on the operation it wants
to do.
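The collection step described above could look roughly like the following. This
is a minimal sketch under several assumptions that are not from the handout: the
initiator counts itself as a quorum member, it already knows which nodes are
reachable, and a hypothetical callback `willing` stands in for a node agreeing
to the operation. How the nodes are actually contacted is left open.

```python
def collect_quorum(initiator, reachable, willing, needed):
    """Try to gather `needed` agreeing nodes; return the members or None.

    initiator -- the node starting the operation (assumed to count as a member)
    reachable -- nodes the initiator can currently contact
    willing   -- hypothetical callback: does this node agree to take part?
    needed    -- the fixed quorum size for this kind of operation
    """
    members = [initiator]
    for node in reachable:
        if len(members) >= needed:
            break  # enough nodes have agreed, stop contacting more
        if node != initiator and willing(node):
            members.append(node)
    # Without enough agreeing nodes there is no quorum.
    return members if len(members) >= needed else None


if __name__ == "__main__":
    # Node "c" refuses, for example because it is busy with another operation.
    print(collect_quorum("a", ["b", "c", "d"], lambda n: n != "c", 3))
    # Only one neighbour is in range, so a quorum of three cannot be formed.
    print(collect_quorum("a", ["b"], lambda n: True, 3))
```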
c) Since this is an ad hoc network and may be partitioned, no reliable multicast
service is readily available. Consider the benefits and disadvantages of
using an epidemic protocol for collecting a quorum here. How would you apply it
in this case?
d) We noted on the lecture slides that if a write quorum contains replicas that
are not up to date, those replicas need to be replaced with replicas whose data
is up to date. Why is this important?
(An alternative to selecting different nodes is to trigger the non-current nodes
to get themselves up-to-date and wait for that to finish.)
After collecting a quorum, the node performs the actual I/O operation. For a
read, it can connect to the nearest up-to-date neighbour. For a write, the
quorum members must lock their data until they have all received the new data.
The changes should be propagated to the other nodes as well, but the operation
is complete and the locks can be released once the quorum members are updated.
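The read and write steps above can be sketched as follows. The sketch makes
assumptions that go beyond the handout: each replica carries a version number to
decide which copy is up to date, the whole thing runs in one process instead of
over a network, and a write is simply refused when a quorum member is already
locked. That last choice is only one of several possible reactions. The class
and function names are invented for illustration.

```python
class Replica:
    """One node's copy of the group state (illustrative, not from the handout)."""

    def __init__(self, name):
        self.name = name
        self.version = 0    # assumed: higher version means more recent data
        self.value = None
        self.locked = False


def quorum_read(members):
    """Return (value, version) from the most recent replica in the read quorum."""
    newest = max(members, key=lambda r: r.version)
    return newest.value, newest.version


def quorum_write(members, value):
    """Write to all quorum members; return False if any member is busy."""
    if any(r.locked for r in members):
        return False  # assumed policy: refuse instead of waiting
    for r in members:
        r.locked = True
    try:
        new_version = max(r.version for r in members) + 1
        for r in members:
            r.value = value
            r.version = new_version
    finally:
        # The locks are released as soon as the quorum members are updated;
        # propagation to the remaining nodes is not modelled here.
        for r in members:
            r.locked = False
    return True


if __name__ == "__main__":
    a, b, c, d, e = (Replica(n) for n in "abcde")
    quorum_write([a, b, c], "new state")
    # The read quorum overlaps the write quorum only in c, which is enough.
    print(quorum_read([c, d, e]))
```

With five replicas and quorums of three, any read quorum shares at least one
node with the last write quorum, which is why the read in the example still
finds the new value even though d and e were never updated.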
e) Sometimes multiple information updates may be going on in parallel. How
should a node currently participating in a quorum operation react to another
quorum request coming in? What should happen on the system level? (There are
different options.)
f) If the group changes between collecting the quorum and the actual update
operation (i.e. someone goes missing), what problems can this cause? Is there
anything we can do about them for read or write operations specifically?
2. There is a classification of different kinds of failures in the book
(ch. 8.1.2) and the lecture slides (slide chapter 5). Explain their differences
and discuss what kinds of handling mechanisms would be suitable for each type.
3. Discuss how you would ensure that a distributed transaction is atomic and
that transactions happening in parallel can be serialized.