Distributed Systems : Exercise set 5

1. Are these Byzantine generals forming a quorum... or what?

a) Explain the problem of the Byzantine generals, and the algorithm proposed to
solve their communication problem (as presented on the lecture slides).

b) Give an example of an equivalent (i.e. similarly solvable) distributed system
problem. (Preferrably one that does not involve war.)

c) Explain what kind of problem quorum-based protocols are designed to solve.

d) What do these two topics have in common? What are their most relevant
differences?

2. When doing backward recovery, we want to have a complete and consistent state
of the distributed system stored. Explain how you can try to achieve this goal
a) without any real-time coordination (or agreement - the nodes operate independently), or
b) in a more controlled way.

What are the costs and benefits of the different approaches? Can you think of two example
situations where one of the approaches would be better? (Argue for its relative benefits.)

3. In the story on the slides, Alice and Bob never got to their Friday lunch
restaurant meeting at La Tryste, because they could not be certain of the
commitment of the other.

Distressed, they come to ask you for help, demanding 3 different innovative ways
to fix this situation by
a) Adjusting the decision algorithm used by Bob and(/or) Alice,
b) Adding some realistic but currently missing hardware-level feature to
the unreliable email-like communication line between them,
or
c) Adding a suitable software-level acknowledgement scheme on the message
protocol.

Act as their consultant: Solve all their problems, or alternatively explain to
them why a solution for their specific needs does not exist.

4. Reliable multicasting in an ad hoc network.

For a reliable multicast service, the message-passing subsystem should guarantee
that all present (current) members of the group receive the message.

You are a consultant designing a reliable multicast service for mobile handheld
devices (smartphones or slightly more specialized devices) that will be used by
rescue teams deployed to a mountain range (to look for a lost mountain climber,
for example). In this situation, electric power and wireless bandwidth are both
limited resources.

The group is ad hoc, not stable: we do not know beforehand who belongs to the
group (just that their device will run our software), and during the rescue
operations some participants may move out of wireless range (and leave the group
momentarily).

Outline on a conceptual level how you would implement a reliable multicast
system for this purpose. The small devices will have the needed platform
software (i.e. the operating system and data communication protocols you happen
to need), your task is to identify the valid assumptions about the system, how
to keep track of current group members, how to implement routing between the
devices if they do not all hear each others' wireless transmissions directly,
and how to ensure all members of the group have received a message.

Recall the diferent alternatives of persistence and synchronicity from lecture
slide chapter 2. What kinds of alternatives would seem most appropriate here for
communication between the client software, the messaging middleware you design,
or between the operating systems? Argue why.

5. Two-phase commit under failure situations.

Reading material: 8.5.1 in T&vS 2nd edition (or 7.5.2 in the 1st edition).

Two-phase commit is a blocking protocol, i.e. participants have to wait until it
is completed.

a) What can a coordinator do if a participant (briefly) crashes and does not
reply in the voting stage? Can it detect this? What precautions should be
built into the participant in case of crashing?

b) What about after it has multicast the decision (global commit / global
abort)? What precautions should be built into the participant?

c) What can a participant do if the coordinator (briefly) crashes at the
different stages? What precautions should be taken in the coordinator in case of
crashing?

d) How is this protocol different from Alice and Bob's agreement problem?