[tahoe-dev] "servers of happiness" Re: MapReduce over Tahoe-LAFS slides from HadoopWorld
Kevan Carstensen
kevan at isnotajoke.com
Tue Oct 13 20:25:35 PDT 2009
On 10/13/09 6:23 AM Zooko Wilcox-O'Hearn <zooko at zooko.com> wrote:
> What do you think -- is this measure of "health" a good enough
> measure for the purposes of ticket #778?
I think that your intuition of health is also covered by the current
design of #778 -- that is, a mechanism that doesn't declare an upload
successful unless
1. Shares from the file went to at least h peers, and
2. Any k of the h peers are sufficient to reconstruct the file.
(I probably should have said this last night; apologies for my clumsy
wording)
My current patches check to make sure that shares were distributed to at
least h servers before declaring the upload successful. To see that
shares are not multiply distributed, and that they are not overcounted
if they already exist, see comment 52 [1] on issue #778; this explains
how the peer selection algorithm deals with these problems.
If you're prepared to believe that this check is enough to check these
criteria, you can skip the proof. I keep becoming unconvinced of this,
and I wrote the patches, so I figured it was worthwhile to include a proof.
---- Proof ----
(There are, at present, undercounting issues, but I don't think those
affect what I'm going to say)
Then a successful upload with my current patches guarantees that there
is a set of servers T with cardinality at least h such that:
1. Each server in T has received at least one share, and
2. There is an injective function between servers in T and shares of
f.
(I'm probably hideously overcomplicating this, but none of the other
ways of saying that that I could come up with did the job. Also, I'm
assuming the existence of the bijection from the discussion in #778
(which has at least convinced me of its existence). If I'm missing
something, let me know)
We can show that these guarantees are sufficient to satisfy the criteria
of health defined in #778.
Let k and n be encoding parameters. Let h be servers_of_happiness. Let X
be the set of shares of f. Suppose that k >= h. Let m : T -> X be an
injection, as described above.
Suppose that a file f has successfully been uploaded to the grid using
the implementation of #778. Then there is a set of servers T such that
|T| >= h, as described above.
Because T exists, we know that the first criterion -- that at least h
servers have stored shares of the file -- is satisfied.
Let U = {u_1, u_2, ..., u_k} be an arbitrary k-element subset of T.
Since m is an injection, there is one Y = {m(u_1), m(u_2), ..., m(u_n)},
which is a k-element set of shares. Since the shares in X are distinct
(for the purposes of file reconstruction), and the range of m is X, we
know that the shares in Y are distinct. Then Y is a k-element set of
distinct shares. Since k distinct shares are necessary to reconstruct f,
Y is sufficient to reconstruct f, as required. Then an arbitrary
k-element subset of T is sufficient to reconstruct f, as required by the
second criterion.
---- End Proof ----
So I guess I've shown (or maybe not; maybe my proof has an error) that
the patches in #778 do what they are supposed to. I think the mapping
from what they're supposed to do to your definition of health is pretty
clear: if I know that any k of a set of h servers in those that
originally received the upload are sufficient to reconstruct the file,
then I know that h is at least a lower bound on the true health value,
as required by your criteria.
Does this make sense?
I like that your measurement of health gives an actual health value, and
not just a lower bound. It'd be nice to be able to report something like
that to the user. I wonder how we'd calculate it, though. Maybe the
_servers_with_unique_shares number would correspond to it?
(apologies for the long email -- I've been meaning to try to prove that
for a while, and this seemed like a good excuse to try)
[1] http://allmydata.org/trac/tahoe/ticket/778#comment:52
--
Kevan Carstensen | <kevan at isnotajoke.com>
More information about the tahoe-dev
mailing list