[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Wed Aug 19 09:24:05 PDT 2009
#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: critical | Milestone: undecided
Component: code-peerselection | Version: 1.4.1
Keywords: reliability | Launchpad_bug:
--------------------------------+-------------------------------------------
Comment(by swillden):
That's a good summary, kevan, but my method for calculating {{{k_e}}} and
{{{m_e}}} assumes that {{{n >= k}}} and {{{k_e >= k}}}. If {{{h < k}}},
then presumably there are situations where {{{n = k_e < k}}}.
It's kind of odd to specify that {{{k}}} is the number of servers needed
for the file to survive, but then to say that the we're happy even if
there are less than {{{k}}} servers available. It seems like if you want
to allow that, then what you really want is the "shares of happiness"
algorithm, not the "servers of happiness" algorithm.
Also, if there's only one server available, is there any sense in doing
FEC coding at all? Or, at least, is there any sense in setting {{{k_e >
1}}} or {{{m_e > 1}}}? I suppose if there were a mechanism to
redistribute the shares later it would. And Zooko's observation (in
another ticket) that {{{m_e}}} can be cheaply increased would imply that
it makes sense to set {{{k_e = m_e = k}}}, and then let a future repair
cycle increase {{{m_e}}} when more servers are available. So, perhaps the
{{{k_e, m_e}}} selection algorithm should have as a special case that
{{{k_e = m_e = k}}} when {{{n = 1}}}. That would result in no FEC
expansion when only a single server is being used, which is nice.
But what about when {{{n > 1, n < k}}}? {{{k_e = m_e = k}}} would result
in lower reliability than only using one server, which seems like a bad
idea. For {{{n=2}}}, any allocation that doesn't allow either server to
reconstruct the file is worse than using only one server. For that case,
{{{k_e = k, m_e = k_e * n}}} would make sense, which covers the {{{n=1}}}
case as well. For larger values of {{{n}}} (but still {{{n < k}}}), what
to do is less clear. Suppose the user specified {{{k = 10, h = 1, m =
10}}}, as you suggested, and there were nine servers available. {{{k_e =
k}}} is fine in that case, but {{{m_e = k_e * n}}} would result in FEC
expansion of 9!
In that case, what the user appears to be asking for, with {{{k = m}}}, is
distribution with no redundancy at all, meaning that if any of the servers
fails, the file is lost. If that's what they really want, then {{{k_e =
m_e = k}}} is fine. But what if they specified {{{k = 5, h = 1, m =
10}}}? That would indicate a desire for some redundancy, but a
willingness to accept no redundancy rather than failing the upload.
Implicit in the choices of {{{k}}} and {{{m}}} is an acceptable FEC
expansion amount and a desired redundancy level. Perhaps what makes the
most sense is to try to honor the indicated FEC expansion limit, capping
{{{m_e}}} at {{{k_e * m / k}}}.
So the rule would be, if {{{n < k}}}, then {{{k_e = k, m_e = min(k_e * n,
k_e * m / k)}}}. Of course, if {{{n < h}}}, then the upload should fail.
I'll attach a file with some code that implements this, and computes the
reliability for various options. It seems to behave reasonably for a wide
variety of inputs.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:32>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
More information about the tahoe-dev
mailing list