[tahoe-dev] tahoe-dev Digest, Vol 46, Issue 20
Josh Wilcox
wilcoxjg at gmail.com
Wed Jan 12 22:22:37 UTC 2011
> Message: 6
> Date: Wed, 12 Jan 2011 00:40:21 -0700
> From: "Zooko O'Whielacronx" <zooko at zooko.com>
> To: Tahoe-LAFS development <tahoe-dev at tahoe-lafs.org>
> Subject: [tahoe-dev] default values of K, H, N (was: I assumed each
> share would go to a different server...)
> Message-ID:
> <AANLkTikEiOMwbfHW=WufvjjPXxqquo9J5REBq=_gG8k9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Dear Josh, Shawn, Greg, et al.:
>
> While it warms my heart to see people teaching each other, it's not
> "scalable" for new users to be surprised when the behavior doesn't
> match their assumptions, then post on the mailing list and get an
> explanation about why the actual behavior differs from their
> assumptions.
>
> I think we should either change the default behavior to match the
> common user expectations, or else add documentation to, if possible,
> explain the surprising thing for them when they begin trying to use
> it.
>
> Note that another user, more than one year ago, reported the same
> confusion:
>
> http://tahoe-lafs.org/pipermail/tahoe-dev/2009-August/002494.html
>
> Which is why I created ticket #778 (servers of happiness).
>
> There are some reasons (mostly to do with performance and
> availability) why someone might want N > H, but the newbies seem to
> expect H == N. Perhaps we should set H == N in the defaults and then
> let more sophisticated users tune the (K, H, N) for their particular
> grid and their preferences?
>
> Speaking generally, I think there are at least three different
> desiderata that we could have for our default settings, and we
> probably can't have all of what we want:
>
> 1. (Safety) Users who entrust valuable data to it without changing the
> defaults won't lose integrity, confidentiality, or data-preservation.
>
> 2. (Unsurprisingness) Users will rarely be surprised by the default
> behavior.
>
> 3. (Performance and Features) Users will get good transfer speeds, the
> ability to migrate or rebalance files without having to re-encode
> them, better storage efficiency, higher fault-tolerance, etc.
>
> I would really like to prioritize them in this order.
>
> (Hm, in a sense Unsurprisingness is really the essence of Safety.
> Regardless of what the settings are, if the user understands the
> consequences of those settings then they won't be harmed.)
>
> I don't think default settings are a good way to accomplish
> desideratum 3 very well because the settings probably have to be tuned
> to the particular grid. François has a grid with three physical
> machines and 60- or 70- odd storage server processes. I have a grid (I
> just set it up!) with eight storage server processes on a single
> Amazon EC2 virtual machine. The volunteergrid1 has 17 physical servers
> of heterogeneous size and performance, each one operated by a different
> volunteer. In the future, more people might set up their own "personal
> Tahoe-LAFS grid" consisting of only a single storage server owned by
> them. There are no default settings that are optimal for all of these
> cases.
>
> Documentation is probably the best way to accomplish desideratum 3.
> (Our documentation is already better than most open source projects,
> but it also has lots of room for improvement. Volunteers
> needed!)
>
> So I would favor some default settings like (1, 1, 1) or (1, 3, 3) or
> (3, 10, 10), because those seem to score higher on Unsurprisingness in
> my book.
>
> Honestly at the moment I think I favor (1, 1, 1). It works on any grid
> (even "the 1-server grid", which I imagine might turn out to be a
> valuable use case), the safety qualities should be obvious to any
> user, and it arranges for users to learn about the confidentiality and
> integrity properties first, and then separately to learn about the
> consequences of erasure-coding.
>
> Thoughts?
>
> Regards,
>
> Zooko
>
> http://tahoe-lafs.org/trac/tahoe-lafs/ticket/778# "shares of
> happiness" is the wrong measure; "servers of happiness" is better
>
For a default how about H = N?
I notice that in:
http://tahoe-lafs.org/pipermail/tahoe-dev/2009-August/002494.html
the user only used a one-server grid in an attempt to create an error,
which would have happened had H = N been true in his case.
Isn't having H < N a setup that trades off reliability for performance?
If so, then this is trading desideratum (1) (or some component thereof)
for desideratum (3).
Seems like H = N meets (1) and (2) at a possible cost to (3).
Given the way you've ordered your values this choice seems good.
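
To make the one-server case concrete, here is a rough Python sketch of the
check I have in mind. It is only a simplified model, not Tahoe-LAFS code:
the function names are made up for illustration, it assumes shares are
spread as evenly as possible over the available servers, and it assumes
1 <= K <= H <= N.

```python
def servers_used(n_shares: int, n_servers: int) -> int:
    """Distinct servers holding at least one share, assuming the N shares
    are spread as evenly as possible (an assumption of this sketch)."""
    return min(n_shares, n_servers)


def upload_is_happy(k: int, h: int, n: int, n_servers: int) -> bool:
    """Simplified happiness test: do at least H distinct servers hold shares?

    Hypothetical helper for illustration; the real uploader's placement
    logic is more involved than this.
    """
    if not (1 <= k <= h <= n):
        raise ValueError("this sketch assumes 1 <= K <= H <= N")
    return servers_used(n, n_servers) >= h
```

Under this model, (3, 10, 10) on a one-server grid is rejected, which is
the error the 2009 poster was trying to provoke, while (1, 1, 1) on the
same grid is accepted.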
--J