[tahoe-dev] #467 static-server-selection UI (was: web "control panel", static server selection UI)

Brian Warner warner at lothar.com
Mon Jan 24 14:57:13 UTC 2011


On 1/24/11 1:49 PM, Shawn Willden wrote:
> On Mon, Jan 24, 2011 at 2:23 PM, Greg Troxel <gdt at ir.bbn.com> wrote:
> 
GT>     A key property is that with some churn things still work. Once
GT>     you add 'required', that breaks.
> 
SW> I don't see a problem with that at all. If you don't want things to
SW> break because you're requiring servers, then don't use the
SW> "required" checkbox. But I can see value in having the option of
SW> specifying that you'd rather have uploads fail if your desired set
SW> of servers isn't available.

Hmm, I see what Greg means: there may be a core set of 6 servers, and
you might be ok with uploads that involve at least 5 of them, but with a
single "allowed?" boolean per server, there's no way to express that.

The "right" way to address this, generically, is probably going to be a
plugin system with a python function that gets a currently-connected
server list and emits an "ok for upload" boolean, then a tahoe.cfg or
WUI-control-panel knob to choose which function/mode you want to use,
and some flexible config-tool space that the plugin gets to control to
let you tell the plugin how to work (i.e. give it a server list, or a
count of required servers, etc).
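
A rough sketch of what such a plugin interface might look like. Nothing
here is an existing Tahoe API: the names all_required, at_least_k_of and
POLICIES are made up for illustration. Each policy is built from its own
config and returns a function that maps the set of currently-connected
server ids to an "ok for upload" boolean:

```python
# Hypothetical sketch: none of these names exist in Tahoe.
from typing import Callable, Dict, FrozenSet, Iterable

Policy = Callable[[FrozenSet[str]], bool]

def all_required(required: Iterable[str]) -> Policy:
    """Upload is ok only if every listed server is connected."""
    required_set = frozenset(required)
    return lambda connected: required_set <= connected

def at_least_k_of(core: Iterable[str], k: int) -> Policy:
    """Upload is ok if at least k of the core servers are connected."""
    core_set = frozenset(core)
    return lambda connected: len(core_set & connected) >= k

# A tahoe.cfg or control-panel knob would pick one of these by name
# (the names themselves are assumptions).
POLICIES: Dict[str, Callable[..., Policy]] = {
    "all-required": all_required,
    "k-of-core": at_least_k_of,
}

core = ["s1", "s2", "s3", "s4", "s5", "s6"]
ok_for_upload = POLICIES["k-of-core"](core, 5)
print(ok_for_upload(frozenset(["s1", "s2", "s3", "s4", "s5", "x9"])))  # True
print(ok_for_upload(frozenset(["s1", "s2", "s3", "s4"])))              # False
```

The "k-of-core" policy covers the 5-of-6 case that a single per-server
boolean cannot express, while "all-required" behaves like checking
"required" on every listed server.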

I kind of think we need to build and get comfortable with a couple of
different single-purpose tools before we can build a usable generic
tool. So I'd like to start by identifying our most important use cases
and build some tools that accommodate those.

So far, we've concentrated on two main use cases:

 allmydata.com: hundreds of servers, centrally managed, almost always
                online, commercial/financial relationship between users,
                zero user configuration
 friendnet: dozens of servers, individually managed, mostly online,
            social ties between users, manual local configuration

We also know that we'd love to support the "One True Grid" use case
(millions of servers, freenet-style, low average availability), but we
don't know how to make that work yet. I'm trying to figure out how a
"Storage Club" -style grid would work, in which existing grid members
can invite new members to join, one at a time, so there's a web-of-trust
-shaped social relationship between members. And, at least personally,
I'm less focussed on the allmydata.com use case since the company went
under :).

So, for the friendnet -style grids we have right now, what would be
easiest to use? When you join the volunteergrid, what conditions would
make you wish an upload would loudly fail instead of appearing to
succeed?

GT> For 'allowed', I would want to phrase it as "disable use of this
GT> server" because the default in a grid is to use all. It's not clear
GT> what the motivation for disabling is - performance, cost, or
GT> reliability.

Hm, that's a good point: how could we distill all the reasons that you
might want to control server selection (i.e. pretty much all the text
from ticket #467) into a single column header? Some potential
motivations:

 * performance: uploading data to a server which isn't going to be
   around later is wasted upload bandwidth: your upload would run faster
   and you'd get the same results if you just deleted the share without
   uploading it
 * cost: if you're paying for storage (or bartering local space for
   remote storage, or whatever Accounting enables), then spending
   money/etc on unreliable service is a waste
 * reliability: if you're only going to upload 10 shares, putting 3 of
   them on unreliable servers is a slower/more-expensive way to upload 7
   shares to good servers

Our assumption is that server reliability (and the user's opinion of it)
is a fuzzy concept that the Tahoe client node cannot figure out on its
own. It depends upon all sorts of complicated human things like whether
the server operator is a good admin or a lazy one, a friend of yours or
a vendor/customer, whether they'll hold shares as a favor for you, or in
exchange for money, or if they'll delete the shares at the first sign of
the disk getting full. So I think we need a way for the user to explain
what they want to the client node, delegating a lot of the
reliability-prediction work to a human.

The original idea of default-disallow (i.e. whitelist) came from the
allmydata world, where we were worried about customers accidentally
enabling storage-service on their supposedly-client-only home nodes.
This happened to a handful of power users (who built their own tahoe
clients and connected them to the production grid without disabling
[storage]enabled= ), and was surprising both for the unexpected new
server "operator" (whose downstream links were jammed with the entire
allmydata.com customer base throwing shares at them) and for the other
clients (who had a few shares stored on this random fellow customer's
laptop, instead of the nice, stable, always-on, public-IP-reachable
official server). While the original mojonation and allmydata plans
called for customers-providing-storage, we backed down to
only-the-company-provides-storage as part of the "let's make it actually
work" simplification.

We wanted an easy way to make sure customers *only* used official
allmydata.com servers, not accidental rogues. We had a couple of
different plans, none of which really got implemented:

 * split introducer: separate introducer.furls for publishing and
   subscribing, only give the subscription FURL to clients, keeping the
   ability to advertise storage servers for ourselves
 * signed announcements, verified subscriptions: clients would have a
   preconfigured pubkey, and would only pay attention to announcements
   signed by it, we'd keep the privkey secret and only sign
   announcements for official servers. Or have an intermediate node, and
   use two-level certchains.

For a friend-net, there's less utility to disabling uploads to specific
servers (i.e. blacklist), unless you somehow know that you'll get crummy
service from a small number of them. I'd definitely put more energy into
the whitelist (default=disallow, set "allow" on one server at a time)
than the blacklist (default=allow, set "disallow" on one server at a
time).

GT>     I wonder about a priority # per server, to bias the selection
GT>     rather than being absolute.

Now, that's an interesting issue. I think I've written about the
diversity issues of non-uniform distribution choices before (probably a
year ago), in the context of filling servers evenly by percentage rather
than by bits-per-second. Preferring certain servers reduces the entropy
of server-selection, increasing correlation between otherwise separate
files, possibly increasing the chances that a given set of server
failures will damage more files.

If we were going to go this way, I'd pursue our original old idea
("Tahoe 1", abandoned before the first release, I think) of assigning
"reliability points" to each server, and then keep allocating new shares
(and increasing N) until the number of points adds up to above some
threshold. This would still put some shares on low-reliability servers,
but would never end up relying upon them as much as the high-reliability
ones.
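
A minimal sketch of the reliability-points idea, with invented point
values and threshold. It assumes the servers arrive already in their
per-file selection order, and the allocate() name and its signature are
hypothetical rather than anything from the old "Tahoe 1" code:

```python
# Hypothetical sketch of "reliability points"; values are illustrative.
from typing import List, Sequence, Tuple

def allocate(ordered: Sequence[Tuple[str, int]], threshold: int) -> List[str]:
    """Give one share to each server in order until points exceed threshold.

    `ordered` is a list of (server_id, points) in selection order.
    The number of shares placed (N) is len(result).
    Raises ValueError if all servers together do not exceed the threshold.
    """
    chosen: List[str] = []
    total = 0
    for server, points in ordered:
        if total > threshold:
            break  # enough reliability accumulated, stop increasing N
        chosen.append(server)
        total += points
    if total <= threshold:
        raise ValueError("not enough reliability points available")
    return chosen

servers = [("a", 10), ("b", 2), ("c", 10), ("d", 10), ("e", 3)]
print(allocate(servers, 25))  # ['a', 'b', 'c', 'd']
```

Here the low-reliability server "b" still receives a share, but it only
contributes 2 points, so the upload keeps going and N grows to 4 instead
of stopping at 3.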

GT>     If I check required, then I have no resilience to a server being
GT>     unreachable once in a while. If I had 6 servers and each were
GT>     there 99% of the time I might not care which 5 of the 6 got
GT>     shares.
SW> So don't check "required". Instead, uncheck "allowed" on all but the
SW> six servers. Or if there are only six in the grid, just leave it
SW> alone and the defaults will do what you want.

Yeah, I see what you both mean. There are cases where "this server
*must* be present to allow uploads" is sufficient: my personal
backupgrid (with its mere three storage nodes) falls into this category.
I certainly want it to try downloads with as few as one server, but I
expect that all three servers should be present all the time, and want
to be notified (in the form of 'tahoe backup' uploads failing loudly)
when that's not the case.

On the volunteergrid, having 12 servers online (out of maybe 14 total)
ought to be enough for me to feel safe about uploads. Or, having at
least 12 servers that will actually accept my shares; if half of the
connected servers are full, they don't count. But that's where
servers/shares-of-happiness starts to be a better way to express the
reliability goal: I don't care about which specific servers are being
counted, but I want to have enough of them that I get some adequate
diversity, and the H= metric actually measures diversity.

So I kind of suspect that Shawn's suggestion will work: set H=5 and let
that metric protect you from putting shares on fewer than 5 servers, and
use allowed=False on ones outside the expected 6 servers so you won't
put 5 shares on random unknown interlopers.
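
A simplified sketch of how those two knobs would combine. The per-server
"allowed" and "full" flags and the upload_ok() helper are hypothetical,
and counting distinct usable servers is only a stand-in for the real
servers-of-happiness computation, which looks at where shares actually
land:

```python
# Hypothetical sketch: "allowed"/"full" flags and upload_ok() are
# illustrative, not existing Tahoe settings or functions.
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Server:
    name: str
    allowed: bool = True   # user's whitelist decision
    full: bool = False     # server would refuse new shares

def usable(connected: Iterable[Server]) -> List[Server]:
    """Connected servers that are whitelisted and will accept shares."""
    return [s for s in connected if s.allowed and not s.full]

def upload_ok(connected: Iterable[Server], happy: int) -> bool:
    """Allow the upload only if at least `happy` distinct servers are usable."""
    return len(usable(connected)) >= happy

# Five of the six expected servers are online, plus three interlopers.
connected = [Server(f"core{i}") for i in range(1, 6)]
connected += [Server(f"unknown{i}", allowed=False) for i in range(1, 4)]
print(upload_ok(connected, happy=5))  # True
```

If one of the five core servers were full, only four usable servers
would remain and the same check would fail, regardless of how many
disallowed servers are connected.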


Good discussion! Keep it up!
 -Brian

