[tahoe-dev] [tahoe-lafs] #654: make the storage index be the verifier cap
tahoe-lafs
trac at allmydata.org
Sun Mar 8 17:07:41 PDT 2009
#654: make the storage index be the verifier cap
---------------------------+------------------------------------------------
Reporter: zooko | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-encoding | Version: 1.3.0
Keywords: | Launchpad_bug:
---------------------------+------------------------------------------------
Comment(by warner):
In general, I like the idea. I'll have to think about it some more, maybe
my
notes have some details of the concerns I had.
The general categories of concerns were:
* size of the storage index, possibly affecting size of the filecaps,
also
affecting the storage-server's overhead/indexing system. SIs are
currently
16 bytes / 128 bits (since they're derived from a 128bit AES key).
* how many integrity bits you get out of the storage index: is it enough
to
replace/equal the UEB?
* at what point of the upload/encoding process you get to learn the
storage
index
* how much work the storage servers are obligated to do for each storage
operation
* how much information storage servers have to do local verification of
shares
I'd love a scheme that allowed the servers to validate their own shares
but
which didn't obligate them to do so right away.
For our DSA mutable file design, if we had intermediate keys, I think we
were
able to make the storage index do exactly what we wanted: every bit of the
storage index can be used to validate the key, so the server can validate
the
whole share (and check its signature) all by itself. This enables several
things: write-enabler-less publishing (server accepts the write iff the
signature is good and the seqnum is higher than the old share), local
background Verifier passes (to detect disk errors), buddy-verification
(servers find other servers, check each other's shares). If we can't have
intermediate keys, I think we have a design that will still allow some
portion of the storage index to be used for this purpose, but not the
whole
thing.
(I think we could even find a way to switch to pubkey-based-SI for our
existing RSA-based mutable files and still provide backwards
compatibility:
basically have the server keep a table which maps from pubkey-hash to SI,
and
add an API that looks for shares by pubkey-hash instead of by SI).
For immutable files, I think the prognosis was less cheerful. The storage
index (as it stands today) is used for two purposes: peer selection and
share
indexing. The random distribution of the SI (since it is derived by
hashing
the writekey) plus the hash-driven permutation of our peer-selection
algorithm gives us load-balancing, or as Zandr likes to put it,
"cryptographically strong load balancing". (note that this is balancing
the
inlet rate: the amount of data that is given to each server per unit
time..
this may not quite be what you want, since you might want your large
servers
to fill at a proportionally-faster rate than your smaller servers). Once
you
know which servers to talk to first, the SI is used to reference a
specific
share on those servers.
The problem was that any integrity information we get out of an immutable
file won't be known to us until we've finished encoding the file. So we
can't
use it for peer-selection, since (to avoid buffering the entire file
locally)
we must perform peer-selection before encoding. (we're already considering
switching from CHK to random-keys to avoid the streaming-unfriendly
hash-the-content-to-get-the-key-and-storageindex pass).
Now, there are good arguments to allow alternative peer-selection schemes
(as
we've discussed in section 3 of source:docs/specifications/outline.txt),
but
there are many desireable properties to the approach we're using now. Most
peer-selection schemes that will work for a large number of servers (where
"work" means the downloader usually doesn't have to ask every single
server
whether they have a share or not, and hopefully can find enough shares in
a
minimum number of roundtrips) require some sort of peer-selection-index to
be
encoded in the filecap.
One approach we discussed was a split index, in which the filecap has two
index fields: one for peer-selection, and a second for share-on-peer. The
peer-selection part could be randomly generated: it doesn't need to be
cryptographically secure, merely long enough to give us good load-
balancing
properties.. 10 or 20 bits would probably be enough. (and note that it
wouldn't necessarily have to involve a permuted list: we could pick a
random
10-bit starting point on the ring, then select servers in strict clockwise
nodeid order, or something). The share-on-peer part (which is what the
server
thinks of as a storage index) could be determined after the encoding
process,
and told to the server in the final close() message that commits the
finished
share. This would involve a storage server protocol which has some sort of
temporary upload handle (so subsequent messages could refer to the
previously-uploaded partial share fragments with something other than the
final storage index), but that's not hard to build, and might give us some
useful resume-interrupted-upload properties too.
Hmm, here's a scheme that might work: make the peer-selection-index be a
hash
of the readkey, and the share-on-peer index be the UEB hash. This would
let
us perform peer selection as quickly as we do now (i.e. one pass for CHK,
or
zero passes for random-key). Filecaps would remain the same length
(although
they'd need a new prefix, of course). Verify caps would be just a prefix
plus
the UEB hash (and k+N+size, probably).
The earlier scheme I was thinking of (in which the filecap would need an
extra field for the peer-selection index) had some downsides, but I don't
yet
see any in this scheme. The only one I can think of it that it would
obligate
us to have some portion of the filecap (the readkey) which can't be used
for
integrity checking (since it needs to be generated before we've encoded
the
file), but we're already obligated to have that (the readkey is needed to
encode the file in the first place).
Well, and we lose a little bit of convergence: if I upload the same file
(using CHK) as someone in my convergence domain already uploaded before,
they'll wind up with the same filecap (and therefore peer-selection-index
and
storage-index) as before, but I won't be able to learn of that fact (and
thus
avoid doing the duplicate upload) until the end of encoding, when the
UEBhash/storage-index is generated. I guess this means we should record a
full-cryptographic-length peer-selection-index with each share, maintain a
table that maps from peer-selection-index to UEBhash/storage-index, and
have
the servers do a lookup at allocate() time. If allocate() tells us that
they
already have a share for that peer-selection-index, we can ask it for the
UEBhash, then download enough data to compare the file we're thinking
about
uploading against the shares that are claimed, and see if the upload can
be
skipped. Hm, it might also work to use this peer-selection-index as the
"upload-id", for resuming an upload later.
So, a summary of how we could implement this for immutable files:
* when encoding/uploading:
* construct the readkey (whether by CHK or randomly, doesn't matter)
* hash the readkey to get the "peer-selection-index", use it to find
storage servers who will accept a share of the necessary size (but
don't
tell the servers the final storage index yet)
* servers might respond to the allocate(peer-selection-index) request
with news that they already have shares for that index, and will
return the UEBhash/storage-index for those shares
* encrypt/encode the file, writing shares to servers, building hash
trees
* compute/write the UEB and its hash
* tell storage servers to commit the finished share to a storage-index
equal to the UEB hash
* record readkey+UEBhash into the filecap as usual
* when downloading:
* hash the readkey to get the peer-selection index, build the permuted
peer
list
* ask servers in the permuted list if they have a share with storage-
index
== UEBhash
* download/decode shares as usual
* new interfaces/formats we'd need:
* new URI:CHK2: filecap format, to indicate that downloaders should use
hash(readkey) as peer-selection-index and UEB hash as storage-index
(instead of using hash(readkey) for both)
* new server upload API:
* "allocate" takes an peer-selection-index/upload-id rather than a
storage-index, maintains a table that maps from that to storage-
index.
* writebucket.close takes a storage-index argument to tell it where to
file the finished share
* maybe some sort of resume-upload method which takes a
peer-selection-index/upload-id
* new server-side share version number, to tell the server that the
storage index (i.e. bucketdir name) should be the same as the
embedded
UEB hash. Older shares cannot be validated this way (the contents can
be
validated against the embedded UEB hash, but that hash cannot be
checked
against anything).
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/654#comment:1>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
More information about the tahoe-dev
mailing list