[tahoe-dev] [tahoe-lafs] #654: make the storage index be the verifier cap

Sun Mar 8 17:07:41 PDT 2009

#654: make the storage index be the verifier cap
---------------------------+------------------------------------------------
 Reporter:  zooko          |           Owner:           
     Type:  enhancement    |          Status:  new      
 Priority:  major          |       Milestone:  undecided
Component:  code-encoding  |         Version:  1.3.0    
 Keywords:                 |   Launchpad_bug:           
---------------------------+------------------------------------------------

Comment(by warner):

 In general, I like the idea. I'll have to think about it some more, maybe
 my
 notes have some details of the concerns I had.

 The general categories of concerns were:

  * size of the storage index, possibly affecting size of the filecaps,
 also
    affecting the storage-server's overhead/indexing system. SIs are
 currently
    16 bytes / 128 bits (since they're derived from a 128bit AES key).
  * how many integrity bits you get out of the storage index: is it enough
 to
    replace/equal the UEB?
  * at what point of the upload/encoding process you get to learn the
 storage
    index
  * how much work the storage servers are obligated to do for each storage
    operation
  * how much information storage servers have to do local verification of
    shares

 I'd love a scheme that allowed the servers to validate their own shares
 but
 which didn't obligate them to do so right away.

 For our DSA mutable file design, if we had intermediate keys, I think we
 were
 able to make the storage index do exactly what we wanted: every bit of the
 storage index can be used to validate the key, so the server can validate
 the
 whole share (and check its signature) all by itself. This enables several
 things: write-enabler-less publishing (server accepts the write iff the
 signature is good and the seqnum is higher than the old share), local
 background Verifier passes (to detect disk errors), buddy-verification
 (servers find other servers, check each other's shares). If we can't have
 intermediate keys, I think we have a design that will still allow some
 portion of the storage index to be used for this purpose, but not the
 whole
 thing.

 (I think we could even find a way to switch to pubkey-based-SI for our
 existing RSA-based mutable files and still provide backwards
 compatibility:
 basically have the server keep a table which maps from pubkey-hash to SI,
 and
 add an API that looks for shares by pubkey-hash instead of by SI).

 For immutable files, I think the prognosis was less cheerful. The storage
 index (as it stands today) is used for two purposes: peer selection and
 share
 indexing. The random distribution of the SI (since it is derived by
 hashing
 the writekey) plus the hash-driven permutation of our peer-selection
 algorithm gives us load-balancing, or as Zandr likes to put it,
 "cryptographically strong load balancing". (note that this is balancing
 the
 inlet rate: the amount of data that is given to each server per unit
 time..
 this may not quite be what you want, since you might want your large
 servers
 to fill at a proportionally-faster rate than your smaller servers). Once
 you
 know which servers to talk to first, the SI is used to reference a
 specific
 share on those servers.

 The problem was that any integrity information we get out of an immutable
 file won't be known to us until we've finished encoding the file. So we
 can't
 use it for peer-selection, since (to avoid buffering the entire file
 locally)
 we must perform peer-selection before encoding. (we're already considering
 switching from CHK to random-keys to avoid the streaming-unfriendly
 hash-the-content-to-get-the-key-and-storageindex pass).

 Now, there are good arguments to allow alternative peer-selection schemes
 (as
 we've discussed in section 3 of source:docs/specifications/outline.txt),
 but
 there are many desireable properties to the approach we're using now. Most
 peer-selection schemes that will work for a large number of servers (where
 "work" means the downloader usually doesn't have to ask every single
 server
 whether they have a share or not, and hopefully can find enough shares in
 a
 minimum number of roundtrips) require some sort of peer-selection-index to
 be
 encoded in the filecap.

 One approach we discussed was a split index, in which the filecap has two
 index fields: one for peer-selection, and a second for share-on-peer. The
 peer-selection part could be randomly generated: it doesn't need to be
 cryptographically secure, merely long enough to give us good load-
 balancing
 properties.. 10 or 20 bits would probably be enough. (and note that it
 wouldn't necessarily have to involve a permuted list: we could pick a
 random
 10-bit starting point on the ring, then select servers in strict clockwise
 nodeid order, or something). The share-on-peer part (which is what the
 server
 thinks of as a storage index) could be determined after the encoding
 process,
 and told to the server in the final close() message that commits the
 finished
 share. This would involve a storage server protocol which has some sort of
 temporary upload handle (so subsequent messages could refer to the
 previously-uploaded partial share fragments with something other than the
 final storage index), but that's not hard to build, and might give us some
 useful resume-interrupted-upload properties too.

 Hmm, here's a scheme that might work: make the peer-selection-index be a
 hash
 of the readkey, and the share-on-peer index be the UEB hash. This would
 let
 us perform peer selection as quickly as we do now (i.e. one pass for CHK,
 or
 zero passes for random-key). Filecaps would remain the same length
 (although
 they'd need a new prefix, of course). Verify caps would be just a prefix
 plus
 the UEB hash (and k+N+size, probably).

 The earlier scheme I was thinking of (in which the filecap would need an
 extra field for the peer-selection index) had some downsides, but I don't
 yet
 see any in this scheme. The only one I can think of it that it would
 obligate
 us to have some portion of the filecap (the readkey) which can't be used
 for
 integrity checking (since it needs to be generated before we've encoded
 the
 file), but we're already obligated to have that (the readkey is needed to
 encode the file in the first place).

 Well, and we lose a little bit of convergence: if I upload the same file
 (using CHK) as someone in my convergence domain already uploaded before,
 they'll wind up with the same filecap (and therefore peer-selection-index
 and
 storage-index) as before, but I won't be able to learn of that fact (and
 thus
 avoid doing the duplicate upload) until the end of encoding, when the
 UEBhash/storage-index is generated. I guess this means we should record a
 full-cryptographic-length peer-selection-index with each share, maintain a
 table that maps from peer-selection-index to UEBhash/storage-index, and
 have
 the servers do a lookup at allocate() time. If allocate() tells us that
 they
 already have a share for that peer-selection-index, we can ask it for the
 UEBhash, then download enough data to compare the file we're thinking
 about
 uploading against the shares that are claimed, and see if the upload can
 be
 skipped. Hm, it might also work to use this peer-selection-index as the
 "upload-id", for resuming an upload later.

 So, a summary of how we could implement this for immutable files:

  * when encoding/uploading:
   * construct the readkey (whether by CHK or randomly, doesn't matter)
   * hash the readkey to get the "peer-selection-index", use it to find
     storage servers who will accept a share of the necessary size (but
 don't
     tell the servers the final storage index yet)
     * servers might respond to the allocate(peer-selection-index) request
       with news that they already have shares for that index, and will
       return the UEBhash/storage-index for those shares
   * encrypt/encode the file, writing shares to servers, building hash
 trees
   * compute/write the UEB and its hash
   * tell storage servers to commit the finished share to a storage-index
     equal to the UEB hash
   * record readkey+UEBhash into the filecap as usual
  * when downloading:
   * hash the readkey to get the peer-selection index, build the permuted
 peer
     list
   * ask servers in the permuted list if they have a share with storage-
 index
     == UEBhash
   * download/decode shares as usual
  * new interfaces/formats we'd need:
   * new URI:CHK2: filecap format, to indicate that downloaders should use
     hash(readkey) as peer-selection-index and UEB hash as storage-index
     (instead of using hash(readkey) for both)
   * new server upload API:
    * "allocate" takes an peer-selection-index/upload-id rather than a
      storage-index, maintains a table that maps from that to storage-
 index.
    * writebucket.close takes a storage-index argument to tell it where to
      file the finished share
    * maybe some sort of resume-upload method which takes a
      peer-selection-index/upload-id
    * new server-side share version number, to tell the server that the
      storage index (i.e. bucketdir name) should be the same as the
 embedded
      UEB hash. Older shares cannot be validated this way (the contents can
 be
      validated against the embedded UEB hash, but that hash cannot be
 checked
      against anything).

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/654#comment:1>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid