Changes between Initial Version and Version 1 of NewImmutableEncodingDesign


Timestamp:
2009-08-25T09:59:45Z (15 years ago)
Author:
warner
Comment:

new page about immutable encoding design

  • NewImmutableEncodingDesign

     1The NewCapDesign page describes desired features of the next filecap design.
     2This page is for designing the encoding format for these new immutable files.
     3
     4= Features =
     5
     6 * as described on NewCapDesign#filecaplength, we probably need 128bit
     7   confidentiality "C" bits, 256bit integrity "I" bits, and 128bit
     8   storage-collision resistance. There are encoding schemes that can combine
     9   the C and I bits (at the expense of convergence, or certain forms of
     10   offline attenuation).
     11 * we may define a "server-selection-index" (which is used to permute or
     12   otherwise narrow the list of servers to be used) to be separate from the
     13   "storage-index" (which is used to identify a specific share on whichever
     14   servers we actually talk to). This may involve a separate field in the
     15   filecap, or it may continue to be derived from the storage index.
     16 * some encoding schemes allow the readcap to be attenuated to a verifycap
     17   offline
     18 * in general, we don't care how long the verifycap is
     19 * the server should be able to validate the entire share by itself, without
     20   the readcap. In general, this means that the storage-index must also be
     21   the verifycap.
     22  * note that this implies that the storage-index cannot be computed until
     23    the end of encoding, when all shares have been generated, the share hash
     24    tree has been built, and its root has been added to the UEB.
     25   * this implies that we can't use the storage-index to detect convergence
     26     with earlier uploads of the same file. To retain convergence may require
     27     a lookup table on the server (mapping hash-of-readkey to storage-index,
     28     or something)
     29   * it also implies that storage-index can't be used as a
     30     server-selection-index, which again points to using hash-of-readkey as
     31     SSI (to retain convergence of server-selection). Setting the
     32     storage-index at the end of upload requires a new uploader protocol,
     33     which uses an "upload handle" for the data transfer, and finishes with a
     34     "now commit this share to storage-index=X" message.
     35   * the original CHK design uses hash-of-readkey as storage-index, which has
     36     all these good properties except server-side full share validation.
     37     (servers can compare share contents against the UEB, and we could put a
     38     copy of the UEB hash into the share, but servers would continue to be
     39     unable to make sure the share was in the right place)
     40
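The "upload handle" protocol mentioned above can be sketched as a toy in-memory server. This is only an illustration under loud assumptions: the class and method names (`SketchStorageServer`, `allocate`, `write`, `commit`) are hypothetical, and a plain SHA-256 of the share bytes stands in for the real validation chain (share hash tree, UEB, verifycap), which this page does not yet specify.

```python
import hashlib
import secrets


class SketchStorageServer:
    """Toy server: data arrives under an upload handle, the storage-index comes last."""

    def __init__(self):
        self._pending = {}  # upload handle -> bytearray of share data received so far
        self.shares = {}    # storage-index -> committed share bytes

    def allocate(self) -> str:
        # The uploader does not know the storage-index yet, so it gets a random handle.
        handle = secrets.token_hex(8)
        self._pending[handle] = bytearray()
        return handle

    def write(self, handle: str, data: bytes) -> None:
        self._pending[handle].extend(data)

    def commit(self, handle: str, storage_index: bytes) -> None:
        # "now commit this share to storage-index=X"
        share = bytes(self._pending[handle])
        # ASSUMPTION: SHA-256 of the share is a stand-in for full share validation.
        if hashlib.sha256(share).digest() != storage_index:
            raise ValueError("share does not match the claimed storage-index")
        del self._pending[handle]
        self.shares[storage_index] = share
```

The point of the sketch is the ordering: the server can refuse the commit if the share does not validate against the index it is being filed under, which is the property the original CHK design lacks.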
     41= Options =
     42
     43note: all cap-length computations assume the integrity-providing "I" field is
     44256bits long, and the confidentiality-providing "C" field is 128bits long. If
     45we decide on different values, the sums below should be updated.
     46
     47== One: current CHK design ==
     48
     49Readcaps consist of two main pieces: C bits and I bits, plus:
     50
     51 * k (which improves the accuracy of the initial number of queries to send
     52   out)
     53 * N (which improves the guessed upper bound on number of queries to send
     54   out, and used to be required by the abandoned TahoeThree algorithm)
     55 * filesize (advisory only, used by deep-size measurements in lieu of
     56   fetching share data to measure filesize)
     57
     58SI = H(C), SSI=SI. Verifycap is SI+I.
     59
     60 * SSI and SI are known ahead of time, uploader protocol starts with SI
     61 * good convergence
     62 * long caps (128+256+len(k+N+filesize)) ~= 400bits
     63 * server cannot verify entire share
     64
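As a rough illustration of the derivations and the cap-length sum above, the following sketch assumes SHA-256 truncated to 128 bits for H() and reads "SI+I" as concatenation. Both are assumptions made for the example, not statements about the actual CHK implementation, and the function names are hypothetical.

```python
import hashlib

C_BITS = 128  # confidentiality field (the readkey)
I_BITS = 256  # integrity field


def storage_index_from_readkey(readkey: bytes) -> bytes:
    # SI = H(C). ASSUMPTION: H is SHA-256 truncated to 128 bits.
    return hashlib.sha256(readkey).digest()[:16]


def verifycap(storage_index: bytes, integrity: bytes) -> bytes:
    # Verifycap is SI+I, read here as simple concatenation.
    return storage_index + integrity


def readcap_bits(extra_bits: int) -> int:
    # 128 + 256 + len(k+N+filesize); extra_bits is whatever that last term costs.
    return C_BITS + I_BITS + extra_bits
```

With no extra fields, `readcap_bits(0)` is 384, so the "~= 400bits" figure leaves roughly 16 bits for k, N, and filesize. Because the storage index depends only on the readkey, it is available before any share has been encoded.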
     65== Two: Zooko's scheme ==
     66
     67Readcaps contain one crypto value that combines the C and I fields. (I forget how
     68this worked; it was clever, but I think it had some fatal flaw, like not
     69being able to get a storage-index from the readcap without first retrieving
     70shares, or something. One of us will dig up the notes on it and describe it
     71here).
     72
     73 * short caps
     74 * convergence problems
     75
     76== Others? ==
     77
     78== Ideas ==
     79
     80It might be possible to have the uploader give two values to the server, at
     81different stages of the upload process, which (together) would allow full
     82validation of the resulting share. Using a single value (the verifycap) as the
     83storage index would be cleaner, but might not be strictly necessary.
     84
     85The servers could maintain a table, mapping from one sort of index to
     86another, if that made it easier for the upload process to proceed (or to
     87achieve convergence). For example, H(readkey) is known at the beginning of
     88upload, but the I bits aren't known until the end. If the client could use
     89SSI=H(readkey) and then ask each server to tell it the storage-index of any
     90shares which used H(readkey), it could achieve convergence and still use the
     91I bits as the storage-index. The servers would be obligated to maintain a
     92table with one entry per bucket (so probably ~20M entries), and
     93errors/malicious behavior in this table would cause convergence failures
     94(which are hardly fatal).
     95
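The server-side table described above might look something like the following minimal sketch. The class name `ConvergenceTable` and its methods are hypothetical, and a real server would need persistence and one entry per bucket rather than an in-memory dict.

```python
class ConvergenceTable:
    """Maps H(readkey) to the storage-indexes of shares uploaded under it."""

    def __init__(self):
        self._by_readkey_hash = {}  # H(readkey) -> set of storage-indexes

    def record(self, readkey_hash: bytes, storage_index: bytes) -> None:
        # Called when an upload is committed, once the I bits are finally known.
        self._by_readkey_hash.setdefault(readkey_hash, set()).add(storage_index)

    def lookup(self, readkey_hash: bytes) -> list:
        # Called at the start of an upload. An empty list only means
        # "no convergence found", so the client proceeds with a normal upload.
        return sorted(self._by_readkey_hash.get(readkey_hash, ()))
```

A client would treat the answer as a hint: a wrong or missing entry costs only a redundant upload, which matches the observation that failures in this table are hardly fatal.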
     96The SSI can be much shorter than the SI. It only needs to be long enough to
     97provide good load-balancing properties. It could be included explicitly in
     98the filecap. Alternate (non-TahoeTwo) peer-selection strategies could encode
     99whatever per-file information they needed into the SSI, assuming some sort of
     100tradeoff between cap length (i.e. SSI length) and work done by the downloader
     101to find the right servers.
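
To illustrate why a short SSI is enough for load balancing, here is a sketch of a per-file server permutation keyed by the SSI. The hashing scheme (SHA-256 over SSI plus server id) and the name `permute_servers` are assumptions chosen for the example, not a description of the TahoeTwo algorithm itself.

```python
import hashlib


def permute_servers(ssi: bytes, server_ids: list) -> list:
    # Sort servers by a hash that mixes in the SSI, so each file gets its own
    # stable ordering. ASSUMPTION: SHA-256(ssi + server_id) as the sort key.
    return sorted(server_ids, key=lambda sid: hashlib.sha256(ssi + sid).digest())
```

Any client that knows the SSI and the server list computes the same ordering, even when the SSI is only a few bytes long.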