| 1 | The NewCapDesign page describes desired features of the next filecap design. |
| 2 | This page is for designing the encoding format for these new immutable files. |
| 3 | |
| 4 | = Features = |
| 5 | |
| 6 | * as described on NewCapDesign#filecaplength, we probably need 128bit |
| 7 | confidentiality "C" bits, 256bit integrity "I" bits, and 128bit |
| 8 | storage-collision resistance. There are encoding schemes that can combine |
| 9 | the C and I bits (at the expense of convergence, or certain forms of |
| 10 | offline attenuation). |
| 11 | * we may define a "server-selection-index" (which is used to permute or |
| 12 | otherwise narrow the list of servers to be used) to be separate from the |
| 13 | "storage-index" (which is used to identify a specific share on whichever |
| 14 | servers we actually talk to). This may involve a separate field in the |
| 15 | filecap, or it may continue to be derived from the storage index. |
| 16 | * some encoding schemes allow the readcap to be attenuated to a verifycap |
| 17 | offline |
| 18 | * in general, we don't care how long the verifycap is |
| 19 | * the server should be able to validate the entire share by itself, without |
| 20 | the readcap. In general, this means that the storage-index must also be |
| 21 | the verifycap. |
| 22 | * note that this implies that the storage-index cannot be computed until |
| 23 | the end of encoding, when all shares have been generated, the share hash |
| 24 | tree has been built, and its root has been added to the UEB. |
| 25 | * this implies that we can't use the storage-index to detect convergence |
| 26 | with earlier uploads of the same file. To retain convergence may require |
| 27 | a lookup table on the server (mapping hash-of-readkey to storage-index, |
| 28 | or something) |
| 29 | * it also implies that storage-index can't be used as a |
| 30 | server-selection-index, which again points to using hash-of-readkey as |
| 31 | SSI (to retain convergence of server-selection). Setting the |
| 32 | storage-index at the end of upload requires a new uploader protocol, |
| 33 | which uses an "upload handle" for the data transfer, and finishes with a |
| 34 | "now commit this share to storage-index=X" message. |
| 35 | * the original CHK design uses hash-of-readkey as storage-index, which has |
| 36 | all these good properties except server-side full share validation. |
| 37 | (servers can compare share contents against the UEB, and we could put a |
| 38 | copy of the UEB hash into the share, but servers would continue to be |
| 39 | unable to make sure the share was in the right place) |
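The "upload handle" protocol mentioned above might look roughly like the following in-memory sketch. Everything here is hypothetical: the class and method names are invented for illustration, and the validation step uses a plain SHA-256 of the share bytes as a stand-in for real verifycap-based validation (which would go through the share hash tree and the UEB).

```python
import hashlib
import os


def share_digest(data):
    # Stand-in for "validate the share against the verifycap" (assumption:
    # the real check would use the share hash tree and the UEB hash).
    return hashlib.sha256(data).digest()


class SketchStorageServer:
    """Hypothetical server that accepts data before the storage-index is known."""

    def __init__(self):
        self._pending = {}  # upload handle -> bytearray of data received so far
        self._shares = {}   # storage-index -> committed share bytes

    def open_upload(self):
        # The handle is only a temporary name for the transfer.
        handle = os.urandom(16)
        self._pending[handle] = bytearray()
        return handle

    def write(self, handle, data):
        self._pending[handle] += data

    def commit(self, handle, storage_index):
        # "now commit this share to storage-index=X"
        data = bytes(self._pending[handle])
        if share_digest(data) != storage_index:
            raise ValueError("share does not match storage-index")
        del self._pending[handle]
        self._shares[storage_index] = data

    def get(self, storage_index):
        return self._shares[storage_index]


server = SketchStorageServer()
handle = server.open_upload()
server.write(handle, b"share ")
server.write(handle, b"data")
si = share_digest(b"share data")  # only computable once all data exists
server.commit(handle, si)
```

The point of the sketch is only the ordering: data transfer happens under the handle, and the storage-index appears in the final message, where the server can check it before filing the share.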
| 40 | |
| 41 | = Options = |
| 42 | |
| 43 | note: all cap-length computations assume the integrity-providing "I" field is |
| 44 | 256bits long, and the confidentiality-providing "C" field is 128bits long. If |
| 45 | we decide on different values, the sums below should be updated. |
| 46 | |
| 47 | == One: current CHK design == |
| 48 | |
| 49 | Readcaps consist of two main pieces: C bits and I bits, plus: |
| 50 | |
| 51 | * k (which improves the accuracy of the initial number of queries to send |
| 52 | out) |
| 53 | * N (which improves the guessed upper bound on the number of queries to send |
| 54 | out, and which used to be required by the abandoned TahoeThree algorithm) |
| 55 | * filesize (advisory only, used by deep-size measurements in lieu of |
| 56 | fetching share data to measure filesize) |
| 57 | |
| 58 | SI = H(C), SSI=SI. Verifycap is SI+I. |
| 59 | |
| 60 | * SSI and SI are known ahead of time, uploader protocol starts with SI |
| 61 | * good convergence |
| 62 | * long caps (128+256+len(k+N+filesize)) ~= 400bits |
| 63 | * server cannot verify entire share |
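As a rough illustration of the derivations in option One, the sketch below uses SHA-256 truncated to 128 bits as a stand-in for H. That choice, the function names, and the example inputs are all assumptions made for illustration, not the actual CHK hashing.

```python
import hashlib
import os

C_BITS = 128  # confidentiality field (the readkey)
I_BITS = 256  # integrity field


def storage_index(readkey):
    # SI = H(C); truncated SHA-256 is an assumed stand-in for H
    return hashlib.sha256(readkey).digest()[:16]


def verifycap(readkey, integrity):
    # Verifycap is SI+I (simple concatenation here)
    return storage_index(readkey) + integrity


def readcap_bits(extra_bits):
    # C bits + I bits + whatever k, N and filesize take up
    return C_BITS + I_BITS + extra_bits


readkey = os.urandom(C_BITS // 8)
integrity = hashlib.sha256(b"example UEB").digest()  # placeholder I value

si = storage_index(readkey)
ssi = si  # SSI=SI, so both are known before encoding starts
vc = verifycap(readkey, integrity)
```

Because `si` depends only on the readkey, the uploader can open the conversation with the server using it, which is exactly what the verifycap-as-storage-index designs give up.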
| 64 | |
| 65 | == Two: Zooko's scheme == |
| 66 | |
| 67 | Readcaps contain one crypto value that combines C and I fields. (I forget how |
| 68 | this worked... it was clever, but I think it had some fatal flaw, like not |
| 69 | being able to get a storage-index from the readcap without first retrieving |
| 70 | shares, or something. One of us will dig up the notes on it and describe it |
| 71 | here). |
| 72 | |
| 73 | * short caps |
| 74 | * convergence problems |
| 75 | |
| 76 | == Others? == |
| 77 | |
| 78 | == Ideas == |
| 79 | |
| 80 | It might be possible to have the uploader give two values to the server, at |
| 81 | different stages of the upload process, which (together) would allow full |
| 82 | validation of the resulting share. Using a single value (the verifycap) as the |
| 83 | storage index would be cleaner, but might not be strictly necessary. |
| 84 | |
| 85 | The servers could maintain a table, mapping from one sort of index to |
| 86 | another, if that made it easier for the upload process to proceed (or to |
| 87 | achieve convergence). For example, H(readkey) is known at the beginning of |
| 88 | upload, but the I bits aren't known until the end. If the client could use |
| 89 | SSI=H(readkey) and then ask each server to tell it the storage-index of any |
| 90 | shares which used H(readkey), it could achieve convergence and still use the |
| 91 | I bits as the storage-index. The servers would be obligated to maintain a |
| 92 | table with one entry per bucket (so probably ~20M entries), and |
| 93 | errors/malicious behavior in this table would cause convergence failures |
| 94 | (which are hardly fatal). |
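A minimal sketch of such a server-side table, assuming a plain in-memory dict and SHA-256 as the hash (both are assumptions, and the class and method names are hypothetical):

```python
import hashlib


class ConvergenceTable:
    """Hypothetical per-server map from H(readkey) to storage-indexes."""

    def __init__(self):
        self._by_key_hash = {}  # H(readkey) -> set of storage-indexes

    def record(self, key_hash, storage_index):
        # Called once a share has been committed: one entry per bucket.
        self._by_key_hash.setdefault(key_hash, set()).add(storage_index)

    def lookup(self, key_hash):
        # An empty answer just means "no convergence, upload normally".
        return sorted(self._by_key_hash.get(key_hash, ()))


table = ConvergenceTable()
readkey = b"\x01" * 16
key_hash = hashlib.sha256(readkey).digest()

before = table.lookup(key_hash)       # nothing known yet
table.record(key_hash, b"SI-example")  # placeholder storage-index
after = table.lookup(key_hash)
```

A wrong or missing entry only makes `lookup` return something unhelpful, so the client falls back to a fresh upload; this matches the observation that table errors cost convergence rather than correctness.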
| 95 | |
| 96 | The SSI can be much shorter than the SI. It only needs to be long enough to |
| 97 | provide good load-balancing properties. It could be included explicitly in |
| 98 | the filecap. Alternate (non-TahoeTwo) peer-selection strategies could encode |
| 99 | whatever per-file information they needed into the SSI, assuming some sort of |
| 100 | tradeoff between cap length (i.e. SSI length) and work done by the downloader |
| 101 | to find the right servers. |
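To make the "short SSI" idea concrete, here is a sketch of one way a small SSI could drive server selection, in the spirit of a permuted server list. The hash choice, the 4-byte SSI length, and the function name are assumptions, not a description of any existing peer-selection code.

```python
import hashlib


def permute_servers(ssi, server_ids):
    # Order servers by H(ssi + server_id); each file gets its own ordering,
    # and every client holding the same SSI computes the same one.
    return sorted(server_ids, key=lambda sid: hashlib.sha256(ssi + sid).digest())


servers = [b"server-a", b"server-b", b"server-c", b"server-d"]
readkey = b"\x02" * 16

# Assumed: a 32-bit SSI derived from the readkey is enough for load-balancing.
ssi = hashlib.sha256(readkey).digest()[:4]

order = permute_servers(ssi, servers)
```

The SSI here carries no security weight. It only has to spread files evenly across servers, which is why it can be much shorter than the storage-index.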