This page is for notes and design considerations for the next version of tahoe's mutable files. NewCapDesign includes a lot of desired features, this page is about the backend layout that would make those features possible. * #217 contains a lot of the original notes. * #492 adds a ciphertext hash tree * #794 is about creating writecaps from passphrases * #795 is about append-only filecaps, #796 is closely related * #308 is about deep-traversal dircaps, which will require support from the underlying mutable files * [http://allmydata.org/pipermail/tahoe-dev/2009-July/002345.html this tahoe-dev message] and its neighbors have some good discussion about cap design for mutable files * [http://allmydata.org/pipermail/tahoe-dev/2009-September/002848.html The Elk Point design] is very interesting, and has not yet been transcribed into this wiki page. = Yay ECDSA = Once we have ECDSA (#331), we'll have a general-purpose signing primitive with short fields (K=kappa bits for the signing key, 2K for the verifying key, and 4K for the signature, with an expected K=128bits, so 16-byte signing keys, 32-byte verifying keys, and 64-byte signatures). Our current RSA-based signatures use 1216-*byte* signing keys, 292-byte verifying keys, and 256-byte signatures. The RSA fields are so large that we clearly cannot put them in the filecaps, so the encoding scheme requires them to be stored in the shares, encrypted and hashed as necessary. The DSA keys are short enough (in most cases) to put directly in the filecap, simplifying the design considerably. = Desired Features = * fewer roundtrips: mutable retrieve must currently fetch the pubkey (or encrypted privkey) from at least one server, which complicates the state machine and can add roundtrips when we guess incorrectly about a good amount of data to fetch on the initial read * offline attenuation: it should be possible to attenuate/diminish the writecap into the readcap without retrieving any shares, otherwise operations like adding a child dircap to a parent directory will be much slower (a roundtrip per child). * writecap -> readcap -> deep-traversal-cap -> verifycap . This requires some form of intermediate key scheme. * server-side share validation * one option is to get rid of the "write-enabler" shared secret and rely upon server validation exclusively. This would make share migration easier and remove one need for an encrypted channel (lease secrets would continue to need protection unless/until they too are replaced with signature verification). However, it would also increase server load. == Filecap Length == A likely security parameter K (=kappa) would be 96 or 128 bits, and most of the filecaps will be some multiple of K. Assuming a {{{tahoe:}}} prefix and no additional metadata, here's what various lengths of base62-encoded filecaps would look like: * 1*K: * 96 {{{tahoe:14efs6T5YNim0vDVV}}} * 128 {{{tahoe:4V2uIYVX0PcHu9fQrJ3GSH}}} * 2*K: * 192 {{{tahoe:072Og6e75IOP9ZZsbR1Twjs6X5xXJnBAF}}} * 256 {{{tahoe:fZeioazoWrO62reiAjzUAyV0uz3ssh6Hnanv8cKMClY}}} * 3*K: * 288 {{{tahoe:11DriaxD9nipA10ueBvv5uoMoehvxgPerpQiXyvMPeiUUdtf6}}} * 384 {{{tahoe:3a31SqUbf8fpWE1opRCT3coDhRqTU7bDU2AvC3RQJBu6ZNFhVscyxA9slYtPVT79x}}} Adding 2 metadata characters and a clear separator gives us: * 96: {{{tahoe:MW-14efs6T5YNim0vDVV}}} * 128: {{{tahoe:DW-4V2uIYVX0PcHu9fQrJ3GSH}}} * 192: {{{tahoe:MR-072Og6e75IOP9ZZsbR1Twjs6X5xXJnBAF}}} * 256: {{{tahoe:DR-fZeioazoWrO62reiAjzUAyV0uz3ssh6Hnanv8cKMClY}}} * 288: {{{tahoe:MR-11DriaxD9nipA10ueBvv5uoMoehvxgPerpQiXyvMPeiUUdtf6}}} * 384: {{{tahoe:MR-3a31SqUbf8fpWE1opRCT3coDhRqTU7bDU2AvC3RQJBu6ZNFhVscyxA9slYtPVT79x}}} [http://allmydata.org/trac/tahoe/ticket/217#comment:44 #217:c44] says that, if we don't need to prevent collisions, then we can use a K-bit hash for K-bit second-pre-image resistance. = Design Proposals = == Commonalities == * once we get the ciphertext, it gets segmented and erasure-coded in the same way as immutable files. Shares include a merkle tree over the share blocks, and a second one over the ciphertext segments. * we'd like to add a merkle tree over the plaintext, without reintroducing the partial-information-guessing attack that prompted us to remove it. This means encrypting the nodes of this merkle tree with a key derived from the readcap. * We'll continue to use the signing layout of the current mutable files: there will be a UEB that includes the root of the hash trees (and everything else in the share), which will be hashed to compute the "roothash" (which changes with each publish). A block of data that includes the roothash and a sequence number (as well as any data-encrypting salt) will be signed. * It might be good to make the layout a bit more extensible, like the way that immutable files have a dictionary-like structure for the UEB. * In general, the share will always contain a full copy of the pubkey, for the benefit of server-side validation. I don't think it matters whether the pubkey is stored inside or outside of the signed block, but it will probably make the upload-time share-verification code simpler to put it inside. * In general, the storage-index will be equal to the pubkey. If the pubkey is too long for this, the storage-index will be a sufficiently-long secure hash of the pubkey. The SI must be long enough to meet our collision-resistance criteria. == ECDSA, semi-private keys, no traversalcap == Zooko captured the current leading semi-private-key-using mutable file design nicely in the [http://allmydata.org/~zooko/lafs.pdf "StorageSS08" paper] (in Figure 3). The design is: * (1K) writecap = K-bit random string (perhaps derived from user-supplied material) (remember, K=kappa, probably 128bits) * (minimum 2K) readcap = minimum 2*K-bit semiprivate key * verifycap = 2*K-bit public key * storage-index = truncated verifycap On each publish, a random salt is generated and stored in the share. The data is encrypted with H(salt, readcap) and the ciphertext stored in the share. We store the usual merkle trees. This provides offline attenuation and full server-side validation. It removes the need to pull a copy of the pubkey or encprivkey from just one of the servers (the salt must still be fetched, but it's small and lives in the signed block that must be fetched anyways). === add traversalcap === Like above, but create two levels of semiprivate keys instead of just one: * (1K) writecap = K-bit random string * (minimum 2K) readcap = minimum 2*K-bit first semiprivate key * (minimum 2K) traversalcap = minimum 2*K-bit second semiprivate key * verifycap = 2*K-bit public key * storage-index = truncated verifycap The dirnode encoding would use H(writecap) to protect the child writecaps, H(readcap) to protect the child readcaps, and H(traversapcap) to protect the child verifycap/traversalcaps. == Any public key algorithm, no semi-private keys, no traversalcap == Without semi-private keys, we need something more complicated to protect the readkey: the only thing that can be mathematically derived from the writecap is the pubkey, and that can't be used to protect the data because it's public (and used by the server to validate shares). One approach is to use the current (discrete-log DSA) mutable file structure, and merely move the private key out of the share and into the writecap: * (1K) writecap = K-bit random string = privkey * (3K) readcap = H(writecap)[:K] + H(pubkey) * verifycap = H(pubkey) * storage-index = truncated verifycap In this case, the readcap/verifycap holder is obligated to fetch the pubkey from one of the shares, since they cannot derive it themselves. This preserves offline attenuation and server-side validation. The readcap grows to (1+2)*K : we can truncate the AES key since we only need K bits for K-bit confidentiality, but require 2*K bits on H(pubkey) to attain K-bit collision resistance. The verifycap is 2*K. === include ECDSA pubkey in cap === Or, if the pubkey is short enough, include it in the cap rather than requiring the client to fetch a copy: * (1K) writecap = K-bit random string = privkey * (3K) readcap = H(writecap)[:K] + pubkey * verifycap = pubkey * storage-index = H(pubkey) I think ECDSA pubkeys are 2*K long, so this would not change the length of the readcaps. It would just simplify/speed-up the download process. If we could use shorter pubkeys, this design might give us slightly shorter keys. Alternately, if we could use shorter hashes, then the H(pubkey) design might give us slightly shorter keys. === Any public key algorithm, no semi-private keys, with traversalcap === Since a secure pubkey identifier (either H(pubkey) or the original privkey) is present in all caps, it's easy to insert arbitrary intermediate levels. It doesn't even change the way the existing caps are used: * (1K) writecap = K-bit random string = privkey * (minimum 3K) readcap = H(writecap)[:K] + H(pubkey) * (3K) traversalcap: H(readcap)[:K] + H(pubkey) * verifycap = H(pubkey) * storage-index = truncated verifycap == Shorter readcaps (insecure) == This section used to describe a scheme using a MAC (with the read cap as key) instead of signatures. However this was vulnerable to manipulation by a collusion between Rose-the-readcap-holder and the storage servers, and could be used to cause another readcap-holder to see the wrong data. See the [http://allmydata.org/trac/tahoe/wiki/NewMutableEncodingDesign?version=7#Shorterreadcaps history] if you're interested.