[tahoe-dev] Removing the dependency of immutable read caps on UEB computation
Shawn Willden
shawn at willden.org
Fri Oct 2 10:38:22 PDT 2009
I'd like to have a little discussion on whether or not it makes sense in the
new immutable cap design to remove the dependency on UEB computation.
As background for any who aren't familiar with it, and to confirm my own
understanding, the UEB, or URI Extension Block, is a block of hashes that
provides strong, multi-way integrity verification of the immutable file.
Specifically, it contains:
1. The root of a Merkle tree on the file plaintext
2. A flat hash of the file plaintext
3. The root of a Merkle tree on the file ciphertext
4. A flat hash of the file ciphertext
5. Roots of Merkle trees on each share of the FEC-encoded ciphertext
That's a lot of hashes, and it provides strong integrity guarantees. It
provides a way to verify the integrity of the plaintext, the ciphertext and
each encoded share of the ciphertext. That's all very good.
A copy of the UEB is stored with each share.
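To make the first two items concrete, here is a minimal sketch of a flat hash and a Merkle-tree root over file segments. It uses plain SHA-256 rather than Tahoe's actual tagged hashes, and the function names are mine, not Tahoe's:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(segments: list[bytes]) -> bytes:
    """Hash each segment, then reduce pairwise until one root remains."""
    level = [h(seg) for seg in segments]
    while len(level) > 1:
        if len(level) % 2:                # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

segments = [b"segment-0", b"segment-1", b"segment-2"]
plaintext_root = merkle_root(segments)            # item 1: Merkle root of plaintext
plaintext_flat = h(b"".join(segments))            # item 2: flat hash of plaintext
```

The ciphertext hashes (items 3 and 4) are computed the same way over the encrypted segments, and the share trees (item 5) over the FEC-encoded blocks of each share.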
The current immutable read cap design embeds a hash of the UEB in the URI.
Indeed, this 32-byte hash comprises most of the length of current
immutable read caps. David-Sarah Hopwood's Elk Point design applies Zooko's
ideas about how to combine security and integrity parameters to make the UEB
hash 'implicit' in the read and verify caps, but it's still present.
The disadvantage of including the UEB hash in the read and verify caps,
whether explicitly or implicitly, is that it means that FEC coding must be
completed before the caps can be generated. This is unfortunate: without
that dependency, read caps could be computed efficiently, separately from
the upload process, and even long before the upload is performed. I can
think of many applications for that.
The larger issue, though, is that the present design binds a given read cap to
a specific choice of encoding parameters. This makes it impossible to change
those parameters later, to accommodate changing reliability requirements
or changing grid size/structure, without finding a way to update all extant
copies of the original cap, wherever they may be held.
To address these issues, I propose splitting the UEB into two parts, one part
that contains the plaintext and ciphertext hashes, and another that contains
the share tree roots and the encoding parameters. Call them UEB1 and UEB2.
UEB1 and any values derived from it can then be computed without doing FEC
computations, and without choosing specific encoding parameters.
Based on UEB1, a client with the verify cap can verify the assembled
ciphertext and a client with the read cap can verify the decrypted plaintext.
What they can't do is verify the integrity of a specific share.
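A rough sketch of the split, with field names invented for illustration (and flat tagged hashes standing in for the Merkle-tree computations), shows the key property: UEB1 touches only the file contents, while only UEB2 sees the FEC outputs and parameters.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Tagged SHA-256 over the concatenated parts (stand-in for real tree hashes)."""
    d = hashlib.sha256()
    for p in parts:
        d.update(p)
    return d.digest()

def make_ueb1(plaintext: bytes, ciphertext: bytes) -> dict:
    # UEB1 depends only on the file contents: no FEC, no encoding parameters.
    return {
        "plaintext_root":  h(b"pt-tree:", plaintext),
        "plaintext_hash":  h(b"pt-flat:", plaintext),
        "ciphertext_root": h(b"ct-tree:", ciphertext),
        "ciphertext_hash": h(b"ct-flat:", ciphertext),
    }

def make_ueb2(share_roots: list[bytes], k: int, n: int) -> dict:
    # UEB2 carries everything that depends on the chosen encoding.
    return {"share_roots": share_roots, "shares_needed": k, "shares_total": n}
```

Caps derived from UEB1 stay valid if the file is later re-encoded with different k-of-n parameters; only the per-share UEB2 has to be regenerated.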
Putting the UEB2 in the shares is the proximate solution to share validation,
but raises the issue of how to validate the UEB2. Since it would be
undesirable to allow anyone with read access to the file to fake valid
UEB2s, this requires introducing an additional cap, a "share update" cap,
which is not derivable from the read or verify caps. I suppose
you could also call it a "repair cap".
One way to do this, using the nomenclature from David-Sarah's Elk Point
immutable diagram, is to add a W key, from which K1 is derived by hashing.
In addition, an ECDSA key pair is derived from W. The UEB2 is signed with
the ECDSA private key, and the signature is the UEB2 verifier, stored with
each share. The "share update" cap would consist of the SID and the private
key. W could also be used as a 'master' cap from which all others can be
derived.
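The derivation chain for this approach could be sketched as follows. The tags and names are invented for illustration, and HMAC stands in for the ECDSA signature, which a real implementation would produce with an actual ECDSA library (e.g. deriving the private scalar deterministically from the seed):

```python
import hashlib
import hmac

def derive(tag: bytes, w: bytes) -> bytes:
    """Derive a subsidiary secret from W via a tagged hash."""
    return hashlib.sha256(tag + b":" + w).digest()

# W as a 'master' cap; per the idempotent-put variant it could be a content hash.
W = hashlib.sha256(b"file contents").digest()

K1 = derive(b"k1", W)                    # encryption-key input, hashed from W
signing_seed = derive(b"ecdsa-seed", W)  # seed for the ECDSA key pair

# HMAC is only a stand-in here: a real implementation would turn signing_seed
# into an ECDSA key pair and sign the serialized UEB2 with the private key,
# storing the signature (the UEB2 verifier) alongside each share.
ueb2_bytes = b"...serialized UEB2..."
ueb2_verifier = hmac.new(signing_seed, ueb2_bytes, hashlib.sha256).digest()
```

Holders of only the read or verify cap see K1 (or values derived from it) but never W, so they can check the verifier without being able to forge a new UEB2.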
Another possibility is to use the Elk Point mutable structure and fix the
content by including the UEB1 data in the information hashed to produce T|U
and signed to produce Sig_KR. To retain the idempotent-put characteristic of
Tahoe immutable files, W can be a content hash, rather than a random value,
and KD must be derived from W or omitted from the series of hashes that
produces S. It may be valuable, for both security analysis and code
complexity, to make mutable and immutable files very similar in structure.
The obvious downside of both of those approaches is that they introduce a need
for asymmetric signatures, where immutable files previously required only
hashing and symmetric encryption. I don't think there's any way to maintain
share integrity while removing the dependency of the caps on FEC parameters.
Personally, I think being able to re-structure the encoding without updating
all of the caps is sufficient justification to accept the use of asymmetric
signatures in immutable file buckets, and being able to generate caps without
performing FEC computations is a very nice bonus.
Comments?
Shawn.