[tahoe-dev] Interesting hashing result
zooko
zooko at zooko.com
Wed Mar 4 14:41:02 PST 2009
[Folks: I'm replying to old mailing list posts that I didn't have
time to reply to when they were new because I was preparing the
tahoe-1.3.0 release. Beware of time travel culture shock.]
On Feb 16, 2009, at 1:51 AM, Shawn Willden wrote:
> I don't think this is a problem. Or at least, it's not a problem
> that doesn't exist even without the weak hash. If the attacker
> knows the storage ID of your file, he can replace it in the grid --
> he doesn't need to be able to generate another file that hashes to
> the same value.
Currently we address this problem by having storage servers never
overwrite immutable files with different contents. Only the first
client to begin uploading an immutable file gets to choose its
storage index. If another client tries to use the same storage
index while the upload is in progress, the server tells it that the
file is already in progress (or maybe it says "the file is already
there", which wouldn't be quite right...). Once the uploader
closes the upload, the mapping between that storage index and that
share, in the mind of that storage server, is set in stone.
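To make the current behaviour concrete, here is a minimal sketch of the
"first uploader wins, never overwrite" rule. This is not Tahoe's actual
storage server code: the class name, method names, and exceptions are
all made up for illustration, and shares are kept in memory.

```python
class AlreadyInProgress(Exception):
    """Another client is currently uploading to this storage index."""


class AlreadyStored(Exception):
    """A share is already bound to this storage index."""


class ImmutableShareStore:
    # Hypothetical in-memory stand-in for a storage server.
    def __init__(self):
        self._incoming = {}  # storage index -> bytearray being written
        self._stored = {}    # storage index -> finished, immutable bytes

    def begin_upload(self, storage_index):
        # The first client to get here owns the storage index.
        if storage_index in self._stored:
            raise AlreadyStored(storage_index)
        if storage_index in self._incoming:
            raise AlreadyInProgress(storage_index)
        self._incoming[storage_index] = bytearray()

    def write(self, storage_index, data):
        self._incoming[storage_index] += data

    def close_upload(self, storage_index):
        # From here on the mapping is "set in stone" for this server.
        share = bytes(self._incoming.pop(storage_index))
        self._stored[storage_index] = share

    def read(self, storage_index):
        return self._stored[storage_index]
```

Note that nothing here ties the storage index to the share contents,
which is why the server can only refuse to overwrite and cannot check
that an upload is the "right" one.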
Now, we're about to introduce garbage collection in Tahoe-1.4 or so,
which raises a question: what happens if the share gets garbage
collected, then someone uploads a different file with the same
storage index, and then someone who didn't know about either of those
events tries to re-upload the original one?
In the long run I think a better solution is to make the storage
index be equal to the verifier cap. This requires different
semantics for uploads-in-progress, because the verifier cap isn't
known to the uploader when it starts the upload, only when it
finishes the upload. So the uploader will have to tell the storage
server that it is about to start uploading something, and the server
will have to bind the ongoing upload to the current connection or
else to a temporary "upload in progress" token instead of to the
ultimate storage index. Then, once the upload is finished, the
storage server moves the share from the temp "incoming" directory to
the final location indexed by its storage index, which is its verify
cap. The storage server can therefore also *check* that the share
matches the verify cap (because anyone can check that a share fits a
given verify cap), which makes all of those aforementioned issues
simpler and more obviously right.
As an added benefit, this might facilitate better restart of
interrupted uploads and such.
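A rough sketch of that proposed flow, under loud assumptions: the
"verify cap" below is just the SHA-256 of the share bytes, which is a
stand-in and not how Tahoe actually derives a verify cap. The class,
the method names, and the token format are likewise invented for
illustration.

```python
import hashlib
import secrets


def verify_cap_for(share: bytes) -> str:
    # Stand-in only: a real verify cap is not a bare SHA-256 of the share.
    return hashlib.sha256(share).hexdigest()


class VerifyCapIndexedStore:
    # Hypothetical in-memory stand-in for the proposed storage server.
    def __init__(self):
        self._incoming = {}  # temporary upload token -> bytearray
        self._final = {}     # verify cap (the storage index) -> bytes

    def begin_upload(self) -> str:
        # The final index isn't known yet, so hand out a temporary token.
        token = secrets.token_hex(16)
        self._incoming[token] = bytearray()
        return token

    def write(self, token: str, data: bytes) -> None:
        self._incoming[token] += data

    def finish_upload(self, token: str, claimed_cap: str) -> str:
        share = bytes(self._incoming.pop(token))
        # The server checks the share against the cap before filing it.
        if verify_cap_for(share) != claimed_cap:
            raise ValueError("share does not match the claimed verify cap")
        # "Move" from incoming to the final location. A re-upload of the
        # same share lands on the same index with identical contents.
        self._final.setdefault(claimed_cap, share)
        return claimed_cap

    def read(self, cap: str) -> bytes:
        return self._final[cap]
```

With this shape, the garbage collection scenario above stops being a
problem in the sketch: a different file can never occupy the original
file's index, because the index is checked against the contents.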
I think Brian might know some other problems or complications of that
proposal, so hopefully he'll follow up on this post.
> Another use case that I plan to try in the near future is to attach
> a big USB drive to a Linksys router running custom firmware, and
> use that as a Tahoe node.
:-)
David Reid and Zandr Milewski are both interested in experimenting
with Tahoe on those sorts of embedded NAS/router/whatsit boxes.
Exciting!
>> In the year 2012 (hey, we're living in the future!), the new SHA-3
>> hash function will be chosen. That function will also, I hope,
>> require about 1/3 as many CPU cycles as SHA-256 does while being a
>> safer long-term bet.
>
> If the result parallels the success of the AES selection process,
> it may be even faster than that.
I wish! The very fastest not-yet-broken candidates right now take
about 1/3 as many CPU cycles as SHA-256 (according to [1]), and the
thrust of NIST's management of the contest seems to be to get a hash
function which isn't slower than SHA-256, but which is safer.
So, even after SHA-3 is final, we'll need either as many CPU cycles
as SHA-2 or perhaps 1/2 or 1/3 or 1/4 as many. By comparison, MD5
takes about 1/4 as many cycles as SHA-256. (And by the way, it
matters a lot what CPU architecture you're using and how long the
messages you want to hash are.)
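Since the ratio depends on the CPU and the message length, it can be
worth measuring on your own machine. Here is a small sketch using
Python's hashlib. It measures wall-clock time per byte rather than CPU
cycles, so treat the numbers as rough, and note that the function
names and default parameters are just choices made for this example.

```python
import hashlib
import time


def seconds_per_byte(name: str, message_len: int, repeats: int = 50) -> float:
    # Hash the same all-zero message repeatedly and average the time.
    message = b"\x00" * message_len
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(name, message).digest()
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * message_len)


def compare(names=("md5", "sha256"), lengths=(64, 4096, 1 << 20)):
    # Returns {(algorithm name, message length): seconds per byte}.
    results = {}
    for length in lengths:
        for name in names:
            results[(name, length)] = seconds_per_byte(name, length)
    return results


if __name__ == "__main__":
    for (name, length), cost in sorted(compare().items()):
        print(f"{name:8s} {length:8d} bytes  {cost * 1e9:8.2f} ns/byte")
```

Short messages tend to be dominated by per-call overhead, so the
relative cost of two hash functions can look quite different at 64
bytes than at a megabyte.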
Regards,
Zooko
[1] http://bench.cr.yp.to/results-hash.html