[tahoe-dev] Tahoe-LAFS is widely misunderstood (was: odd comment about #shares and confidentiality)

Mon Jan 31 07:35:42 UTC 2011

On Sat, Jan 29, 2011 at 8:21 AM, Greg Troxel <gdt at ir.bbn.com> wrote:
>
> "To avoid decryption of stored data by node operators, each operator can setup max. 2 storage nodes."
>
> I suspect they are unclear on how tahoe works, and perhaps Someone wants
> to address that.

Yes, it sounds like they think that the confidentiality properties of
Tahoe-LAFS are tied to the K-out-of-N erasure coding. There is a
widely appreciated cryptographic algorithm called Shamir Secret
Sharing which has that effect, and the proprietary "Cleversafe"
storage product uses an All-Or-Nothing Transformation to get that
effect, but of course for Tahoe-LAFS we don't *want* that effect! If
you upload a file, spreading the ciphertext shares across many
different storage servers, then someone can read your file only if you
give them the capability to that file, regardless of whether any
number of the storage server operators want them to read the file.
Even if *all* of the storage server operators put together want that
person to read the file, they *still* can't read the file unless *you*
give them the capability.
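
To make the distinction concrete, here is a minimal sketch of the
encrypt-then-erasure-code idea. Everything in it is an assumption made
for illustration: the SHA-256 keystream cipher and the 2-of-3 XOR
parity code are toy stand-ins, not the cipher or the erasure code that
Tahoe-LAFS actually uses, and the function names are hypothetical. It
shows only that collecting shares, even all of them, gives back
ciphertext, and that the plaintext needs the key carried in the
capability.

```python
import hashlib
import os


def keystream(key, length):
    """Toy keystream: SHA-256 over key plus a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key, data):
    """XOR with the keystream; the same call also decrypts."""
    return xor_bytes(data, keystream(key, len(data)))


def encode_2_of_3(data):
    """Toy 2-of-3 erasure code: two halves plus their XOR parity."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad the second half if needed
    return [a, b, xor_bytes(a, b)]


def decode_2_of_3(shares, length):
    """Rebuild the data from any two shares, given as {share_number: bytes}."""
    if 0 in shares and 1 in shares:
        a, b = shares[0], shares[1]
    elif 0 in shares:
        a = shares[0]
        b = xor_bytes(a, shares[2])
    else:
        b = shares[1]
        a = xor_bytes(b, shares[2])
    return (a + b)[:length]


def demo():
    key = os.urandom(32)  # stands in for the secret held in the capability
    plaintext = b"attack at dawn"
    ciphertext = encrypt(key, plaintext)
    shares = encode_2_of_3(ciphertext)  # one share per storage server

    # All three server operators collude and pool their shares:
    pooled = decode_2_of_3(dict(enumerate(shares)), len(ciphertext))
    assert pooled == ciphertext  # they recover the ciphertext only

    # Only a holder of the key gets the plaintext back:
    assert encrypt(key, pooled) == plaintext


if __name__ == "__main__":
    demo()
```

In this sketch K and N only decide how many servers must be reachable
to rebuild the ciphertext. They have no bearing on who can read the
file.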

This is the latest example in a string of events which has convinced
me that we have a serious problem with our documentation and
communication (or "marketing"). We are failing to make the basic
properties clear to a large class of readers. Below I list some other
misunderstandings I've noticed.

This bothers me, because I've long thought (probably inspired by
something that Brian once said) that even if Tahoe-LAFS fails as a
working implementation then it will still have been worth it if people
learn from it about what is possible. To maximize this value we have
to communicate clearly about what properties Tahoe-LAFS offers and
how.

(Happily, Tahoe-LAFS also seems to be succeeding as a working
implementation. We now have quite a few long-term users who are
more-or-less satisfied with the results they are getting.)

A list of some of the misunderstandings I've seen recently (not a
complete list -- just the ones that come to mind):

1. The anonymous-proxy-servers.net page that Greg Troxel noticed:
http://anonymous-proxy-servers.net/wiki/index.php/Tahoe-lafs-setup .
They seem to think the confidentiality properties are contingent on
the decisions of K out of N storage servers.

2. This post to the p2p-hackers mailing list a couple of days ago in
which the author, Michael Militzer, says he wants a distributed
storage system which is resistant to adversarial behavior, and then
seems to say that Tahoe-LAFS is designed to trust the nodes. I don't
understand why he says that. Perhaps he is thinking of some sorts of
advanced attacks and defenses that we don't address. Or maybe he just
completely misunderstood the Tahoe-LAFS design principles. :-)
http://lists.zooko.com/pipermail/p2p-hackers/2011-January/002805.html

3. The anonymous Chinese people, twitter nick "fuckgfw", who were
using the public test grid gateway as though it offered some
(unspecified) security properties. I mentioned them a couple of times
on this list, including this email:
http://tahoe-lafs.org/pipermail/tahoe-dev/2010-November/005580.html .
We've changed the DNS name of that gateway to
"insecure.tahoe-lafs.org", Jonathan Moore provided a diagram of Yu
Xue's Chinese translation of network-and-reliance-topology.png, and
David Triendl is going to add that diagram and some warning text to
the public test grid gateway.

4. The misunderstanding that Tahoe-LAFS puts secrets into filenames
which (if I understand correctly) James Donald was under when he
posted http://tahoe-lafs.org/pipermail/tahoe-dev/2011-January/005966.html
(see also my recent reply
http://tahoe-lafs.org/pipermail/tahoe-dev/2011-January/006009.html ).
I've heard the same misunderstanding (I think) from another smart
security expert when I was presenting Tahoe-LAFS in Poland last year.
The fact that such well-read people have such misunderstandings
clearly indicates to me that the problem is with Tahoe-LAFS's
complexity and its insufficient documentation rather than with the
readers.

5. This person on IRC tonight who thought Tahoe-LAFS couldn't work
with too few storage servers:
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1082#comment:8 .

My best attempt so far to explain the essentials of Tahoe-LAFS is
http://tahoe-lafs.org/~zooko/lafs.pdf . I think we need a concerted
effort to write more comprehensive, clear, and well-organized docs and
to guide people to those docs in our "marketing" (which is primarily
in the form of release announcements and the front page of the wiki, I
guess).

Help needed! :-)

Regards,

Zooko