[tahoe-dev] question about sharing...
toby cabot
toby at caboteria.org
Mon Jun 20 15:39:13 PDT 2011
Zooko,
Thanks for your patient explanation.
On Wed, Jun 01, 2011 at 04:53:35PM -0600, Zooko O'Whielacronx wrote:
> On Wed, Jun 1, 2011 at 11:42 AM, toby cabot <toby at caboteria.org> wrote:
> > I have a question about sharing files with other people and I can't
> > find the answer on the site but I hope this isn't a FAQ.
>
> We should write a FAQ about this! But the answer might be long. Might
> need to be its own wiki page. Any volunteers?
I've taken a stab at this below. It *did* end up being longer than
I'd hoped, but some ruthless editing can probably cut it down to size.
Comments/criticism welcome.
=====================================================================
[[http://tahoe-lafs.org/trac/tahoe-lafs][Tahoe-LAFS]] is described as
"the first decentralized storage system with provider-independent
security". Its name indicates that it's a "file system" but it's
different from traditional file systems in ways that are important to
understand before you start using it. This page will try to explain
at a high level, in plain English, how Tahoe-LAFS works and provide
links that will allow you to learn about it in detail.
Before we go any further, please read the
[[http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/about.rst][one-page
summary]], then come back here. As you saw on that page, Tahoe-LAFS
provides a guarantee that you can store your data on servers that you
don't trust, and the administrators of those servers won't be able to
read your data. It does this by encrypting the data before it stores
it on those servers, so that all they see is random-looking bits and
they can't recover the actual content of your files. Tahoe-LAFS also
guards against the failure of the storage servers by storing the same
data on more than one of them. Of course, this will use more disk
storage than simply storing the file once, but you can decide how
you'd like to trade off extra storage for fault-tolerance.
Capabilities (vs. Access Control)
Before we get into how Tahoe-LAFS stores files, it will be useful to
recap how a traditional file system works. Traditional filesystems
start at a well-known "root" and allow users to explore the filesystem
from there. Because the root is well-known, you can go to it and list
the files in it; you can also go "up" from any directory to its
parent. Because users can explore file systems in this way, each user
would be able to do anything they wanted unless there were some sort
of inline permission check, so these filesystems implement "Access
Control List" (ACL) permission checks. These checks specify which
users are allowed to access each file and prevent users from doing
things they can figure out how to do, but are not permitted to do. In
other words, I can discover a directory's existence, and learn its
name, but I might not be allowed to read from it. In order to
implement these permission checks, though, the file system has to know
who you are, so you need to log in. In order to prove that you are
who you claim to be, you have to provide a password and/or other
credentials. Then you need to specify who has what kind of access to
each file and directory. This approach works well enough, but it is
complex and because of that it's very difficult to ensure that it's
secure.
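
To make the ACL model concrete, here is a minimal sketch of the kind of check a traditional file system performs on every access. Everything in it (the =ACL= table, the =is_allowed= function, the user names and paths) is hypothetical and only illustrates the idea that the system must first know who is asking.

```python
# Toy ACL: maps each path to the users allowed to act on it.
# All names, paths and users here are made up for illustration.
ACL = {
    "/home/alice/notes.txt": {"alice": {"read", "write"}, "bob": {"read"}},
    "/home/alice/private": {"alice": {"read", "write"}},
}


def is_allowed(user: str, path: str, action: str) -> bool:
    """Return True if the ACL grants `user` the right to perform `action`."""
    return action in ACL.get(path, {}).get(user, set())


# bob knows that /home/alice/private exists, but the check refuses him.
print(is_allowed("bob", "/home/alice/notes.txt", "read"))
print(is_allowed("bob", "/home/alice/private", "read"))
```

Note that the check takes a =user= argument: the answer depends on identity, which is why logins and credentials are needed in the first place.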
Tahoe-LAFS does away with the complexity inherent in the ACL approach
and uses a much simpler approach, called "capabilities". Access to
each file (and directory) in Tahoe-LAFS is allowed by a "capability"
which is a string of characters that looks something like
=URI:CHK:riplmjitnwh25ur3jomzyxrww4:et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472=.
A file can have more than one capability: for example, one capability
might allow you to read the file, while a different capability might
allow you to read and write the file. Each capability contains the
two things that you need to access the file: how to find the encrypted
bits (the "storage index"), and how to decrypt them (the "encryption
key").
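
As a rough illustration, the example string above can be split into its colon-separated fields. The field names used here (=key=, =integrity_hash=, =needed_shares=, =total_shares=, =size=) are my reading of the layout and should be treated as assumptions, not as an official description of the format; =parse_chk_cap= is a hypothetical helper, not part of Tahoe-LAFS.

```python
from typing import NamedTuple


class ChkCap(NamedTuple):
    # Field names are assumptions made for this sketch.
    key: str             # base32 text, assumed to carry the encryption key
    integrity_hash: str  # base32 text, assumed to be an integrity check value
    needed_shares: int   # assumed: shares needed to rebuild the file
    total_shares: int    # assumed: shares produced in total
    size: int            # assumed: file size in bytes


def parse_chk_cap(cap: str) -> ChkCap:
    """Split a 'URI:CHK:...' string into its seven colon-separated parts."""
    parts = cap.split(":")
    if len(parts) != 7 or parts[0] != "URI" or parts[1] != "CHK":
        raise ValueError("not a URI:CHK capability string")
    key, integrity_hash, needed, total, size = parts[2:]
    return ChkCap(key, integrity_hash, int(needed), int(total), int(size))


example = ("URI:CHK:riplmjitnwh25ur3jomzyxrww4:"
           "et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472")
cap = parse_chk_cap(example)
print(cap.needed_shares, cap.total_shares, cap.size)
```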
Access to any given file is a simple yes/no proposition: if you know
that file's capability then you'll be able to read it; if you don't,
then you won't be able to. It doesn't matter who you are, or what
group you're in, or if you're a "superuser" or not. In fact,
Tahoe-LAFS doesn't have any sense of "identity" at all: you don't have
to sign in or provide credentials to prove who you are, because
Tahoe-LAFS doesn't know or care.
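
The "know the capability or you don't" rule can be sketched as a toy store in which the only way to reach a file is to present its unguessable token. This is a deliberately simplified model: =ToyStore= is hypothetical, it does no encryption, and it is not how Tahoe-LAFS is implemented. The point is only that no user name or login appears anywhere.

```python
import secrets


class ToyStore:
    """Toy capability store: files are reachable only via a random token."""

    def __init__(self) -> None:
        self._files = {}  # capability string -> file contents

    def put(self, data: bytes) -> str:
        # 32 random bytes, far too many possibilities to guess.
        cap = secrets.token_urlsafe(32)
        self._files[cap] = data
        return cap

    def get(self, cap: str) -> bytes:
        # No identity check: holding the capability is the whole test.
        # An unknown capability raises KeyError.
        return self._files[cap]


store = ToyStore()
cap = store.put(b"hello, grid")
print(store.get(cap))
```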
It's important to understand that a capability specifies the location
of a file, but it's different from a traditional file system "path".
Tahoe-LAFS has no well-known "root" so there's no way to poke around
and try to discover things inside it. Each directory and file can be
found only by its capability and can't be discovered in any other way.
(How many bits in a capability, i.e. how hard would it be to guess?)
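
On the parenthetical question above, a rough bound can be read off the example capability itself. The sketch below assumes that the first long field is base32 text (5 bits per character) and that it holds a key made of whole bytes; the guessing rate is an arbitrary, hypothetical figure chosen only to give a sense of scale.

```python
example = ("URI:CHK:riplmjitnwh25ur3jomzyxrww4:"
           "et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472")

BITS_PER_BASE32_CHAR = 5  # assumption: the field is base32-encoded

key_field = example.split(":")[2]
upper_bound_bits = len(key_field) * BITS_PER_BASE32_CHAR

# Assumption: the key is a whole number of bytes, so round down to 8 bits.
assumed_key_bits = (upper_bound_bits // 8) * 8

# Hypothetical attacker making a trillion guesses per second.
GUESSES_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365 * 24 * 3600
years_to_try_all = 2**assumed_key_bits / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

print(upper_bound_bits, assumed_key_bits)
print(f"{years_to_try_all:.2e} years to try every key")
```

Under these assumptions, guessing a capability by brute force is not a practical attack.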
A directory capability acts like a traditional file system directory
in that users can browse down from it to see files in it and in the
tree below it, but they can't browse "up" to see other directories
within the same Tahoe-LAFS file system. It's as if each directory in
Tahoe-LAFS is a root directory. Users cannot discover things that
they're not supposed to know, so the in-line ACL checks implemented by
traditional file systems are unnecessary.
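
The "every directory is a root" behaviour can be sketched with a toy model in which a directory is just a table of child names and child capabilities, with no pointer back to a parent. =ToyGrid= and the file names are hypothetical, and nothing here is encrypted; it only shows why browsing down works and browsing up does not.

```python
import secrets


class ToyGrid:
    """Toy model: a node is file bytes, or a dict of name -> child capability."""

    def __init__(self) -> None:
        self._nodes = {}

    def _store(self, node) -> str:
        cap = secrets.token_urlsafe(16)
        self._nodes[cap] = node
        return cap

    def put_file(self, data: bytes) -> str:
        return self._store(data)

    def put_dir(self, children: dict) -> str:
        # A directory records only its children; there is no parent link.
        return self._store(dict(children))

    def walk(self, cap: str, prefix: str = "") -> list:
        """List the paths of all files reachable downward from `cap`."""
        node = self._nodes[cap]
        if isinstance(node, bytes):
            return [prefix.rstrip("/")]
        paths = []
        for name, child_cap in sorted(node.items()):
            paths.extend(self.walk(child_cap, prefix + name + "/"))
        return paths


grid = ToyGrid()
photo = grid.put_file(b"jpeg bytes")
diary = grid.put_file(b"dear diary")
photos_dir = grid.put_dir({"beach.jpg": photo})
top_dir = grid.put_dir({"photos": photos_dir, "diary.txt": diary})

print(grid.walk(top_dir))     # holder of top_dir sees everything below it
print(grid.walk(photos_dir))  # holder of photos_dir cannot reach diary.txt
```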
If you're curious about the capability model, it's worth taking some
time to learn more about it:
http://en.wikipedia.org/wiki/Capability-based_security
Sharing
As you can imagine, the Tahoe-LAFS capability model makes file sharing
easy: just give the other person that file's capability string. Once
you've done that, the other person will be able to do everything that
the capability enables. If you share a read-only capability then the
person you shared it with will be able to read the file but not write
it. If you share a read-write capability then they will be able to
do both.
The simplicity of the Tahoe-LAFS capability model makes very fine-grained
sharing control easy. In a traditional Unix filesystem, for example,
you can control access to a file based on the person that owns the
file, the group that owns the file, and "other" people. Since
creating a group is something that only superusers can do, it's easy
to imagine situations where a regular user would like to share a file
with a small community of people but can't do it because she isn't
allowed to create groups. A typical solution to this problem is to
add another layer of complexity on top of the file system layer to
provide this level of control. Tahoe's capability model provides this
functionality easily. If I want to give one person read-write
capability and give another four people read-only capability that's
easy to do - just send the appropriate capability string to the
appropriate people.
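
The one-writer, four-readers case can be sketched as follows. In this toy model the read-only capability is derived from the read-write one with a one-way hash, so a holder of the read-write capability can always produce the read-only one but not the other way round. The =SharedFile= class, the "rw:" and "ro:" prefixes, and the derivation are all assumptions made for illustration; the real Tahoe-LAFS formats and enforcement are different and more involved.

```python
import hashlib
import secrets


def derive_read_cap(write_cap: str) -> str:
    """One-way derivation: easy to compute, infeasible to reverse."""
    digest = hashlib.sha256(write_cap.encode("ascii")).hexdigest()
    return "ro:" + digest[:32]


class SharedFile:
    """Toy file guarded by a read-write and a derived read-only capability."""

    def __init__(self, data: bytes) -> None:
        self.write_cap = "rw:" + secrets.token_hex(16)
        self.read_cap = derive_read_cap(self.write_cap)
        self._data = data

    def read(self, cap: str) -> bytes:
        if cap not in (self.write_cap, self.read_cap):
            raise PermissionError("unknown capability")
        return self._data

    def write(self, cap: str, data: bytes) -> None:
        if cap != self.write_cap:
            raise PermissionError("a read-write capability is required")
        self._data = data


report = SharedFile(b"draft 1")

# Hypothetical recipients: one editor, four readers.
handed_out = {"editor": report.write_cap}
for name in ("reader1", "reader2", "reader3", "reader4"):
    handed_out[name] = report.read_cap

report.write(handed_out["editor"], b"draft 2")
print(report.read(handed_out["reader3"]))
```

No groups or administrator involvement are needed: who can do what is decided entirely by which string each person was sent.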
Revoking
Sharing is easy but "revoking" is much harder to do, for both
traditional file systems and Tahoe-LAFS. The fundamental problem,
which is the same in both cases, is that once you give someone else
the ability to read some data, you can't prevent that person from
re-distributing that data in ways that you might not intend. At first
glance, it would appear that a traditional filesystem offers stronger
protection for this case, but in fact the ACL approach and capability
approach are similar in the ways in which data can "leak": both
techniques prevent "outsiders" from seeing data that they're not
supposed to, but both can be subverted by people who are given the
ability to read some data, and then choose to export it from the
system and distribute it outside the scope of the system.
In both cases, then, it's important to be very sure that you trust the
people that you're sharing data with, because once you share the data
there's no going back.
Links
More info on Access Control Lists:
http://en.wikipedia.org/wiki/Access_control_list
A relevant mailing list discussion:
http://tahoe-lafs.org/pipermail/tahoe-dev/2011-June/006388.html