[tahoe-dev] Public web interface

Tue Dec 4 02:17:11 UTC 2012

Nathan <nejucomo at gmail.com> writes:

> Just by counting arrows, it's obvious that 4, the built-in web server
> would be the "leanest" approach in terms of fewest "hops", so this
> might be most efficient.

I think this is the key point, but I don't think hop count is the right
metric for leanness.  The real issue is the total semantic and code
complexity of the system, and if tahoe is ever to be taken really
seriously it's going to need to implement the VFS interface.  Once it
does that, the web server is a standard component that doesn't need to
know about tahoe.  I think I'm basically rephrasing:

> The trade-off of fewer "hops" is less separation of concerns.

but I think that's the more important consideration.
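
To make that concrete, here is a minimal sketch in Python, assuming a
tahoe directory is already mounted somewhere through a fuse/VFS layer.
The mount point and the make_server helper are hypothetical names of
mine; the point is only that nothing in it refers to tahoe:

```python
import functools
import http.server

# Hypothetical mount point: assumes some fuse/VFS layer already exposes
# a tahoe directory here.  Nothing below knows or cares that it is tahoe.
MOUNT_POINT = "/mnt/tahoe-public"


def make_server(root=MOUNT_POINT, host="127.0.0.1", port=8080):
    """Build a plain static-file HTTP server rooted at `root`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling make_server().serve_forever() would then serve whatever the
mount shows.  Any stock web server pointed at the same directory would
do equally well, which is the separation of concerns I mean.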

> Taking the approach of 4, a tahoe-specific public web server could use
> more "local information" to possibly make better, more accurate
> decisions.  For example, it might be able to make smarter caching
> decisions.

Caching should either be part of the fuse interface or a separate
module stacked fuse-on-fuse.  Ideally the write-behind caching of coda
could be such a module, separating storage from caching.  (What I'd
really like to see is a fuse mount that has the combined properties of
coda and tahoe.)
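
As a rough illustration of caching as its own layer, here is a sketch
in Python.  Both classes are hypothetical: DictStore stands in for the
slow backing store, and WriteBehindCache only loosely imitates
write-behind caching; it is not how coda or tahoe actually do it.

```python
class DictStore:
    """Stand-in for a slow backing store (hypothetical, not a tahoe API)."""

    def __init__(self):
        self.data = {}
        self.writes = 0  # counts writes that actually reach the store

    def read(self, path):
        return self.data[path]

    def write(self, path, content):
        self.writes += 1
        self.data[path] = content


class WriteBehindCache:
    """Offers the same read/write interface, layered over any store."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}     # path -> most recent known content
        self.dirty = set()  # paths written locally but not yet flushed

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.backend.read(path)
        return self.cache[path]

    def write(self, path, content):
        # Acknowledge immediately; the backend sees it only on flush().
        self.cache[path] = content
        self.dirty.add(path)

    def flush(self):
        for path in sorted(self.dirty):
            self.backend.write(path, self.cache[path])
        self.dirty.clear()
```

Because the cache exposes the same interface as the store it wraps,
the storage side needs no knowledge of caching, and the cache could be
stacked over any other backend.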

> Also, the fuse interfaces I'm aware of speak to the gateway over http
> anyway, so there are more hops.

I think the fact that the fuse interface uses a web api is an artifact
of the current implementation and not intrinsic to the protocol.
There's no reason there couldn't be an in-kernel tahoe client, much as
nfs clients are often implemented in the kernel.  We just don't have
that yet.

Arguably it's a bug that there is only one implementation of the
filesystem, which leaves the software and the protocol spec poorly
separated.

> I personally am interested in the idea that a public web interface
> operator is not directly aware of the content or the publishers of
> data being served.  This scenario is similar to tor2web.  To support
> that case, the simplest approach seems to just pass capabilities
> through from the tahoe grid to the public web, and to rely on
> publishers to share their capabilities out of band (similar to
> tor2web).

I don't really follow how your point applies, because in all of these
approaches the translating system will handle the caps.  If you're
building something like freenet, I think common gateways must not
handle caps.

> Even when the capabilities are all hidden at the public proxy layer,
> this architecture is importantly different from a security standpoint
> because of provider independent security.  If we contrast that
> architecture to a traditional architecture with a web server connected
> to a database layer, it's interesting that the "database" equivalent
> need not be trusted by the web server beyond availability.  If
> malcontents break into a storage grid machine (or even all of them),
> they can wreak much less havoc than if they break into a traditional
> website database.  Likewise, if they break into the public-facing web
> proxy, they can intercept and modify contents on the way out, but
> anyone with access to the grid can still see the legitimate content
> and updates.
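
A greatly simplified sketch of why the storage side need not be
trusted for integrity, in Python.  The cap format and both function
names are made up for illustration; real tahoe caps are different and
also carry encryption keys.

```python
import hashlib


def make_cap(content: bytes) -> str:
    """Derive a toy capability string from the content's SHA-256 hash."""
    return "sketch-cap:" + hashlib.sha256(content).hexdigest()


def fetch_verified(cap, untrusted_get):
    """Fetch via an untrusted lookup and refuse anything not matching cap."""
    content = untrusted_get(cap)
    if content is None or make_cap(content) != cap:
        raise ValueError("storage returned content that does not match cap")
    return content
```

Whoever holds the cap can detect tampering by the storage layer, so
the storage only has to be trusted for availability.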

You may be interested in a paper about encrypting databases.  There, the
challenge is to encrypt data while still allowing queries.  For data
that is only fetched and never joined, it's easier.  Here's a tech
report about the work; I haven't read it, but I attended a talk by the
lead author a year or so ago.

  http://people.csail.mit.edu/nickolai/papers/raluca-cryptdb-tr.pdf