[tahoe-dev] Public web interface

Nathan nejucomo at gmail.com
Tue Dec 4 00:10:43 UTC 2012


On Mon, Dec 3, 2012 at 3:35 PM, Uncle Zzzen <unclezzzen at gmail.com> wrote:
> Hi.
> I'm busy with work lately, but there's a discussion Zooko and I were having
> on a closed ticket, and I agree with him it actually belongs here, so here
> goes:
>
> Sometimes there's a need to expose a [partial view of a] Tahoe-LAFS storage
> as a public web service. As far as I understand, there are 3 ways to do it.
>
> 1) Gateway to web api - the public server proxies requests to Tahoe's web
> api, blocking undesired requests (e.g. POST ones). This is what lafs-rpg
> does (using nginx). You can also tweak it in various ways (e.g. disable
> directory browsing under some subtree).
>
> 2) Static web server, file-system back-end - use a standard static html web
> server (apache, nginx, etc.) and let it serve files from a fuse-mounted
> Tahoe-LAFS cap. In the future, once we have "dropbox-like functionality", it
> would enable us to serve static files from a "magically synced" file-system
> folder, and we won't even need the fuse trickery.
>
> 3) Dedicated service - Tahoe-LAFS can have, in addition to the web api, a
> public web service (listening on a different port). We would need to define
> the various mountpoints this server has (e.g. map /blog/ to
> /uri/DIR-RO:.../Latest/), and additional configuration options (basic/other
> auth, mustache/jinja2/etc. template for directory browsing if allowed,
> etc.). We can either do all that explicitly in tahoe.cfg, or simply specify
> a capability where this config (probably json) is read from (handy if you
> want to remotely configure such a server, but might be vulnerable for
> exactly the same reason).
>
> Option 1 is what I use at the moment. It may not be a pretty sight, but it
> ain't broke (AFAIK) so I don't have an urge to fix it.
> Zooko prefers option 3. I agree this could be neat.
> What's your opinion?
>

I see two different aspects of a public gateway mentioned here.  One
aspect is the architecture of the components, and the other is policy
around mapping public web urls to tahoe capabilities.

The architectures mentioned are:

1. "http proxy": web browser -> http server-side proxy (like nginx) ->
tahoe gateway -> tahoe grid

2. "fuse proxy": web browser -> http static filesystem server ->
filesystem -> fuse process -> tahoe gateway -> tahoe grid

3. "static file server": web browser -> http static filesystem server
-> filesystem <- external sync process (dropbox-like) -> tahoe gateway
-> tahoe grid

Notice that "external sync process" points left towards the filesystem
instead of right.  The requests/responses are decoupled: the http
server and the dropbox-like process read or update the filesystem
asynchronously.

4. "built-in web server": web browser -> as-of-yet-unimplemented tahoe
"public mode" gateway -> tahoe grid.
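To make the "http proxy" architecture (1) concrete, here is a rough Python sketch of only the filtering decision the public proxy makes before forwarding to the tahoe gateway: read-only methods pass, everything else is refused.  The gateway address (127.0.0.1:3456, the usual default web.port) and the function name `route` are assumptions for illustration, and the forwarding itself (what nginx would actually do) is left out.

```python
# Hypothetical address of the local tahoe gateway's web API (check web.port
# in tahoe.cfg; 127.0.0.1:3456 is assumed here).
GATEWAY = "http://127.0.0.1:3456"

# Only these methods are forwarded; POST, PUT, DELETE etc. are refused so the
# public side cannot create or modify anything through the gateway.
ALLOWED_METHODS = {"GET", "HEAD"}


def route(method, path):
    """Decide what the public proxy does with a request.

    Returns (status, upstream_url): 200 plus the gateway URL to forward to,
    or an error status plus None.
    """
    if method.upper() not in ALLOWED_METHODS:
        return 405, None  # Method Not Allowed
    # Expose only the /uri/ namespace, not the gateway's welcome or status pages.
    if not path.startswith("/uri/"):
        return 404, None
    # Refuse ".." segments so a request cannot climb out of /uri/.
    if ".." in path.split("/"):
        return 400, None
    return 200, GATEWAY + path
```

For example, `route("GET", "/uri/URI%3ACHK%3Aabc")` returns `(200, "http://127.0.0.1:3456/uri/URI%3ACHK%3Aabc")`, while any POST gets `(405, None)`.  In a real deployment the same rules would live in the proxy's own configuration.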


Just by counting arrows, it's obvious that 4, the built-in web server,
would be the "leanest" approach in terms of fewest "hops", so it
might be the most efficient.  The trade-off of fewer "hops" is less
separation of concerns.

For example, in approach 1, an nginx proxy might terminate SSL.
Because nginx is very popular, security bugs in its server-side SSL
code *may* be found and fixed quickly.  If a future version of tahoe
had a built-in web server that also terminated SSL, it would likely
have a smaller user base, so security bugs in it might be less likely
to be noticed.

Taking the approach of 4, a tahoe-specific public web server could use
more "local information" to possibly make better, more accurate
decisions.  For example, it might be able to make smarter caching
decisions.

IMO, architecture 2, a fuse proxy, is less attractive than 1, the http
proxy.  One reason is that in 1, the first two hops are both http
requests, and http is already proxy-friendly.  OTOH, in 2, an HTTP
request is translated into a set of filesystem requests, which are then
translated back into requests to a tahoe gateway, so there's some
impedance mismatch.  For example, the tahoe gateway might have useful
caching information expressed in a standard http manner, which nginx
could handle, but which would (probably) be lost by a fuse layer.
Also, the fuse interfaces I'm aware of speak to the gateway over http
anyway, so there are more hops.

Architecture 3 is appealing because everything to the left of the
filesystem is simple and well understood: it's just a static web
server.  The only difference is in how the content gets updated.


So those are the architectural considerations.  The policy
considerations seem separable to me.  In any architecture the site
operator may choose to carry capabilities all the way through, hide
capabilities behind well known URL paths, or handle directory requests
differently than file requests.
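Both policies (carrying capabilities through, or hiding them behind well-known paths) can be expressed as a path-translation step in front of the gateway, independent of which architecture sits around it.  A minimal Python sketch, assuming a hypothetical mount table `MOUNTS` (its cap string is a placeholder, not a real capability) and a hypothetical `map_path` function:

```python
from urllib.parse import quote

# Hypothetical mount table: public path prefix -> tahoe capability.
# The cap below is a placeholder, not a real capability.
MOUNTS = {
    "/blog/": "URI:DIR2-RO:example-blog-cap",
}


def map_path(path, mounts=MOUNTS, pass_through=False):
    """Translate a public URL path into a tahoe web-API path, or None if unmapped."""
    # Policy A: capabilities are carried all the way through the public URL.
    if pass_through and path.startswith("/uri/"):
        return path
    # Policy B: capabilities are hidden behind well-known paths.
    # The longest matching prefix wins, so nested mounts behave as expected.
    for prefix in sorted(mounts, key=len, reverse=True):
        if path.startswith(prefix):
            rest = path[len(prefix):]
            return "/uri/" + quote(mounts[prefix], safe="") + "/" + rest
    return None
```

With these definitions, `map_path("/blog/2012/post.html")` returns `"/uri/URI%3ADIR2-RO%3Aexample-blog-cap/2012/post.html"`.  Longest-prefix matching means a more specific mount (say `/blog/drafts/`) would shadow `/blog/` if one were added.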

I personally am interested in the idea that a public web interface
operator is not directly aware of the content or the publishers of
data being served.  This scenario is similar to tor2web.  To support
that case, the simplest approach seems to be to pass capabilities
through from the tahoe grid to the public web, and to rely on
publishers to share their capabilities out of band (similar to
tor2web).
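In the tor2web-like case the operator only needs to recognize capabilities, not know what they point to.  One possible refinement (an assumption, not something proposed in this thread) is to pass through only read capabilities, so the public interface can never be used to modify a publisher's files.  A rough Python sketch; `pass_through_cap` is a hypothetical name, and the prefix list is assumed from tahoe's cap formats and may be incomplete, so unrecognized formats are refused:

```python
from urllib.parse import unquote

# Assumed read-only cap prefixes; verify against the tahoe URI docs for your
# version. Anything not listed here is treated as unsafe.
READ_ONLY_PREFIXES = (
    "URI:CHK:", "URI:LIT:", "URI:SSK-RO:", "URI:MDMF-RO:",
    "URI:DIR2-RO:", "URI:DIR2-CHK:", "URI:DIR2-LIT:", "URI:DIR2-MDMF-RO:",
)


def pass_through_cap(path):
    """Return the cap from a /uri/<cap>[/...] path if it is read-only, else None."""
    if not path.startswith("/uri/"):
        return None
    # The cap is the first path segment after /uri/, possibly %-encoded.
    cap = unquote(path[len("/uri/"):].split("/", 1)[0])
    if cap.startswith(READ_ONLY_PREFIXES):
        return cap
    return None  # write caps and unrecognized formats are refused (fail closed)
```

Failing closed costs little here: a publisher who wants content served just shares a read cap, which is what they would hand out anyway.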

Of course, another use case that seems popular is to have a centrally
controlled "site" that looks very much like any other website from the
outside, except that its storage is backed by a tahoe grid.

Even when the capabilities are all hidden at the public proxy layer,
this architecture is importantly different from a security standpoint
because of provider-independent security.  If we contrast it with a
traditional architecture, where a web server is connected to a
database layer, it's interesting that the "database" equivalent need
not be trusted by the web server beyond availability.  If
malcontents break into a storage grid machine (or even all of them),
they can wreak much less havoc than if they break into a traditional
website database.  Likewise, if they break into the public-facing web
proxy, they can intercept and modify contents on the way out, but
anyone with access to the grid can still see the legitimate content
and updates.



> Cheers,
> The Dod


Regards,
Nathan

