[tahoe-dev] proposal for an HTTP-based storage protocol

Sun Sep 26 16:03:23 UTC 2010

Thanks for the suggestions! Responses inline.

On Sun, Sep 26, 2010 at 6:51 AM, Kevin Reid <kpreid at switchb.org> wrote:
> On Sep 26, 2010, at 1:35, Ravi Pinjala wrote:
>
>> There have been some noises on this list about replacing the
>> foolscap-based storage protocol with something HTTP-based and easier
>> to work with. I'd like to throw in my own work on an extensible
>> HTTP-based storage protocol as a starting point....
>
> [...]
>
> I'd like to offer some criticism of this protocol from a
> web-architecture/REST perspective.
>
>> * discovery document URL: http://server.address/
>
> Let this be an arbitrary URL, not required to be the server root.
>

It actually can be an arbitrary URL; I only used the root for brevity
in the example.

>> * discovery document contents:
>> <webfs>
>>        <module path="data" interface="http://p-static.net/webfs/data/1.0">
>>                <feature name="max-directory-depth" value="0" />
>>                <feature name="max-data-size" value="1048576" />
>>        </module>
>>        <module path="metadata"
>> interface="http://p-static.net/webfs/metadata/1.0" />
>> </webfs>
>
> Place these elements in an XML namespace.
>
> Perhaps even let the XML namespace serve for interface and feature
> identification:
>
> <webfs xmlns="http://p-static.net/webfs/1.0">
>        <module xmlns="http://p-static.net/webfs/data/1.0"
>                path="data">
>                <max-directory-depth>0</max-directory-depth>
>                <max-data-size>1048576</max-data-size>
>        </module>
>        <module xmlns="http://p-static.net/webfs/metadata/1.0"
>                path="metadata" />
> </webfs>
>

I was planning on using a namespace for this, but was sort of
procrastinating on that until I got a domain or something like that.
Didn't want to put my personal domain in there. XD (The interface
names are different; those aren't intended to be permanent features of
the protocol, since they can be replaced by other interfaces if
somebody comes up with better ones.)

The trouble with using the xmlns to identify the interface is that it
complicates parsing a bit - clients have to support separate formats
for each interface. I also had some vague ideas about exposing the
discovery document as JSON, since not everybody likes XML. Neither of
those objections is very strong, though, and using namespaces for the
configuration elements would give more flexibility, so you're probably
right.
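
To give a feel for the parsing cost, here is a rough sketch of how a
client might read the namespaced form of the discovery document. It
assumes Python's standard xml.etree.ElementTree; the namespace URIs are
only the placeholders from the example above, and the helper names
(split_tag, parse_discovery) are made up for illustration, not part of
webfs.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the example in this thread.
WEBFS_NS = "http://p-static.net/webfs/1.0"

DISCOVERY = """\
<webfs xmlns="http://p-static.net/webfs/1.0">
  <module xmlns="http://p-static.net/webfs/data/1.0" path="data">
    <max-directory-depth>0</max-directory-depth>
    <max-data-size>1048576</max-data-size>
  </module>
  <module xmlns="http://p-static.net/webfs/metadata/1.0" path="metadata" />
</webfs>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def parse_discovery(xml_text):
    """Return {interface_uri: {"path": ..., "features": {...}}}."""
    root = ET.fromstring(xml_text)
    if split_tag(root.tag) != (WEBFS_NS, "webfs"):
        raise ValueError("not a webfs discovery document")
    modules = {}
    for child in root:
        interface, local = split_tag(child.tag)
        if local != "module":
            continue  # skip elements this client does not understand
        features = {}
        for feat in child:
            feat_ns, feat_name = split_tag(feat.tag)
            # Only keep features defined by the module's own namespace.
            if feat_ns == interface:
                features[feat_name] = (feat.text or "").strip()
        modules[interface] = {"path": child.get("path"), "features": features}
    return modules


modules = parse_discovery(DISCOVERY)
```

With this shape the interface URI doubles as the lookup key, so a
client that only knows the data interface can ignore every other
module without understanding its contents.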

>> * "data" interface URL: http://server.address/data/
>
> This URL should be constructed from resolving path="data" as a relative URL
> against the discovery document URL. Then, use xlink:href= instead of path=
> as the attribute.
>
> The goal of all these changes is to make the XML contain more semantics that
> are already understood by general XML/web tools, reducing the amount of
> application-specific interpretation logic needed (thus reducing the chances
> that someone will casually implement the interpretation incorrectly).
>

Good point, I'll make this change.
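
As a minimal sketch of what that resolution looks like on the client
side, assuming Python's standard urllib.parse.urljoin: the function
names are hypothetical, the example.org discovery URL is invented for
illustration, and treating the module URL as a directory (trailing
slash) is an assumption based on the "data" interface URL shown above.

```python
from urllib.parse import urljoin


def module_url(discovery_url, href):
    """Resolve a module's relative reference against the discovery URL."""
    url = urljoin(discovery_url, href)
    # Assumption: module URLs act as directories, as in .../data/ above.
    return url if url.endswith("/") else url + "/"


def document_url(module_base, doc_path):
    """Build the URL of a stored document underneath a module URL."""
    return urljoin(module_base, doc_path.lstrip("/"))


# Discovery document at the server root, as in the original example.
root_data = module_url("http://server.address/", "data")
root_doc = document_url(root_data, "foo/bar")

# Discovery document at an arbitrary (made-up) URL.
nested_data = module_url("http://example.org/storage/webfs.xml", "data")
```

Because the standard resolution rules do the work, the same reference
keeps working wherever the discovery document happens to live.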

>> * URL of a document stored on the server:
>> http://server.address/data/foo/bar
>>
>> * URL of the metadata for said document:
>> http://server.address/metadata/foo/bar
>>
>> * Example of direct access to a metadata key:
>> http://server.address/metadata/foo/bar?mtime
>
> It should be explicitly part of the definition of the data and metadata
> modules that they define these path patterns (underneath the path= URL).
>

Mmm, what do you mean? I'm not really seeing what you're saying here.

>> The modules I've implemented so far are a RESTful data module
>> (GET/PUT/DELETE on a path do exactly what you'd expect) and metadata
>> module (lets you associate arbitrary key-value metadata with a file,
>> also uses GET/PUT/DELETE in an intuitive way). If my understanding of
>> how a storage node works is correct, this is enough to implement a
>> storage node.
>
> What it doesn't have that a storage node should have is verifying of what's
> uploaded; it should check that the name of an uploaded share is the
> appropriate function of its contents (I don't know offhand what that
> actually is), so that clients can't upload obviously bogus shares.
>
> IIRC, this is one of the reasons we haven't just implemented 'WebDAV server
> as storage node', even though WebDAV does have the GET/PUT/DELETE and
> arbitrary-metadata functionality.
>
> (Ah, that raises another question: What are the advantages of your protocol
> over WebDAV? I've implemented WebDAV, and while it does have a certain
> amount of architecture bloat, it doesn't seem -- at the moment -- worth
> using a different protocol just for that.)
>

Do we actually need server-side verification of data? We already let
clients upload whatever they want to servers, as long as it's properly
formatted as a share. I'm hoping that using metadata instead of a
special share format will let the server ignore the contents of the
share safely. (If we just need client-side verification that the share
was uploaded properly, I actually implemented a module the other day
in about a half hour that computes checksums on the server. Yay for
trivial extensibility! :)
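
To make the client-side check concrete, here is a minimal sketch of
the comparison a client could do. This is not the module mentioned
above: the choice of SHA-256, the hex encoding, and the function names
are all assumptions for illustration, and how the server reports its
checksum is left out entirely.

```python
import hashlib
import hmac


def share_digest(data: bytes) -> str:
    """Hex SHA-256 digest of the share bytes (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def upload_matches(local_data: bytes, server_hexdigest: str) -> bool:
    """True if the server-reported digest matches the local share bytes."""
    reported = server_hexdigest.strip().lower()
    return hmac.compare_digest(share_digest(local_data), reported)


share = b"example share contents"
reported_by_server = hashlib.sha256(share).hexdigest()  # stand-in value
ok = upload_matches(share, reported_by_server)
```

This only tells the client that the bytes arrived intact; it does not
stop a client from uploading a bogus share in the first place.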

As far as "why not WebDAV", it's a fair question. I've never been a
fan of the way WebDAV creates new HTTP methods for all kinds of
things, especially since (AFAIK?) the OPTIONS method isn't
well-specified enough to detect which functionality is available. (Now
that I think about it - that can't possibly be right. I'll look up how
OPTIONS is defined in WebDAV after I send this email.) I think the way
webfs handles extensibility is a lot cleaner. There is some bloat in
WebDAV, like you said, but more than that, I don't know how easy it is
to pick a subset of WebDAV and only implement that. Having a
subsettable API was one of the main design goals of webfs.

That said, there's no reason that WebDAV couldn't be implemented as a
webfs module. :) The real value of webfs is being able to extend the
storage protocol later on, in a way that's fully compatible and
guaranteed not to conflict with what anybody else is doing.

> --
> Kevin Reid                                  <http://switchb.org/kpreid/>
>

--Ravi