#1565 closed task

URL formats for HTTP-based storage server — at Initial Version

Reported by: warner
Owned by:
Priority: major
Milestone: eventually
Component: code-storage
Version: 1.9.0b1
Keywords: newurls accounting
Cc: nejucomo
Launchpad Bug:

Description

Ticket #510 is about speaking to storage servers with mostly-plain HTTP. One piece of this is deciding what the URLs should look like. Downloading a share from the storage server should be a simple HTTP "GET", using a Range: header to fetch less than the whole share. But we also need ways to discover which shares are available for download, and eventually ways to upload data to the server too.

Here's the starting point that I implemented in my prototype (which still uses Foolscap and get_buckets() to discover shares):

  • GET /storage/imm/SI/%(storage_index)s/share/%(shnum)d: retrieves data from the given share. Normal downloads use e.g. {{{Range: bytes=87418-131108,422601-422664,423593-423656}}} to fetch a bunch of spans.
  • GET /storage: this currently returns a human-readable page describing the state of the storage server.
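
A minimal sketch of how a client might fill in the share URL and build the Range: header for a multi-span read. The URL template is the one above; the helper names, the (offset, length) span representation, and the storage index "aaaa" are assumptions made for illustration only.

```python
# Template copied from the ticket; everything else here is illustrative.
SHARE_URL = "/storage/imm/SI/%(storage_index)s/share/%(shnum)d"


def share_url(storage_index, shnum):
    """Return the path for one immutable share."""
    return SHARE_URL % {"storage_index": storage_index, "shnum": shnum}


def range_header(spans):
    """Build a Range value from (offset, length) pairs.

    HTTP byte ranges are inclusive at both ends, so the last byte of a
    span is offset + length - 1.
    """
    return "bytes=" + ",".join(
        "%d-%d" % (offset, offset + length - 1) for offset, length in spans
    )


# "aaaa" is a placeholder storage index, not a real one.
url = share_url("aaaa", 3)
headers = {"Range": range_header([(87418, 43691), (422601, 64), (423593, 64)])}
```

With these spans the header value matches the example in the bullet above. A server that honours several ranges in one request would normally answer with 206 and a multipart/byteranges body, so the client has to split the response back into spans.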

The next steps:

  • GET /storage/imm/SI/%(storage_index)s/shares: return a JSON list of share numbers
  • GET /storage/imm/SI/%(storage_index)s/all_shares: return a JSON dictionary mapping share number to a read data vector. The same spans are returned for all shares. This collapses the Do-You-Have-Block query with the initial data fetch, allowing one-round-trip downloads.
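
A rough sketch of what the two JSON responses could look like, using an in-memory dict in place of the real share files. The ticket does not say how the read vector is passed in or how binary data is encoded in JSON; the (offset, length) spans and the base64 encoding below are assumptions, as are the function names.

```python
import base64
import json


def shares_response(shares):
    """Body for .../shares: a JSON list of the share numbers we hold."""
    return json.dumps(sorted(shares))


def all_shares_response(shares, spans):
    """Body for .../all_shares: share number -> read data vector.

    The same (offset, length) spans are read from every share. JSON
    object keys must be strings and JSON cannot carry raw bytes, so the
    share number is stringified and each span is base64-encoded (an
    assumption, not something the ticket specifies).
    """
    body = {}
    for shnum, data in shares.items():
        body[str(shnum)] = [
            base64.b64encode(data[offset:offset + length]).decode("ascii")
            for offset, length in spans
        ]
    return json.dumps(body, sort_keys=True)


# Toy stand-in for the shares stored under one storage index.
shares = {0: b"abcdefgh", 3: b"ijklmnop"}
listing = shares_response(shares)
combined = all_shares_response(shares, [(0, 2), (4, 3)])
```

A client that asks for all_shares with the spans it would have fetched first gets both the "which shares do you have" answer (the keys) and the initial data in a single round trip.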

I put "imm" into the URL because the current storage server treats immutable and mutable shares very differently (they have different container formats). It's not trivial to take an SI and switch on the type of share that it points to. It might be cleaner to fix the server to handle this well, and then remove the "imm" from the URL. OTOH, it might be better to leave them distinct.

We need similar URLs for reading from mutable shares; they can probably be the same but with "mut" instead of "imm".

We'll need POST URLs for uploading files and modifying mutable shares, as well as adding/renewing leases and other storage server methods. The request bodies will be more complicated, since they'll need authorization signatures or some similar mechanism. But the basic URL targets could be:

  • POST /storage/imm/SI/%(storage_index)s/shares/%(shnum)d: start uploading the given share. Return 302 FOUND if the share already exists. The upload can be spread across multiple requests, with a "finished" flag on the last request. This might involve returning an "upload token" which subsequent requests must reference.
  • POST /storage/mut/SI/%(storage_index)s/shares/%(shnum)d: modify the given mutable share. The body will probably be a signed serialized JSON modification request, basically a write-vector, along with a test-vector or other collision-avoidance scheme.
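
To make the multi-request immutable upload concrete, here is a toy in-memory model of the server side. Only the 302 response for an existing share, the "finished" flag, and the idea of an upload token come from the description above. The class name, the method signature, and the 200 and 403 status codes are assumptions, and authorization is left out entirely.

```python
import secrets


class ImmutableShareStore:
    """Toy model of POST /storage/imm/SI/<si>/shares/<shnum>."""

    def __init__(self):
        self.shares = {}   # (storage_index, shnum) -> complete share bytes
        self.pending = {}  # upload token -> ((storage_index, shnum), bytearray)

    def post(self, storage_index, shnum, chunk, finished=False, token=None):
        """Handle one upload request; return (status, upload_token)."""
        key = (storage_index, shnum)
        if key in self.shares:
            return 302, None  # share already exists
        if token is None:
            # First request of an upload: hand out a fresh token.
            token = secrets.token_hex(16)
            self.pending[token] = (key, bytearray())
        elif token not in self.pending or self.pending[token][0] != key:
            return 403, None  # unknown token (status code is an assumption)
        buf = self.pending[token][1]
        buf.extend(chunk)
        if finished:
            # Last request: the share becomes visible to readers.
            self.shares[key] = bytes(buf)
            del self.pending[token]
            return 200, None
        return 200, token


store = ImmutableShareStore()
status, token = store.post("aaaa", 0, b"abc")
status, _ = store.post("aaaa", 0, b"def", finished=True, token=token)
```

After the second call the share holds b"abcdef", and any further POST to the same share number gets 302 back.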

All of this presumes that Accounting is not being enforced on read access. At least one of the designs I've drawn up offers read=False control, as a stick for the storage operator to use against a client who doesn't pay their bills (but still less drastic than store=False, which deletes all their data). To enforce read=False, the GETs would need to be authorized, which involves either adding an extra signature header or implementing them with a POST instead (and putting the signature in the request body).
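
As an illustration of the signature-header option, the sketch below authorizes a GET by signing the path and the Range value. The ticket does not pick a signature scheme; HMAC-SHA256 with a shared secret is used here only because it is in the standard library, and the header name X-Tahoe-Signature, the signed-message layout, and both function names are hypothetical. A real design would more likely use public-key signatures tied to an account and would need replay protection.

```python
import hashlib
import hmac

SIG_HEADER = "X-Tahoe-Signature"  # hypothetical header name


def sign_get(secret, path, range_value):
    """Return the headers for an authorized GET of `path`."""
    message = ("GET\n%s\n%s" % (path, range_value)).encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"Range": range_value, SIG_HEADER: signature}


def verify_get(secret, path, headers):
    """Server-side check: recompute the signature and compare."""
    expected = sign_get(secret, path, headers.get("Range", ""))[SIG_HEADER]
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, headers.get(SIG_HEADER, ""))


path = "/storage/imm/SI/aaaa/share/0"
headers = sign_get(b"shared-secret", path, "bytes=0-9")
ok = verify_get(b"shared-secret", path, headers)
```

Because the Range value is part of the signed message, a request whose range has been altered in transit fails verification.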
