Opened at 2008-09-08T22:44:26Z
Last modified at 2023-03-24T19:30:32Z
#510 closed enhancement
use plain HTTP for storage server protocol? — at Initial Version
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | HTTP Storage Protocol |
Component: | code-storage | Version: | 1.2.0 |
Keywords: | standards gsoc http leastauthority | Cc: | zooko, jeremy@…, peter@… |
Launchpad Bug: | | | |
Description
Zooko told me about an idea: use plain HTTP for the storage server protocol, instead of foolscap. Here are some thoughts:
- it could make Tahoe easier to standardize: the spec wouldn't have to include foolscap too
- the description of the share format (all the hashes/signatures/etc) becomes the most important thing: most other aspects of the system can be inferred from this format (with peer selection being a significant omission)
- download is easy: use GET on a URL of /shares/STORAGEINDEX/SHNUM, perhaps with an HTTP Range header if you only want a portion of the share (a client-side sketch follows this list)
- upload for immutable files is easy: PUT /shares/SI/SHNUM, which works only once
- upload for mutable files:
- implement DSA-based mutable files, in which the storage index is the hash of the public key (or maybe even equal to the public key)
- the storage server is obligated to validate every bit of the share against the roothash, validate the roothash signature against the pubkey, and validate the pubkey against the storage index
- the storage server will accept any share that validates up to the SI and has a seqnum higher than that of any existing share (see the acceptance-rule sketch after this list)
- if there is no existing share, the server will accept any valid share
- when using Content-Range: (in some one-message equivalent of writev), the server validates the resulting share, i.e. the combination of the existing share and the deltas being written (this is for MDMF, where we're trying to modify just one segment, plus the modified hash chains, root hash, and signature; a partial-write sketch also follows the list)
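A minimal client-side sketch of the two easy cases above, using only the Python stdlib. The /shares/STORAGEINDEX/SHNUM layout comes from this ticket; the host, port, and function names are made up for illustration:

```python
import urllib.request

BASE = "http://storage.example:8080"   # hypothetical server address

def get_share_range(storage_index, shnum, first, last):
    """GET one byte range of a share via a standard HTTP Range header."""
    req = urllib.request.Request(
        f"{BASE}/shares/{storage_index}/{shnum}",
        headers={"Range": f"bytes={first}-{last}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()          # a 206 response carries just the slice

def put_immutable_share(storage_index, shnum, data):
    """PUT an immutable share; the server should reject a second PUT."""
    req = urllib.request.Request(
        f"{BASE}/shares/{storage_index}/{shnum}",
        data=data, method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status          # e.g. 201 Created on first upload
```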
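And a rough sketch of the server-side acceptance rule the mutable-upload bullets describe, under the DSA-based scheme where the storage index is the hash of the pubkey. The Share structure and the verify_signature/compute_roothash callables are hypothetical stand-ins for whatever the real share format would define:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Share:          # hypothetical parsed-share structure
    pubkey: bytes     # DSA public key
    seqnum: int       # version counter
    roothash: bytes   # Merkle root over the share contents
    signature: bytes  # signature over the roothash
    data: bytes       # everything the roothash commits to

def accept_mutable_share(storage_index, new, existing,
                         verify_signature, compute_roothash):
    """Return True iff `new` may replace `existing` (which may be None).

    verify_signature and compute_roothash are injected because their real
    definitions belong to the (unspecified) share format.
    """
    # 1. the pubkey must hash to the storage index (SI = hash(pubkey))
    if hashlib.sha256(new.pubkey).digest() != storage_index:
        return False
    # 2. the roothash signature must verify against that pubkey
    if not verify_signature(new.pubkey, new.roothash, new.signature):
        return False
    # 3. every bit of the share must be consistent with the roothash
    if compute_roothash(new) != new.roothash:
        return False
    # 4. never roll back: require a strictly higher seqnum than any
    #    existing share; if no share exists, any valid share is accepted
    return existing is None or new.seqnum > existing.seqnum
```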
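For the Content-Range writev case, the server would overlay the incoming byte spans on its existing copy and then run the same acceptance check on the combined result before committing anything. A sketch, with (offset, data) spans standing in for parsed Content-Range headers:

```python
def overlay_spans(existing, spans):
    """Apply (offset, data) deltas to an existing share image."""
    buf = bytearray(existing)
    for offset, data in spans:
        end = offset + len(data)
        if end > len(buf):
            buf.extend(b"\x00" * (end - len(buf)))  # writes may extend the share
        buf[offset:end] = data
    return bytes(buf)

# the server re-validates before committing, e.g.:
#   candidate = parse_share(overlay_spans(old_bytes, spans))  # parse_share is hypothetical
#   ok = accept_mutable_share(si, candidate, old_share,
#                             verify_signature, compute_roothash)
```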
Switching to a validate-the-share scheme to control write access is good and bad:
- (+) repairers can create valid, readable, overwritable shares without access to the writecap
- (-) storage servers must do a lot of hashing and public-key computation on every upload
- (-) storage servers must know the format of the uploaded share, so clients cannot start using new formats without first upgrading all the storage servers
The result would be a share-transfer protocol that looks exactly like HTTP, but it could not be safely implemented by a simple HTTP server, because the PUT requests must be constrained by validating the share. (A simple HTTP server doesn't really implement PUT anyway.) There is a benefit to using "plain HTTP", but some of that benefit is lost when HTTP is really being used as an RPC mechanism (think of the way S3 uses HTTP).
It might be useful to have storage servers declare two separate interfaces: a plain HTTP interface for reads, and a separate port or something for writes. The read side could indeed be provided by a dumb HTTP server like apache (a minimal stand-in is sketched below); the write side would need something slightly more complicated. An apache module to provide the necessary share-write checking would be fairly straightforward, though.
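As a sanity check that the read side needs nothing clever, here is a stand-in dumb read server using only the Python stdlib, assuming shares are stored on disk under ./shares/STORAGEINDEX/SHNUM. (Unlike apache, this stdlib handler ignores Range headers and answers PUT with 501 Unsupported, which here is exactly the point.)

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

# serve ./shares read-only: GET/HEAD work, PUT/POST are refused with 501
handler = functools.partial(SimpleHTTPRequestHandler, directory="shares")
HTTPServer(("", 8080), handler).serve_forever()
```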
Hm, that makes me curious about the potential to write the entire Tahoe node as an apache module: it could convert requests for /ROOT/uri/FILECAP etc into share requests and FEC decoding...