The cloud backend, which uses HTTP or HTTPS to connect to the cloud storage service, provides some interesting data on how an HTTP-only storage protocol might perform. With request pipelining and connection pooling, it seems to do a pretty good job of maxing out the upstream bandwidth to the cloud on my home Internet connection, although it would be interesting to test it with a fatter pipe. (For downloads, performance appears to be limited by inefficiencies in the downloader rather than in the cloud backend.)
Currently, the cloud backend splits shares into "chunks" to limit the amount of data that needs to be held in memory or in a store object (see docs/specifications/backends/raic.rst). This is somewhat redundant with segmentation: ciphertext "segments" are erasure-encoded into "blocks" (a segment is k = shares.needed times larger than a block), and the blocks are stored in a share together with a header and metadata; the share as a whole is then chunked. Blocks and chunks are not aligned, for two reasons: the share header shifts the block data away from offset 0, and the typical block size of 128 KiB / 3 does not evenly divide the 512 KiB default chunk size. So,
- a sequential scan over blocks will reference the same chunk for several (typically about 12 for k = 3) consecutive requests.
- a single block may span chunks.
- writes not aligned with a chunk must be implemented using read-modify-write.
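To make the misalignment concrete, here is a minimal sketch of the arithmetic. It is not code from the cloud backend: the function names are hypothetical, the header size is left as a parameter because its real value depends on the share format, and rounding the block size up (as if the segment were padded to a multiple of k) is an assumption.

```python
CHUNK_SIZE = 512 * 1024     # default chunk size mentioned above
SEGMENT_SIZE = 128 * 1024   # typical segment size mentioned above
K = 3                       # shares.needed


def block_size(segment_size=SEGMENT_SIZE, k=K):
    """Size of one block, assuming the segment is padded to a multiple of k."""
    return -(-segment_size // k)  # ceiling division


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Indices of the chunks touched by the byte range [offset, offset + length)."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


def chunks_for_block(block_index, header_size, chunk_size=CHUNK_SIZE):
    """Chunks touched by one block, assuming blocks are laid out contiguously
    right after a header of header_size bytes (a hypothetical layout)."""
    size = block_size()
    offset = header_size + block_index * size
    return chunks_for_range(offset, size, chunk_size)


def blocks_touching_chunk(chunk_index, header_size, max_blocks=1000):
    """Block indices (below max_blocks) whose bytes fall at least partly in a chunk."""
    return [i for i in range(max_blocks)
            if chunk_index in chunks_for_block(i, header_size)]
```

Under these assumptions and with a header size of 0, blocks 0 to 11 all touch chunk 0 (twelve consecutive requests against the same chunk), and block 11 also spills into chunk 1. Any nonzero header size moves where the straddling blocks fall, which is exactly the information the storage client does not have.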
The cloud backend uses caching to mitigate any resulting inefficiency. However, this is only of limited help, because the storage client lacks information about where the chunk boundaries are and about how the chunk cache behaves, and the storage server lacks information about the access patterns of the uploader or downloader.
A possible performance improvement and simplification that I'm quite enthusiastic about for an HTTP-based protocol is to make blocks the same thing as chunks. That is, the segment size would be k times the chunk size, and the uploader or downloader would directly store or request chunks, rather than blocks, from the backend storage, doing any caching itself.
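A rough sketch of what that alignment would mean for the arithmetic, in the same spirit: the names are hypothetical, and it assumes the share header and metadata are kept somewhere other than in the block chunks, which the proposal above does not specify.

```python
CHUNK_SIZE = 512 * 1024  # reusing the current default chunk size as the block size


def aligned_segment_size(k, chunk_size=CHUNK_SIZE):
    """Segment size chosen so that each erasure-coded block is exactly one chunk."""
    return k * chunk_size


def chunk_for_block(block_index):
    """With blocks identical to chunks, the mapping is the identity."""
    return block_index


def chunks_per_share(file_size, k, chunk_size=CHUNK_SIZE):
    """Number of block-chunks in each share: one per segment of the file."""
    segment_size = aligned_segment_size(k, chunk_size)
    return -(-file_size // segment_size)  # ceiling division
```

With k = 3 this gives a 1.5 MiB segment, so a 10 MiB file would produce 7 chunks per share, and every block request or write maps to exactly one whole chunk, with no straddling and no read-modify-write.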