Changes between Initial Version and Version 5 of Ticket #359


Timestamp:
2009-12-26T03:52:24Z
Author:
warner
Comment:

Note that zooko's recent comments are about immutable files and their shares, whereas this ticket is about mutable files and shares, which use a different layout. However, the same general statements are true. Mutable files were designed after we had some experience with immutable files, but before I learned to always use 64-bit fields for everything. They've used somewhat larger offset fields since day 1, which are big enough to accommodate very large shares. The layout is described in source:src/allmydata/mutable/layout.py .
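The offset table described there can be sketched with Python's struct module. This is an illustration based on the field sizes stated in this comment, not a verbatim copy of layout.py; the format string and function name are assumptions:

```python
import struct

# Assumed big-endian layout of the SDMF offset table, following the field
# sizes described in this comment: four 32-bit offsets ("L") for signature,
# share_hash_chain, block_hash_tree, and share_data, then two 64-bit
# offsets ("Q") for enc_privkey and EOF.
OFFSETS_FORMAT = ">LLLLQQ"

def pack_offsets(signature, share_hash_chain, block_hash_tree,
                 share_data, enc_privkey, eof):
    # The 32-bit fields cap their offsets below 2**32; the 64-bit fields
    # (enc_privkey, EOF) can address positions up to 2**64 - 1.
    return struct.pack(OFFSETS_FORMAT, signature, share_hash_chain,
                       block_hash_tree, share_data, enc_privkey, eof)

print(struct.calcsize(OFFSETS_FORMAT))  # 32 bytes total
```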

To be precise, they use 32-bit fields to hold the offsets of the signature, share_hash_chain, block_hash_tree, and share_data, then use 64-bit fields to hold the offsets of the enc_privkey and EOF. So they can tolerate 2^64-byte share_data sections, which is where the bulk of the share's data lives. The block_hash_tree section is smaller than the share_data section, but still scales linearly with filesize. Because of the 32-bit field for offset[share_data], it must be somewhat shorter than 2^32 bytes, limiting it to 2^27 hashes, so 2^26 segments, which at our default 128KiB (2^17) segsize means 2^43 bytes, which is the limiting factor. By raising the segsize to e.g. 4MB (2^22) this limit grows to 2^48 bytes.
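That arithmetic can be checked directly. The 32-byte hash size is an assumption (Tahoe hashes are SHA-256d based); the other constants come from the paragraph above:

```python
HASH_SIZE = 32                        # assumed: 32-byte (SHA-256d) hashes
offset_cap = 2**32                    # offset[share_data] is a 32-bit field
max_hashes = offset_cap // HASH_SIZE  # block_hash_tree holds < 2**27 hashes
max_segments = max_hashes // 2        # a binary tree over 2**26 leaves has ~2**27 nodes
assert max_hashes == 2**27 and max_segments == 2**26

default_segsize = 128 * 1024          # 128 KiB == 2**17 bytes
assert max_segments * default_segsize == 2**43   # per-share limit

bigger_segsize = 4 * 1024 * 1024      # 4 MB == 2**22 bytes
assert max_segments * bigger_segsize == 2**48    # raised limit
```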

So, SDMF mutable files are limited by the share format to k*2^43 bytes, or about 24TiB with the default k=3. Until we implement MDMF and can process mutable files one segment at a time (instead of holding the whole file in RAM), we'll be soft-limited by available memory, so practically speaking the limit is a couple of GB.
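The 24 TiB figure follows from the per-share limit with k=3 (k=3 is an assumption here, matching Tahoe's default 3-of-10 encoding):

```python
k = 3                        # assumed default number of shares needed to decode
per_share_limit = 2**43      # from the share-format arithmetic above
file_limit = k * per_share_limit
assert file_limit == 24 * 2**40   # 24 TiB
```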

If we stick with the same share format for MDMF (which was our goal: old clients should be able to keep using their SDMF code to read MDMF-generated files, unless we really do need a separate salt for each segment: #393), then MDMF files will be limited to k*2^43 bytes with a RAM footprint of about x*128KiB (where "x" is probably 2 or 3). An uploader-side max_segsize configuration change can scale those two values together up to a filesize limit of k*2^64 bytes and a RAM footprint of x*256GiB.
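The 256 GiB figure follows from the fixed segment cap: filling a 2^64-byte share with only 2^26 segments requires segments of 2^64 / 2^26 = 2^38 bytes each. A quick check:

```python
max_segments = 2**26          # fixed by the 32-bit offset[share_data] field
share_limit = 2**64           # ceiling of the 64-bit enc_privkey/EOF offsets
segsize_needed = share_limit // max_segments
assert segsize_needed == 2**38
assert segsize_needed == 256 * 2**30   # 256 GiB per segment, held in RAM
```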

If we *do* change the share format for MDMF, then we should of course use 64-bit fields everywhere and remove this 2^43 limit.

Finally, it turns out that this ticket is actually a dupe of #694, which was closed when we removed the hard limit on SDMF files in db939750a8831c1e back in June 2009. I'd initially imposed the arbitrary 3.5MB limit to discourage people from using the (inefficient, memory-hungry) SDMF format in ways that would disappoint their hopes for high-performance behavior, but I was talked out of this and Kevan implemented the fix, which was first released in 1.5.0.

  • Ticket #359

    • Property Status changed from new to closed
    • Property Component changed from code-encoding to code-mutable
    • Property Milestone changed from eventually to 1.5.0
    • Property Keywords memory added
    • Property Resolution set to duplicate
  • Ticket #359 – Description

    initial → v5
    1. Creating or updating an SDMF would take approximately 1+N/K * filesize RAM. (unmodified)
    2. (initial) It would take approximately N/K * filesize upload bandwidth (or if you have an Upload Helper, just filesize upload bandwidth from you to the Helper, then N/K * filesize upload bandwidth from the helper to the storage servers).
       (v5) It would take approximately N/K * filesize upload bandwidth to change even just one byte of the file. (if/when we implement a mutable upload helper, the client-to-helper bandwidth will be equal to the filesize).