Changes between Initial Version and Version 1 of Ticket #671


Timestamp:
2009-11-30T21:43:47Z (15 years ago)
Author:
warner
Comment:

(updated description)

Note that any sizelimit code is allowed to speed things up by remembering state from one run to the next. The old code did the slow recursive-traversal sharewalk to handle the (important) case where this state was inaccurate or unavailable (i.e. when shares had been deleted by some external process, or to handle the local-fs-level overhead that accounts for the difference between what /bin/ls and /bin/df each report). But we could trade off accuracy for speed: it should be acceptable to just ensure that the sizelimit is eventually approximately correct.

A modern implementation should probably use the "share crawler" mechanism, doing a stat on each share, and adding up the results. It can store state in the normal crawler stash, probably in the form of a single total-bytes value per prefixdir. The do-I-have-space test should use max(last-pass, current-pass), to handle the fact that the current-pass value will be low while the prefixdir is being scanned. The crawler would replace this state on each pass, so any stale information would go away within a few hours or days.
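As a rough illustration of the bookkeeping described above, here is a minimal sketch. It does not use the real crawler API: `SizeLimitState` and all of its method names are hypothetical, and it reads "max(last-pass, current-pass)" as comparing the summed totals of the two passes, which is only one possible interpretation.

```python
import os


class SizeLimitState:
    """Hypothetical per-prefixdir byte totals (not Tahoe's actual stash format)."""

    def __init__(self, sizelimit):
        self.sizelimit = sizelimit
        self.last_pass = {}     # prefix -> total bytes from the last completed pass
        self.current_pass = {}  # prefix -> total bytes seen so far in this pass

    def record_prefixdir(self, prefix, prefixdir_path):
        """Stat every file under one prefixdir and store a single total for it."""
        total = 0
        for dirpath, _dirnames, filenames in os.walk(prefixdir_path):
            for name in filenames:
                try:
                    total += os.stat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # the share vanished while we were scanning
        self.current_pass[prefix] = total

    def finish_pass(self):
        """Replace the old state wholesale, so stale totals disappear."""
        self.last_pass = self.current_pass
        self.current_pass = {}

    def estimated_usage(self):
        # The current pass undercounts until it completes, so take the larger total.
        return max(sum(self.last_pass.values()), sum(self.current_pass.values()))

    def have_space_for(self, nbytes):
        return self.estimated_usage() + nbytes <= self.sizelimit
```

A real version would presumably call something like `record_prefixdir` from the crawler's per-prefixdir hook and persist both dicts in the crawler stash, so that a restart would not need a full rescan before serving clients.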

Ideally, the server code should also keep track of new shares that were written into each prefixdir, and add the sizes of those shares to the state value, but only until the next crawler pass had swung by and seen the new shares. You'd also want to do something similar with shares that were deleted (by the lease expirer). To accomplish this, you'd want to make a ShareCrawler subclass that tracks this extra space in a per-prefixdir dict, and have the storage-server/lease-expirer notify it every time a share was created or deleted. The ShareCrawler subclass is in the right position to know when the crawler has reached a bucket.
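The per-prefixdir delta tracking might look something like the sketch below. To keep it self-contained it is a plain class rather than an actual ShareCrawler subclass, and `PrefixdirDeltaTracker` and its method names are made up for illustration. It also works at prefixdir granularity rather than per bucket.

```python
from collections import defaultdict


class PrefixdirDeltaTracker:
    """Hypothetical stand-in for state a ShareCrawler subclass might hold."""

    def __init__(self):
        self.scanned = {}                # prefix -> bytes found by the latest scan
        self.pending = defaultdict(int)  # prefix -> net bytes changed since that scan

    def share_created(self, prefix, size):
        # Would be called by the storage server when it writes a new share.
        self.pending[prefix] += size

    def share_deleted(self, prefix, size):
        # Would be called by the lease expirer when it removes a share.
        self.pending[prefix] -= size

    def prefixdir_scanned(self, prefix, total_bytes):
        # The crawler has now seen every share in this prefixdir, so the
        # pending delta is already included in total_bytes and is dropped.
        self.scanned[prefix] = total_bytes
        self.pending.pop(prefix, None)

    def estimated_usage(self):
        return max(0, sum(self.scanned.values()) + sum(self.pending.values()))
```

For example, after `prefixdir_scanned("aa", 1000)`, `share_created("aa", 200)` and `share_deleted("aa", 50)`, the estimate is 1150 bytes, and it stays at 1150 once the crawler rescans "aa" and reports that total itself.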

Doing this with the crawler would also have the nice side-effect of balancing fast startup with accurate size limiting. Even though this ticket has been defined as not requiring such a feature, I'm sure users would appreciate it.

  • Ticket #671

    • Property Summary changed from sizelimit to bring back sizelimit (i.e. max consumed, not min free)
  • Ticket #671 – Description

    initial  v1
    1        1    We used to have a {{{sizelimit}}} option which would do a recursive examination of the storage directory at startup and calculate approximately how much disk space was used, and refuse to accept new shares if the disk space would exceed the limit.  #34 shows when it was implemented.  It was later removed because it took a long time -- about 30 minutes -- on allmydata.com storage servers, and the servers remained unavailable to clients during this period, and because it was replaced by the {{{reserved_space}}} configuration, which was very fast and which satisfied the requirements of the allmydata.com storage servers.
    2        2
    3             This ticket is to reintroduce {{{sizelimit}}} because [http://allmydata.org/pipermail/tahoe-dev/2009-March/001493.html some users want it].  This will mean that the storage server doesn't start serving clients until it finishes the disk space inspection at startup.
             3    This ticket is to reintroduce {{{sizelimit}}} because [http://allmydata.org/pipermail/tahoe-dev/2009-March/001493.html some users want it].  This might mean that the storage server doesn't start serving clients until it finishes the disk space inspection at startup.
             4
             5    Note that {{{sizelimit}}} would impose a maximum limit on the amount of space consumed by the node's {{{storage/shares/}}} directory, whereas {{{reserved_space}}} imposes a minimum limit on the amount of remaining available disk space. In general, {{{reserved_space}}} can be implemented by asking the OS for filesystem stats, whereas {{{sizelimit}}} must be implemented by tracking the node's own usage and accumulating the sizes over time.
    4        6
    5        7    To close this ticket, you do *not* need to implement some sort of interleaving of inspecting disk space and serving clients.
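
To make the difference between the two limits in the description concrete, here is a small sketch of the two acceptance tests. The function names and parameters are hypothetical and this is not the storage server's actual code. The `reserved_space` test asks the OS (via `shutil.disk_usage`), while the `sizelimit` test depends on a consumed-bytes figure that the node has to maintain itself.

```python
import shutil


def accept_with_reserved_space(storage_dir, share_size, reserved_space):
    """Minimum-free test: the OS reports free space on the filesystem."""
    free = shutil.disk_usage(storage_dir).free
    return free - share_size >= reserved_space


def accept_with_sizelimit(consumed_bytes, share_size, sizelimit):
    """Maximum-consumed test: consumed_bytes must be tracked by the node."""
    return consumed_bytes + share_size <= sizelimit
```

The first function needs no state at all, which is why {{{reserved_space}}} is fast at startup. The second is only as accurate as whatever produces `consumed_bytes`.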