Opened at 2008-12-02T01:21:00Z
Last modified at 2016-04-02T09:30:08Z
#543 new enhancement
repair/rebalancing service
| Reported by: | warner | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | eventually |
| Component: | code-storage | Version: | 1.2.0 |
| Keywords: | performance repair | Cc: | tahoe-lafs.org@…, vladimir@… |
| Launchpad Bug: | | | |
Description (last modified by daira)
So, in doing a bunch of manual GC work over the last week, I'm starting to think about what a "rebalancing manager" service would look like.
The basic idea is that storage servers would give a central service access to some special facet, through which the manager could enumerate the shares present on each one. The manager would slowly cycle through the entire storage-index space (over the course of a month, I imagine), probably one prefixdir at a time.
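A rough sketch of that outer loop, assuming a hypothetical enumerate_shares() method on the special facet (the 1024 two-character base32 prefixdir names are the same ones storage servers already use on disk):

```python
import itertools
import time

BASE32 = "abcdefghijklmnopqrstuvwxyz234567"
PREFIXES = ["".join(p) for p in itertools.product(BASE32, repeat=2)]  # 1024 prefixdirs

def rebalance_cycle(servers, plan_moves, cycle_seconds=30 * 24 * 3600):
    """Walk the whole storage-index space once per cycle (roughly a month)."""
    pause = cycle_seconds / len(PREFIXES)
    for prefix in PREFIXES:
        # server -> {storage_index: set_of_sharenums} for this prefixdir
        catalog = {server: server.enumerate_shares(prefix) for server in servers}
        plan_moves(prefix, catalog)  # decide which shares ought to go where
        time.sleep(pause)            # spread the work evenly over the cycle
```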
It would ask all the storage servers about which shares they hold, and figure out which other servers also hold those shares (this query is an online version of the 'tahoe debug catalog-shares' CLI tool). Then it would make decisions about which shares ought to go where. There are two goals (sometimes competing). The first is to move shares closer to the start of the permuted peer-selection order, so that clients don't have to search as far to find them. The second is to smooth out disk usage among all servers (more by percentage than by absolute usage).
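The decision step could look something like the sketch below. The attribute names (peerid, used_bytes, total_bytes) are made up for illustration, the peerid and storage index are assumed to be bytes, and the hash-based permutation is only a stand-in for the real peer-selection permutation; one conservative policy is to move a share only when the destination is both earlier in the permuted order and less full than the current holder:

```python
from hashlib import sha256

def permuted_order(servers, storage_index):
    """Sort servers into the permuted peer-selection order for one storage index."""
    return sorted(servers,
                  key=lambda s: sha256(s.peerid + storage_index).digest())

def fullness(server):
    """Fractional disk usage: balance by percentage, not by absolute bytes."""
    return server.used_bytes / float(server.total_bytes)

def better_home(current_holder, other_servers, storage_index):
    """Return a server this share should move to, or None to leave it alone."""
    for server in permuted_order([current_holder] + other_servers, storage_index):
        if server is current_holder:
            return None      # already as early in the permuted order as it can be
        if fullness(server) < fullness(current_holder):
            return server    # earlier in the order *and* less full
    return None
```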
Once the manager works out the minimum-effort rearrangement, it will inform the two servers that they should move a share between them. The two servers can then use a direct connection to copy the share to its new home and delete the original. In grids without full bidirectional connectivity, the manager could conceivably act as a relay.
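The per-share move itself could be as simple as this sketch (copy_share_to(), has_share(), and delete_share() are hypothetical methods on the privileged facet):

```python
def move_share(src, dst, storage_index, sharenum):
    """Ask src to push one share directly to dst, deleting the original only on success."""
    src.copy_share_to(dst, storage_index, sharenum)
    # Delete only after the destination confirms it holds the share, so an
    # interrupted move never reduces the number of surviving copies.
    if dst.has_share(storage_index, sharenum):
        src.delete_share(storage_index, sharenum)
```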
When a new (empty) disk is added to the grid, it will accumulate shares very slowly, and only get shares for new files (those created after the new node comes online). A rebalancing manager would make better use of the new disk by filling it with old shares too, freeing space on the old servers so they can continue to participate in the grid (instead of becoming read-only).
There may be ways to perform this task without a central manager. For example, we could treat balancing as an aspect of repair, such that the repair process ought to include moving shares around to better places. In this approach, the client that performs a repair would also do rebalancing. It is not clear whether clients ought to have the same level of authority as a trusted repair-manager: for example, should clients have the ability to delete shares of immutable files? Making the clients drive the rebalancing process would ensure that no effort is expended on unwanted files. On the other hand, 1) clients must then take an active interest in rebalancing, and 2) the load generated by rebalancing would be pretty choppy (a central manager could do it smoothly, over time, whereas a client would want to finish their repair/rebalancing pass as quickly as possible).
This will also interact with accounting. A privileged rebalancing manager could be given the authority to clone a share (account labels and all) to a new server, whereas a client performing rebalancing themselves would naturally be restricted to whatever storage that client was normally allowed to consume. I'm not sure whether this issue is significant or not.
On the implementation side, I'd expect the rebalancing-manager to be a Tahoe node (made with 'tahoe create rebalancer', or the like), which advertises itself via the introducer. Storage servers would have a configuration setting that says "give rebalancing-authority to any rebalancer that is advertised with a signature from blesser key X". This would require each storage server to be configured with pubkey X, but would not require any changes on the balancer node when new storage servers are added.
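On the storage-server side that might boil down to a single tahoe.cfg setting; the option name and key value below are hypothetical:

```
[storage]
enabled = true
# hypothetical option: grant rebalancing authority to any rebalancer whose
# introducer announcement is signed by this blesser key
rebalancer.blesser_pubkey = pub-v0-...
```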
It might also be a good idea to tell the rebalancer how many storage servers it should expect to see, so it can refrain from doing anything unless it's fully connected.
I'm also thinking that the enumerate-your-shares interface could be used to generate estimates of how many files are in the grid. The rebalancer (or some other node with similar enumeration authority, perhaps a stats-gatherer or disk-watcher) could query for all shares in the aa-ab prefix range, merge the responses from all servers, then multiply by the number of prefixes. If the servers could efficiently distinguish mutable shares from immutable shares, we could get estimates for both file types.
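The arithmetic is simple; a sketch of the estimate, again assuming the hypothetical enumerate_shares() method:

```python
def estimate_grid_file_count(servers, sample_prefixes=("aa", "ab")):
    """Estimate the number of files in the grid by sampling a few prefixdirs."""
    total_prefixes = 32 * 32                 # two base32 characters
    distinct = set()
    for prefix in sample_prefixes:
        for server in servers:
            # merge responses: the same storage index usually appears on many servers
            distinct.update(server.enumerate_shares(prefix).keys())
    # scale the sampled prefixdirs up to the whole storage-index space
    return len(distinct) * total_prefixes // len(sample_prefixes)
```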
Change History (16)
comment:1 Changed at 2009-05-08T16:11:45Z by zooko
comment:2 Changed at 2009-08-10T15:28:41Z by zooko
The following clump of tickets might be of interest to people watching this ticket: #711 (repair to different levels of M), #699 (optionally rebalance during repair or upload), #543 ('rebalancing manager'), #232 (Peer selection doesn't rebalance shares on overwrite of mutable file.), #678 (converge same file, same K, different M), #610 (upload should take better advantage of existing shares), #573 (Allow client to control which storage servers receive shares).
comment:3 Changed at 2009-08-10T15:45:38Z by zooko
Also related: #778 ("shares of happiness" is the wrong measure; "servers of happiness" is better).
comment:4 Changed at 2009-12-04T04:58:36Z by davidsarah
- Component changed from code-performance to code-storage
- Keywords performance added
comment:5 Changed at 2009-12-23T20:06:30Z by davidsarah
- Keywords repair added
comment:6 Changed at 2010-05-16T05:14:33Z by zooko
- Milestone changed from undecided to soon (release n/a)
#778 ("shares of happiness" is the wrong measure; "servers of happiness" is better) is fixed!
comment:7 Changed at 2010-06-12T23:43:39Z by davidsarah
- Milestone changed from soon (release n/a) to soon
comment:8 Changed at 2010-12-12T23:29:25Z by davidsarah
If a repair operation were to also rebalance shares (as of v1.8.1 it does not), then #483 (repairer service) would be a duplicate of this ticket. So we should close #483 as a duplicate iff we decide that repair should rebalance.
(These are not to be confused with #643, which is a less ambitious ticket about scheduling the existing deep-checker/repairer using a cron job or the Windows scheduler.)
comment:9 Changed at 2010-12-12T23:31:48Z by davidsarah
- Type changed from task to enhancement
comment:10 Changed at 2010-12-15T20:50:17Z by davidsarah
Shu Lin suggested on tahoe-dev that one criterion for rebalancing should be that a file has been recently accessed.
This could have some disadvantages for debugging, since reads would have a side effect, so I think it would have to be optional. It is probably better to add files to a rebalancing queue (and keep track of how recently they were last rebalanced) rather than repairing/rebalancing them immediately after the read, because the latter would do unnecessary work if a file is read several times.
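A minimal sketch of such a queue (the names are illustrative, not a proposed API):

```python
import time

class RebalanceQueue:
    """Remember recently-read files and how recently each one was rebalanced."""

    def __init__(self, min_interval=7 * 24 * 3600):
        self.min_interval = min_interval
        self.last_rebalanced = {}   # storage_index -> timestamp of last rebalance
        self.pending = set()        # set membership deduplicates repeated reads

    def note_read(self, storage_index):
        """Called after a read; queue the file unless it was rebalanced recently."""
        last = self.last_rebalanced.get(storage_index, 0)
        if time.time() - last >= self.min_interval:
            self.pending.add(storage_index)

    def next_to_rebalance(self):
        """Return the next storage index to rebalance, or None if the queue is empty."""
        if not self.pending:
            return None
        storage_index = self.pending.pop()
        self.last_rebalanced[storage_index] = time.time()
        return storage_index
```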
comment:11 Changed at 2013-07-03T09:39:00Z by daira
- Description modified (diff)
#483 (repairer service) was closed as a duplicate.
comment:12 Changed at 2013-07-03T09:40:10Z by daira
- Milestone changed from soon to eventually
comment:13 Changed at 2013-07-03T09:45:22Z by daira
- Summary changed from 'rebalancing manager' to repair/rebalancing service
comment:14 Changed at 2014-12-29T20:22:36Z by daira
#661 was a duplicate.
comment:15 Changed at 2015-02-02T11:06:38Z by lpirl
- Cc tahoe-lafs.org@… added
comment:16 Changed at 2016-04-02T09:30:08Z by rvs
- Cc vladimir@… added
See also #699 (optionally rebalance during repair or upload).