Opened at 2008-12-02T01:21:00Z
Last modified at 2016-04-02T09:30:08Z
#543 new enhancement
'rebalancing manager' — at Initial Version
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | eventually |
Component: | code-storage | Version: | 1.2.0 |
Keywords: | performance repair | Cc: | tahoe-lafs.org@…, vladimir@… |
Launchpad Bug: |
Description
So, in doing a bunch of manual GC work over the last week, I'm starting to think about what a "rebalancing manager" service would look like.
The basic idea is that storage servers would give a central service access to some special facet, through which the manager could enumerate the shares present on each one. The manager would slowly cycle through the entire storage-index space (over the course of a month, I imagine), probably one prefixdir at a time.
It would ask all the storage servers about which shares they hold, and figure out which other servers also hold those shares (this query is an online version of the 'tahoe debug catalog-shares' CLI tool). Then it would make decisions about which shares ought to go where. There are two goals (sometimes competing). The first is to move shares closer to the start of the permuted peer-selection order, so that clients don't have to search as far to find them. The second is to smooth out disk usage among all servers (more by percentage than by absolute usage).
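To make the first goal concrete, here is a minimal sketch of how a manager might measure how far each share sits from the start of the permuted order. The hash construction (SHA-256 over storage index plus server id) and the names `permuted_order` and `share_distances` are assumptions for illustration, not the actual Tahoe peer-selection code.

```python
import hashlib

def permuted_order(storage_index, server_ids):
    """Return the server ids sorted into a per-file permuted order.

    Assumption: the permutation key is SHA-256(storage_index + server_id).
    The real peer-selection hash may differ.
    """
    return sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(storage_index + sid).digest(),
    )

def share_distances(storage_index, server_ids, holders):
    """Map each share-holding server to its position in the permuted order.

    A position of 0 means the share is on the first server a client would ask.
    """
    order = permuted_order(storage_index, server_ids)
    position = {sid: i for i, sid in enumerate(order)}
    return {sid: position[sid] for sid in holders}
```

Larger positions mean clients have to search further, so those shares would be the natural candidates for moving.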
Once the manager works out the minimum-effort rearrangement, it will inform the two servers that they should move a share between them. The servers can then use a direct connection to copy the share to its new home and then delete the original. In grids without full bidirectional connectivity, the manager could conceivably act as a relay.
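As a rough illustration of the decision step, the sketch below proposes a single move for one file. It is a simple greedy heuristic, not a true minimum-effort solver, and it assumes at most one share per server. The function name `propose_move`, the `used_fraction` mapping, and the `max_fill` threshold are all hypothetical.

```python
def propose_move(order, holders, used_fraction, max_fill=0.9):
    """Propose one (source, target) move for a single file, or None.

    order         -- server ids in permuted peer-selection order
    holders       -- set of server ids that currently hold a share
    used_fraction -- server id -> fraction of disk in use (0.0 to 1.0)
    max_fill      -- hypothetical cutoff: skip targets fuller than this
    """
    position = {sid: i for i, sid in enumerate(order)}
    held = [sid for sid in order if sid in holders]
    if not held:
        return None
    source = held[-1]  # the holder furthest from the start of the order
    for candidate in order:
        if position[candidate] >= position[source]:
            break  # no earlier slot is available, so leave the share alone
        if candidate not in holders and used_fraction[candidate] < max_fill:
            return (source, candidate)
    return None
```

The `max_fill` check is where the two goals meet: a server near the start of the order is skipped as a target if it is already close to full.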
When a new (empty) disk is added to the grid, it will accumulate shares very slowly, and only get shares for new files (those which are created after the new node comes online). A rebalancing manager would make better use of the new disk: filling it with old shares too, thus freeing space on old servers so they can continue to participate in the grid (instead of being read-only).
There may be ways to perform this task without a central manager. For example, we could treat balancing as an aspect of repair, such that the repair process ought to include moving shares around to better places. In this approach, the client that performs a repair would also do rebalancing. It is not clear if the clients ought to have the same level of authority as a trusted repair-manager: for example, should clients have the ability to delete shares of immutable files? Making the clients drive the rebalancing process would ensure that no effort is expended on unwanted files. On the other hand, 1) clients must then take an active interest in rebalancing, and 2) the load generated by rebalancing would be pretty choppy (a central manager could do it smoothly, over time, whereas a client would want to finish their repair/rebalancing pass as quickly as possible).
This will also interact with accounting. A privileged rebalancing manager could be given the authority to clone a share (account labels and all) to a new server, whereas a client performing rebalancing themselves would naturally be restricted to whatever storage that client was normally allowed to consume. I'm not sure whether this issue is significant or not.
On the implementation side, I'd expect the rebalancing-manager to be a Tahoe node (made with 'tahoe create rebalancer', or the like), which advertises itself via the introducer. Storage servers would have a configuration setting that says "give rebalancing-authority to any rebalancer that is advertised with a signature from blesser key X". This would require each storage server to be configured with pubkey X, but would not require any changes on the rebalancer node when new storage servers are added.
It might also be a good idea to tell the rebalancer how many storage servers it should expect to see, so it can refrain from doing anything unless it's fully connected.
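A minimal sketch of that guard, assuming the rebalancer is configured with a hypothetical `expected_count` and can list the ids of the servers it is currently connected to:

```python
def should_rebalance(connected_server_ids, expected_count):
    """Return True only when at least the expected number of distinct
    storage servers are connected (expected_count is a hypothetical setting)."""
    return len(set(connected_server_ids)) >= expected_count
```

With a partial view of the grid, the manager could mistake an unreachable server's shares for missing ones, so doing nothing is the safer default.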
I'm also thinking that the enumerate-your-shares interface could be used to generate estimates of how many files are in the grid. The rebalancer (or some other node with similar enumeration authority, perhaps a stats-gatherer or disk-watcher) could query for all shares in the aa-ab prefix range, merge the responses from all servers, then multiply by the number of prefixes. If the servers could efficiently distinguish mutable shares from immutable shares, we could get estimates of both filetypes.
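The estimate could be computed roughly as follows. This sketch assumes each server answers with the set of storage indexes it holds in the sampled prefixes, and that there are 1024 prefixes in total (two base32 characters); both the response shape and the name `estimate_file_count` are assumptions.

```python
def estimate_file_count(responses, sampled_prefixes, total_prefixes=1024):
    """Estimate the number of files in the grid from a sample of prefixes.

    responses        -- one set of storage-index strings per server, covering
                        only the sampled prefixes (hypothetical response shape)
    sampled_prefixes -- how many prefixes were queried (2 for "aa" and "ab")
    total_prefixes   -- assumed size of the prefix space (32 * 32 = 1024)
    """
    unique = set()
    for server_shares in responses:
        # Merging de-duplicates files whose shares live on several servers.
        unique.update(server_shares)
    return len(unique) * total_prefixes / sampled_prefixes
```

For example, if two servers together report four distinct storage indexes in the "aa" and "ab" prefixes, the estimate is 4 * 1024 / 2 = 2048 files. Since storage indexes are hashes and should be spread evenly over the prefixes, a small sample ought to scale reasonably well.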