#661 closed enhancement (duplicate)
Dynamic share migration to maintain file health
Reported by: | mmore | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code-encoding | Version: | 1.3.0 |
Keywords: | repair preservation availability | Cc: | |
Launchpad Bug: | | | |
Description (last modified by daira)
Dynamic share repair to maintain file health. Based on the following features that already exist in Allmydata-Tahoe 1.3, we can improve automatic repair:
- Foolscap provides knowledge of which nodes are alive.
- Verification of file availability can be delegated to another node through a read-cap or a verify-cap, without security risk.
The proposed auto-repair process (a minimal sketch follows this list):
- Use a memory-based algorithm: because the client knows where the file's shares exist, we can keep track of which shares are alive. For simplicity, we infer a share's availability from the availability of the node holding it.
- The repair process is triggered automatically by the repairer. There are several techniques for assigning repair responsibility, chosen based on repair cost: network bandwidth and fault tolerance.
- Timeout: we can use a lazy-repair technique to avoid reacting to temporary node failures, i.e. waiting for a certain time before the repair process starts.
- Reintegration: a memory-based repair technique that remembers failed storage servers which come back to life will help reduce consumption of Tahoe grid resources such as network bandwidth and storage space.
- Repairer: selecting who is responsible for repair takes several issues into consideration: security, repairer location, and repairer resources.
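A minimal sketch of the memory-based tracking, lazy-repair, and reintegration ideas above, in Python (the language Tahoe is written in). All names here (`LazyRepairer`, `node_is_alive`, `repair`, `GRACE_PERIOD`) are illustrative assumptions, not existing Tahoe-LAFS APIs:

```python
import time

# Illustrative grace period before a down node triggers repair (not a Tahoe constant).
GRACE_PERIOD = 3600  # seconds

class LazyRepairer:
    """Hypothetical memory-based tracker: remembers where a file's shares
    live and defers repair until a node has been down longer than the
    grace period, so temporary failures do not waste bandwidth."""

    def __init__(self, node_is_alive, repair):
        # node_is_alive: callable(node_id) -> bool, e.g. backed by Foolscap connectivity.
        # repair: callable(file_cap, missing_node_ids) -> None, the actual repair action.
        self.node_is_alive = node_is_alive
        self.repair = repair
        self.first_seen_down = {}  # node_id -> timestamp when first observed down

    def check(self, file_cap, share_locations):
        """share_locations: mapping node_id -> set of share numbers for file_cap."""
        now = time.time()
        missing = []
        for node_id in share_locations:
            if self.node_is_alive(node_id):
                # Reintegration: a node that came back is no longer pending repair.
                self.first_seen_down.pop(node_id, None)
            else:
                down_since = self.first_seen_down.setdefault(node_id, now)
                if now - down_since >= GRACE_PERIOD:
                    missing.append(node_id)
        if missing:
            self.repair(file_cap, missing)
```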
Change History (5)
comment:1 Changed at 2009-03-11T22:07:16Z by zooko
- Description modified (diff)
comment:2 Changed at 2009-03-12T21:03:22Z by warner
- Description modified (diff)
re-reformatted it: I think trac requires the leading space to trigger the "display as list" formatter
comment:3 Changed at 2009-06-12T00:56:32Z by warner
- Component changed from dev-infrastructure to code-encoding
- Owner somebody deleted
comment:4 Changed at 2010-03-25T03:27:24Z by davidsarah
- Keywords repair preservation availability added
The following tickets are closely related:
- #450 Checker/repair agent
- #483 Repairer service
- #543 Rebalancing manager
- #643 Automatically schedule repair service
- #661 Dynamic share migration to maintain file health
- #864 Automated migration of shares between storage servers
Actually there are probably too many overlapping tickets here.
Part of the redundancy is due to distinguishing repair from rebalancing. But when #614 and #778 are fixed, a healthy file will by definition be balanced across servers, so there's no need to make that distinction. Perhaps there will also be a "super-healthy" status that means shares are balanced across the maximum number of servers, i.e. N. (When we support geographic dispersal / rack-awareness, the definitions of "healthy" and "super-healthy" will presumably change again so that they also imply that shares have the desired distribution.)
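As a rough illustration of those definitions (assuming the post-#614/#778 notion of health, and simplifying the servers-of-happiness criterion to a count of distinct servers holding shares rather than the actual maximum-matching computation), a classifier might look like:

```python
def health_status(share_locations, happy, total_n):
    """Hypothetical classifier, not Tahoe code.

    share_locations: mapping server_id -> set of share numbers held there.
    happy: the servers-of-happiness threshold (H).
    total_n: N, the total number of shares generated.
    """
    servers_with_shares = sum(1 for shares in share_locations.values() if shares)
    if servers_with_shares >= total_n:
        return "super-healthy"  # shares spread across the maximum number of servers
    if servers_with_shares >= happy:
        return "healthy"
    return "unhealthy"  # candidate for repair/rebalancing
```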
There are basically four options for how repair/rebalancing could be triggered:
- a webapi operation performed by a gateway, and triggered by CLI commands. We already have this. Scheduling this operation automatically is #643.
- triggered by write operations on a particular file. This is #232 and #699.
- moving a server's shares elsewhere when it is about to be decommissioned or is running out of space. This is #864.
- a more autonomous repair/rebalancing service that would run continuously.
The last option does not justify 4 tickets! (#450, #483, #543, #661) Unless anyone objects, I'm going to merge these all into #483 [edit: actually #543].
comment:5 Changed at 2014-12-29T20:21:04Z by daira
- Description modified (diff)
- Resolution set to duplicate
- Status changed from new to closed
Duplicate of #543.
I reformatted the original description so that trac will represent the numbered items as a list.