#1382 closed enhancement

immutable peer selection refactoring and enhancements — at Version 15

Reported by: kevan Owned by: davidsarah
Priority: major Milestone: 1.13.0
Component: code-peerselection Version: 1.8.2
Keywords: servers-of-happiness blocks-release Cc: zooko, mjberger@…, kyle@…, peter@…, vladimir@…
Launchpad Bug:

Description (last modified by zooko)

I've been working on refactoring and improving immutable peer selection. I have several immediate goals for this project.

  • Decouple servers-of-happiness-specific peer selection from the more mechanical aspects of peer selection so that it can be easily integrated into SDMF and MDMF mutable files now, and into other formats later;
  • Address the shortcomings in the servers of happiness file health measure and its current implementation;
  • Extend servers of happiness to file check, verify, and repair operations;
  • Improve the quality and clarity of the peer selection code, easing future maintenance and experimentation.

These improvements correspond roughly to issues presented in tickets #610, #614, #1115, #1124, #1130, and #1293. Unifying mutable and immutable peer selection is ticket #1057, though I don't expect to address that until after MDMF (#393) is merged and until we're done with this ticket.

Change History (18)

comment:1 Changed at 2011-03-29T04:05:29Z by kevan

First up is a draft of IPeerSelector, an abstract peer selection class. The draft is contained in attachment:interface.dpatch.

A significant annoyance with servers of happiness (and a source of many of its shortcomings) is that it was never used to motivate share placement or server selection in a well-thought-out way; rather, it was only used to evaluate the job that the existing share placement strategy had already done. So, with IPeerSelector (and with the implicit assumption that the peer selection and share placement strategies you'd want to use when uploading a file are generally motivated by how you define file health), I wanted to combine file health and share placement in one abstraction. I still have a few small issues to fix in the uploader changes that actually use the interface, so you'll have to wait to see the code that exercises and implements it, but the interaction is basically:

  • Initialize upload, upload initializes peer selector.
  • Upload tries to find existing shares as appropriate. For each existing share:
    • Call add_peer_with_share to tell the peer selector about that relationship.
  • Upload tells the peer selector all of the peers that it can write to (the peer selector doesn't assume that it can write to the peers it is told about for existing shares).
  • do:
    • Upload asks the peer selector for share assignments, checks their health. If unhealthy, the upload fails with an error as appropriate (so we can account for file repair edge cases here).
    • Upload tries to allocate buckets for the assigned share placements, reporting to the peer selector its successes (via add_peer_with_share) and failures (via mark_full_peer and mark_bad_peer) as appropriate.
  • while the peer selector needs recomputation.

I have an immutable uploader that works pretty well with this sort of interaction.
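To make the shape of that interaction concrete, here is a very rough, hypothetical sketch in Python. Only the method names add_peer_with_share, mark_full_peer, and mark_bad_peer come from the draft interface; the round-robin assignment, the other method names (add_writable_peers, get_share_placements, needs_recomputation), and the driver loop are invented for illustration, and the health check against servers of happiness is elided.

```python
class SimplePeerSelector:
    """Toy selector: spreads unplaced shares round-robin over usable peers."""

    def __init__(self, num_shares):
        self.num_shares = num_shares
        self.writable = []   # peers the uploader says it can write to
        self.full = set()    # peers that refused an allocation
        self.bad = set()     # peers that errored
        self.placed = {}     # shareno -> peer, confirmed placements

    def add_writable_peers(self, peers):
        self.writable = list(peers)

    def add_peer_with_share(self, peer, shareno):
        # used both for pre-existing shares and for successful allocations
        self.placed[shareno] = peer

    def mark_full_peer(self, peer):
        self.full.add(peer)

    def mark_bad_peer(self, peer):
        self.bad.add(peer)

    def _usable(self):
        return [p for p in self.writable
                if p not in self.full and p not in self.bad]

    def get_share_placements(self):
        # assign each not-yet-placed share to a usable peer, round-robin
        peers = self._usable()
        if not peers:
            return {}
        unplaced = [s for s in range(self.num_shares) if s not in self.placed]
        return {s: peers[i % len(peers)] for i, s in enumerate(unplaced)}

    def needs_recomputation(self):
        return len(self.placed) < self.num_shares and bool(self._usable())


def run_upload(selector, allocate):
    """Drive the do/while loop: ask for placements, try them, report back."""
    while True:
        assignments = selector.get_share_placements()
        if not assignments:
            break
        for shareno, peer in assignments.items():
            if allocate(peer, shareno):   # try to allocate a bucket
                selector.add_peer_with_share(peer, shareno)
            else:
                selector.mark_full_peer(peer)
        if not selector.needs_recomputation():
            break
    return dict(selector.placed)
```

The point of the sketch is the division of labor: the selector owns all placement decisions, while the driver only executes them and reports successes and failures back, exactly as in the do/while loop above.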

Things to look for if you're reviewing:

  1. Leaky abstractions, particularly leaky method names. I tried to avoid leaking graph theory into my interface definition. Did I succeed? If not, what should be improved?
  2. Does this seem like a generally useful interface? Could you imagine plugging it into an existing uploader without too much trouble? Could you imagine writing a new uploader that used it without too much trouble? How about a peer selector for an entirely different upload health metric?

(of course, you're welcome to look for whatever you want -- that's just what I think I need feedback on :-)

Changed at 2011-03-29T04:06:09Z by kevan

initial implementation of IPeerSelector interface

comment:2 Changed at 2011-03-29T04:22:37Z by kevan

  • Keywords design-review-needed added

comment:3 Changed at 2011-04-23T19:57:49Z by davidsarah

  • Owner changed from kevan to davidsarah
  • Status changed from new to assigned

Changed at 2011-04-24T02:38:27Z by kevan

update IPeerSelector definition

comment:4 Changed at 2011-04-24T04:08:03Z by zooko

  • Milestone changed from undecided to 1.9.0

Changed at 2011-05-03T04:08:16Z by kevan

snapshot of #1382; includes new, simplified uploader, refactored server selection logic, tests

comment:5 Changed at 2011-07-16T20:27:48Z by zooko

  • Milestone changed from 1.9.0 to soon

On the Tahoe-LAFS Weekly Call, we agreed that this isn't likely to be ready for 1.9.0.

comment:6 Changed at 2011-07-16T20:33:07Z by zooko

Kevan says there isn't any specific thing that he knows is wrong with this patch, but he thinks of it as an experiment that would need more consideration and hunting for edge cases before it should go into trunk.

comment:7 Changed at 2011-07-22T13:41:48Z by zooko

There are a few other tickets that may be affected by this one: #362, #1394, #873, #610, at least.

comment:8 Changed at 2011-11-17T03:50:06Z by kevan

We discussed #1382 at the second Tahoe-LAFS summit. We concluded that splitting peer selection into two parts along a well-defined and generally useful interface was the right approach, and that a more complete and correct share allocation algorithm (share director? share overlord? I don't think we agreed on a name for the object) was a good idea. I think we concluded that the right path to bring #1382 into trunk was something like:

  • Split the current peer selection/share allocation algorithm out into a separate class, implementing the interface described in this ticket. I guess we won't have any functional changes at this step, but it will help us design an interface that will be useful in general.
  • Solicit feedback on the interface.
  • Develop an implementation of the revised share placement algorithm using the interface that meets Brian's performance desiderata (200 shares, 1000 servers, 1 second).

Once that's done, we'll have an easy way for people to add new share placement algorithms to Tahoe-LAFS, and a share placement algorithm that fixes the corner cases users currently experience with servers of happiness.
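The "easy way for people to add new share placement algorithms" could look something like the following hypothetical sketch: the uploader depends only on an abstract selector interface, and concrete algorithms are registered under a name and swapped in. The three mark/add method names echo the draft IPeerSelector discussion; the abstract base class, the registry, and all other names here are invented for illustration.

```python
from abc import ABC, abstractmethod


class IPeerSelector(ABC):
    """Abstract interface an uploader would program against."""

    @abstractmethod
    def add_peer_with_share(self, peer, shareno): ...

    @abstractmethod
    def mark_full_peer(self, peer): ...

    @abstractmethod
    def mark_bad_peer(self, peer): ...

    @abstractmethod
    def get_share_placements(self):
        """Return a {shareno: peer} mapping for shares not yet placed."""


# name -> selector class; new placement algorithms register here
_SELECTORS = {}


def register_selector(name, cls):
    _SELECTORS[name] = cls


def make_selector(name, *args, **kwargs):
    return _SELECTORS[name](*args, **kwargs)
```

Under this kind of split, step one of the plan (extracting the current algorithm into its own class) is just writing one registered implementation, and the revised algorithm that meets the performance desiderata becomes a second one.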

I should have more free time in December than I've had lately, so I'll be happy to work on #1382. I don't know if we want to plan on #1382 for 1.10, or defer it until a future release; if we discussed that at the summit, I don't remember what we concluded.

comment:9 Changed at 2011-11-17T06:12:43Z by zooko

Kevan: excellent! I'm looking forward to progress on this. Your summary of the Summit discussion matches my recollections. We did not talk about whether it would go into 1.10 or not. Let's decide later. I don't even know who is going to be Release Wrangler for 1.10. (Hopefully Brian. He had some good ideas about "more automation, less process".)

comment:10 Changed at 2011-11-17T06:18:47Z by zooko

  • Cc zooko added

comment:11 Changed at 2012-01-14T22:57:55Z by kevan

You can follow my progress on this project by watching the ticket1382 branch on my GitHub page. I'm trying to assemble a series of small deltas that take us from our current immutable peer selector to something that can be easily modified to use a pluggable peer selector, and I would be particularly interested in knowing whether the deltas that are there now (and that show up there as I continue to work on it) are small enough to be easily reviewed.

comment:12 Changed at 2012-03-29T22:24:23Z by davidsarah

  • Milestone changed from soon to 1.10.0

As Release Manager for 1.10, I say this goes in.

comment:13 Changed at 2012-04-01T02:16:24Z by zooko

  • Keywords review-needed added
  • Milestone changed from 1.11.0 to soon

comment:14 Changed at 2012-04-01T02:43:33Z by davidsarah

  • Milestone changed from soon to 1.11.0

(I effectively renamed 1.10 to 1.11, so that milestone is correct.)

comment:15 Changed at 2013-05-15T17:20:05Z by zooko

  • Description modified (diff)
  • Milestone changed from 1.11.0 to eventually

I'm moving this to Milestone: "eventually", which means that we agree it ought to be fixed, but we don't agree that it is going to get fixed in 1.11.

If someone (e.g. kevan or markberger) wants to prioritize this, then I suppose they might finish it in time for Milestone 1.11. In that case they can move it back into this milestone when they start working on it.
