Ticket #778: behavior2.txt

File behavior2.txt, 95.2 KB (added by kevan, at 2010-05-07T22:34:07Z)
1Wed Sep 23 21:19:32 PDT 2009  Kevan Carstensen <kevan@isnotajoke.com>
2  * Alter CiphertextDownloader to work with servers_of_happiness
3
4Tue Nov  3 19:32:41 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
5  * Alter the signature of set_shareholders in IEncoder to add a 'servermap' parameter, which gives IEncoders enough information to perform a sane check for servers_of_happiness.
6
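  For illustration only (a sketch based on the description above and on the
  upload.py hunk later in this bundle, not the project's code as such; the
  helper name assign_shareholders is hypothetical), the widened call might be
  driven like this:

      def assign_shareholders(encoder, used_peers, already_peers):
          # buckets: shnum -> IStorageBucketWriter, as before.
          # servermap: shnum -> peerid, the new second argument, which gives
          # the encoder enough information to count distinct servers.
          buckets = {}
          servermap = already_peers.copy()
          for peer in used_peers:          # PeerTracker instances
              buckets.update(peer.buckets)
              for shnum in peer.buckets:
                  servermap[shnum] = peer.peerid
          encoder.set_shareholders(buckets, servermap)
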
7Wed Nov  4 03:12:22 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
8  * Alter 'immutable/encode.py' and 'immutable/upload.py' to use servers_of_happiness instead of shares_of_happiness.
9
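  The distinction driving this change, in rough terms (illustrative values
  only): shares_of_happiness counted placed shares, while servers_of_happiness
  counts the distinct servers holding at least one share.

      # sharemap: share number -> peerid holding that share (made-up values)
      sharemap = {0: "peerA", 1: "peerA", 2: "peerB", 3: "peerC"}

      shares_placed = len(sharemap)               # 4: the old "happy" measure
      servers_used = len(set(sharemap.values()))  # 3: the new "happy" measure
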
10Mon Nov 16 11:28:05 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
11  * Alter Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers, fixing an issue in #778
12
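  Roughly, the idea (paraphrasing the PeerTracker.query_allocated method added
  in the hunks below; the standalone helper shown here is hypothetical):

      def query_existing_shares(storageserver, storage_index, peerid):
          # Ask a storage server -- even one too full to accept new shares --
          # which buckets (shares) it already holds for this storage index,
          # so those shares can count toward servers_of_happiness.
          d = storageserver.callRemote("get_buckets", storage_index)
          d.addCallback(lambda buckets: (peerid, buckets))
          return d
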
13Mon Nov 16 13:24:59 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
14  * Change stray "shares_of_happiness" to "servers_of_happiness"
15
16Tue Nov 17 17:45:42 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
17  * Eliminate overcounting of servers_of_happiness in Tahoe2PeerSelector; also reorganize some things.
18
19Sun Nov 22 16:24:05 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
20  * Alter the error message returned when peer selection fails
21 
22  The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError
23  for a variety of error conditions that weren't informatively described by them.
24  This patch creates a new error, UploadHappinessError, replaces uses of
25  NoSharesError and NotEnoughSharesError with it, and alters the error message
26  raised with the errors to be more in line with the new servers_of_happiness
27  behavior. See ticket #834 for more information.
28
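  In outline (the real definition is in the interfaces.py hunk further down,
  and a later patch in this bundle renames it to UploadUnhappinessError; the
  helper wrapped around it here is only a sketch):

      class UploadHappinessError(Exception):
          """Upload was unable to satisfy 'servers_of_happiness'"""

      def check_happiness(servers_left, servers_of_happiness):
          # Both of the old failure paths (some shares placed / none placed)
          # now raise the same error, with a server-oriented message.
          if len(servers_left) < servers_of_happiness:
              msg = ("lost too many servers during upload "
                     "(still have %d, want %d)"
                     % (len(servers_left), servers_of_happiness))
              raise UploadHappinessError(msg)
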
29Fri Dec  4 20:30:37 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
30  * Change "UploadHappinessError" to "UploadUnhappinessError"
31
32Wed Dec 30 13:03:44 PST 2009  Kevan Carstensen <kevan@isnotajoke.com>
33  * Alter the error message when an upload fails, per some comments in #778.
34 
35  When I first implemented #778, I just altered the error messages to refer to
36  servers where they referred to shares. The resulting error messages weren't
37  very good. These are a bit better.
38
39Fri Dec  4 19:40:05 PST 2009  "Kevan Carstensen" <kevan@isnotajoke.com>
40  * Alter wording in 'interfaces.py' to be correct wrt #778
41
42Fri May  7 15:11:47 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
43  * Fix up the behavior of #778, per reviewers' comments
44 
45    - Make some important utility functions clearer and more thoroughly
46      documented.
47    - Assert in upload.servers_of_happiness that the buckets attributes
48      of PeerTrackers passed to it are mutually disjoint.
49    - Get rid of some silly non-Pythonisms that I didn't see when I first
50      wrote these patches.
51    - Make sure that should_add_server returns true when queried about a
52      shnum that it doesn't know about yet.
53    - Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set
54      of peerids, alter dependencies to deal with that.
55    - Remove upload.should_add_servers, because it is no longer necessary
56    - Move upload.shares_of_happiness and upload.shares_by_server to a utility
57      file.
58    - Change some points in Tahoe2PeerSelector.
59    - Compute servers_of_happiness using a bipartite matching algorithm that
60      we know is optimal instead of an ad-hoc greedy algorithm that isn't (a short sketch of the idea follows this list).
61    - Change servers_of_happiness to just take a sharemap as an argument,
62      change its callers to merge existing_shares and used_peers before
63      calling it.
64    - Change an error message in the encoder to be more appropriate for
65      servers of happiness.
66    - Clarify the wording of an error message in immutable/upload.py
67    - Refactor a happiness failure message to happinessutil.py, and make
68      immutable/upload.py and immutable/encode.py use it.
69 
70
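  A minimal sketch of the bipartite-matching idea from the last patch above
  (illustrative only; the patched code lives in happinessutil.py and is not
  reproduced here). Happiness is the size of a maximum matching in the
  bipartite graph connecting each share number to the peers that hold, or
  will hold, that share:

      def _augment(shnum, sharemap, match, seen):
          # Try to reserve a peer for 'shnum', re-routing an already-matched
          # share to a different peer if that frees one up.
          for peerid in sharemap[shnum]:
              if peerid in seen:
                  continue
              seen.add(peerid)
              if peerid not in match or _augment(match[peerid], sharemap,
                                                 match, seen):
                  match[peerid] = shnum
                  return True
          return False

      def servers_of_happiness_sketch(sharemap):
          # sharemap: shnum -> set of peerids that hold (or will hold) it
          match = {}  # peerid -> shnum chosen for it
          for shnum in sharemap:
              _augment(shnum, sharemap, match, set())
          return len(match)  # distinct servers in a maximum matching

      # Example: shares 0 and 1 live only on peer A, share 2 on peer B.
      # Three shares are placed, but happiness is only 2:
      # servers_of_happiness_sketch({0: set(["A"]), 1: set(["A"]),
      #                              2: set(["B"])}) == 2
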
71New patches:
72
73[Alter CiphertextDownloader to work with servers_of_happiness
74Kevan Carstensen <kevan@isnotajoke.com>**20090924041932
75 Ignore-this: e81edccf0308c2d3bedbc4cf217da197
76] hunk ./src/allmydata/immutable/download.py 1039
77             # Repairer (uploader) needs the encodingparams.
78             self._target.set_encodingparams((
79                 self._verifycap.needed_shares,
80-                self._verifycap.total_shares, # I don't think the target actually cares about "happy".
81+                0, # see ticket #778 for why this is
82                 self._verifycap.total_shares,
83                 self._vup.segment_size
84                 ))
85[Alter the signature of set_shareholders in IEncoder to add a 'servermap' parameter, which gives IEncoders enough information to perform a sane check for servers_of_happiness.
86Kevan Carstensen <kevan@isnotajoke.com>**20091104033241
87 Ignore-this: b3a6649a8ac66431beca1026a31fed94
88] {
89hunk ./src/allmydata/interfaces.py 1341
90         Once this is called, set_size() and set_params() may not be called.
91         """
92 
93-    def set_shareholders(shareholders):
94+    def set_shareholders(shareholders, servermap):
95         """Tell the encoder where to put the encoded shares. 'shareholders'
96         must be a dictionary that maps share number (an integer ranging from
97hunk ./src/allmydata/interfaces.py 1344
98-        0 to n-1) to an instance that provides IStorageBucketWriter. This
99-        must be performed before start() can be called."""
100+        0 to n-1) to an instance that provides IStorageBucketWriter.
101+        'servermap' is a dictionary that maps share number (as defined above)
102+        to a peerid. This must be performed before start() can be called."""
103 
104     def start():
105         """Begin the encode/upload process. This involves reading encrypted
106}
107[Alter 'immutable/encode.py' and 'immutable/upload.py' to use servers_of_happiness instead of shares_of_happiness.
108Kevan Carstensen <kevan@isnotajoke.com>**20091104111222
109 Ignore-this: abb3283314820a8bbf9b5d0cbfbb57c8
110] {
111hunk ./src/allmydata/immutable/encode.py 121
112         assert not self._codec
113         k, happy, n, segsize = params
114         self.required_shares = k
115-        self.shares_of_happiness = happy
116+        self.servers_of_happiness = happy
117         self.num_shares = n
118         self.segment_size = segsize
119         self.log("got encoding parameters: %d/%d/%d %d" % (k,happy,n, segsize))
120hunk ./src/allmydata/immutable/encode.py 179
121         if name == "storage_index":
122             return self._storage_index
123         elif name == "share_counts":
124-            return (self.required_shares, self.shares_of_happiness,
125+            return (self.required_shares, self.servers_of_happiness,
126                     self.num_shares)
127         elif name == "num_segments":
128             return self.num_segments
129hunk ./src/allmydata/immutable/encode.py 194
130         else:
131             raise KeyError("unknown parameter name '%s'" % name)
132 
133-    def set_shareholders(self, landlords):
134+    def set_shareholders(self, landlords, servermap):
135         assert isinstance(landlords, dict)
136         for k in landlords:
137             assert IStorageBucketWriter.providedBy(landlords[k])
138hunk ./src/allmydata/immutable/encode.py 199
139         self.landlords = landlords.copy()
140+        assert isinstance(servermap, dict)
141+        self.servermap = servermap.copy()
142 
143     def start(self):
144         """ Returns a Deferred that will fire with the verify cap (an instance of
145hunk ./src/allmydata/immutable/encode.py 491
146             # even more UNUSUAL
147             self.log("they weren't in our list of landlords", parent=ln,
148                      level=log.WEIRD, umid="TQGFRw")
149-        if len(self.landlords) < self.shares_of_happiness:
150-            msg = "lost too many shareholders during upload (still have %d, want %d): %s" % \
151-                  (len(self.landlords), self.shares_of_happiness, why)
152-            if self.landlords:
153+        del(self.servermap[shareid])
154+        servers_left = list(set(self.servermap.values()))
155+        if len(servers_left) < self.servers_of_happiness:
156+            msg = "lost too many servers during upload (still have %d, want %d): %s" % \
157+                  (len(servers_left),
158+                   self.servers_of_happiness, why)
159+            if servers_left:
160                 raise NotEnoughSharesError(msg)
161             else:
162                 raise NoSharesError(msg)
163hunk ./src/allmydata/immutable/encode.py 502
164         self.log("but we can still continue with %s shares, we'll be happy "
165-                 "with at least %s" % (len(self.landlords),
166-                                       self.shares_of_happiness),
167+                 "with at least %s" % (len(servers_left),
168+                                       self.servers_of_happiness),
169                  parent=ln)
170 
171     def _gather_responses(self, dl):
172hunk ./src/allmydata/immutable/upload.py 131
173         self.buckets.update(b)
174         return (alreadygot, set(b.keys()))
175 
176+def servers_with_shares(existing_shares, used_peers=None):
177+    servers = []
178+    if used_peers:
179+        peers = list(used_peers.copy())
180+        # We do this because the preexisting shares list goes by peerid.
181+        peers = [x.peerid for x in peers]
182+        servers.extend(peers)
183+    servers.extend(existing_shares.values())
184+    return list(set(servers))
185+
186+def shares_by_server(existing_shares):
187+    servers = {}
188+    for server in set(existing_shares.values()):
189+        servers[server] = set([x for x in existing_shares.keys()
190+                               if existing_shares[x] == server])
191+    return servers
192+
193 class Tahoe2PeerSelector:
194 
195     def __init__(self, upload_id, logparent=None, upload_status=None):
196hunk ./src/allmydata/immutable/upload.py 164
197 
198     def get_shareholders(self, storage_broker, secret_holder,
199                          storage_index, share_size, block_size,
200-                         num_segments, total_shares, shares_of_happiness):
201+                         num_segments, total_shares, servers_of_happiness):
202         """
203         @return: (used_peers, already_peers), where used_peers is a set of
204                  PeerTracker instances that have agreed to hold some shares
205hunk ./src/allmydata/immutable/upload.py 177
206             self._status.set_status("Contacting Peers..")
207 
208         self.total_shares = total_shares
209-        self.shares_of_happiness = shares_of_happiness
210+        self.servers_of_happiness = servers_of_happiness
211 
212         self.homeless_shares = range(total_shares)
213         # self.uncontacted_peers = list() # peers we haven't asked yet
214hunk ./src/allmydata/immutable/upload.py 242
215         d = defer.maybeDeferred(self._loop)
216         return d
217 
218+
219     def _loop(self):
220         if not self.homeless_shares:
221hunk ./src/allmydata/immutable/upload.py 245
222-            # all done
223-            msg = ("placed all %d shares, "
224-                   "sent %d queries to %d peers, "
225-                   "%d queries placed some shares, %d placed none, "
226-                   "got %d errors" %
227-                   (self.total_shares,
228-                    self.query_count, self.num_peers_contacted,
229-                    self.good_query_count, self.bad_query_count,
230-                    self.error_count))
231-            log.msg("peer selection successful for %s: %s" % (self, msg),
232+            effective_happiness = servers_with_shares(
233+                                                   self.preexisting_shares,
234+                                                   self.use_peers)
235+            if self.servers_of_happiness <= len(effective_happiness):
236+                msg = ("placed all %d shares, "
237+                       "sent %d queries to %d peers, "
238+                       "%d queries placed some shares, %d placed none, "
239+                       "got %d errors" %
240+                       (self.total_shares,
241+                        self.query_count, self.num_peers_contacted,
242+                        self.good_query_count, self.bad_query_count,
243+                        self.error_count))
244+                log.msg("peer selection successful for %s: %s" % (self, msg),
245                     parent=self._log_parent)
246hunk ./src/allmydata/immutable/upload.py 259
247-            return (self.use_peers, self.preexisting_shares)
248+                return (self.use_peers, self.preexisting_shares)
249+            else:
250+                delta = self.servers_of_happiness - len(effective_happiness)
251+                shares = shares_by_server(self.preexisting_shares)
252+                # Each server in shares maps to a set of shares stored on it.
253+                # Since we want to keep at least one share on each server
254+                # that has one (otherwise we'd only be making
255+                # the situation worse by removing distinct servers),
256+                # each server has len(its shares) - 1 to spread around.
257+                shares_to_spread = sum([len(list(sharelist)) - 1
258+                                        for (server, sharelist)
259+                                        in shares.items()])
260+                if delta <= len(self.uncontacted_peers) and \
261+                   shares_to_spread >= delta:
262+                    # Loop through the allocated shares, removing
263+                    items = shares.items()
264+                    while len(self.homeless_shares) < delta:
265+                        servernum, sharelist = items.pop()
266+                        if len(sharelist) > 1:
267+                            share = sharelist.pop()
268+                            self.homeless_shares.append(share)
269+                            del(self.preexisting_shares[share])
270+                            items.append((servernum, sharelist))
271+                    return self._loop()
272+                else:
273+                    raise NotEnoughSharesError("shares could only be placed on %d "
274+                                            "servers (%d were requested)" %
275+                                            (len(effective_happiness),
276+                                             self.servers_of_happiness))
277 
278         if self.uncontacted_peers:
279             peer = self.uncontacted_peers.pop(0)
280hunk ./src/allmydata/immutable/upload.py 336
281         else:
282             # no more peers. If we haven't placed enough shares, we fail.
283             placed_shares = self.total_shares - len(self.homeless_shares)
284-            if placed_shares < self.shares_of_happiness:
285+            effective_happiness = servers_with_shares(
286+                                                   self.preexisting_shares,
287+                                                   self.use_peers)
288+            if len(effective_happiness) < self.servers_of_happiness:
289                 msg = ("placed %d shares out of %d total (%d homeless), "
290hunk ./src/allmydata/immutable/upload.py 341
291-                       "want to place %d, "
292+                       "want to place on %d servers, "
293                        "sent %d queries to %d peers, "
294                        "%d queries placed some shares, %d placed none, "
295                        "got %d errors" %
296hunk ./src/allmydata/immutable/upload.py 347
297                        (self.total_shares - len(self.homeless_shares),
298                         self.total_shares, len(self.homeless_shares),
299-                        self.shares_of_happiness,
300+                        self.servers_of_happiness,
301                         self.query_count, self.num_peers_contacted,
302                         self.good_query_count, self.bad_query_count,
303                         self.error_count))
304hunk ./src/allmydata/immutable/upload.py 394
305                     level=log.NOISY, parent=self._log_parent)
306             progress = False
307             for s in alreadygot:
308+                if self.preexisting_shares.has_key(s):
309+                    old_size = len(servers_with_shares(self.preexisting_shares))
310+                    new_candidate = self.preexisting_shares.copy()
311+                    new_candidate[s] = peer.peerid
312+                    new_size = len(servers_with_shares(new_candidate))
313+                    if old_size >= new_size: continue
314                 self.preexisting_shares[s] = peer.peerid
315                 if s in self.homeless_shares:
316                     self.homeless_shares.remove(s)
317hunk ./src/allmydata/immutable/upload.py 825
318         for peer in used_peers:
319             assert isinstance(peer, PeerTracker)
320         buckets = {}
321+        servermap = already_peers.copy()
322         for peer in used_peers:
323             buckets.update(peer.buckets)
324             for shnum in peer.buckets:
325hunk ./src/allmydata/immutable/upload.py 830
326                 self._peer_trackers[shnum] = peer
327+                servermap[shnum] = peer.peerid
328         assert len(buckets) == sum([len(peer.buckets) for peer in used_peers])
329hunk ./src/allmydata/immutable/upload.py 832
330-        encoder.set_shareholders(buckets)
331+        encoder.set_shareholders(buckets, servermap)
332 
333     def _encrypted_done(self, verifycap):
334         """ Returns a Deferred that will fire with the UploadResults instance. """
335replace ./src/allmydata/immutable/upload.py [A-Za-z_0-9] _servers_with_shares _servers_with_unique_shares
336replace ./src/allmydata/immutable/upload.py [A-Za-z_0-9] servers_with_shares servers_with_unique_shares
337}
338[Alter Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers, fixing an issue in #778
339Kevan Carstensen <kevan@isnotajoke.com>**20091116192805
340 Ignore-this: 15289f4d709e03851ed0587b286fd955
341] {
342hunk ./src/allmydata/immutable/upload.py 117
343         d.addCallback(self._got_reply)
344         return d
345 
346+    def query_allocated(self):
347+        d = self._storageserver.callRemote("get_buckets",
348+                                           self.storage_index)
349+        d.addCallback(self._got_allocate_reply)
350+        return d
351+
352+    def _got_allocate_reply(self, buckets):
353+        return (self.peerid, buckets)
354+
355     def _got_reply(self, (alreadygot, buckets)):
356         #log.msg("%s._got_reply(%s)" % (self, (alreadygot, buckets)))
357         b = {}
358hunk ./src/allmydata/immutable/upload.py 195
359         self._started_second_pass = False
360         self.use_peers = set() # PeerTrackers that have shares assigned to them
361         self.preexisting_shares = {} # sharenum -> peerid holding the share
362+        # We don't try to allocate shares to these servers, since they've
363+        # said that they're incapable of storing shares of the size that
364+        # we'd want to store. We keep them around because they may have
365+        # existing shares for this storage index, which we want to know
366+        # about for accurate servers_of_happiness accounting
367+        self.readonly_peers = []
368 
369         peers = storage_broker.get_servers_for_index(storage_index)
370         if not peers:
371hunk ./src/allmydata/immutable/upload.py 227
372             (peerid, conn) = peer
373             v1 = conn.version["http://allmydata.org/tahoe/protocols/storage/v1"]
374             return v1["maximum-immutable-share-size"]
375-        peers = [peer for peer in peers
376-                 if _get_maxsize(peer) >= allocated_size]
377-        if not peers:
378-            raise NoServersError("no peers could accept an allocated_size of %d" % allocated_size)
379+        new_peers = [peer for peer in peers
380+                     if _get_maxsize(peer) >= allocated_size]
381+        old_peers = list(set(peers).difference(set(new_peers)))
382+        peers = new_peers
383 
384         # decide upon the renewal/cancel secrets, to include them in the
385         # allocate_buckets query.
386hunk ./src/allmydata/immutable/upload.py 241
387                                                        storage_index)
388         file_cancel_secret = file_cancel_secret_hash(client_cancel_secret,
389                                                      storage_index)
390-
391-        trackers = [ PeerTracker(peerid, conn,
392-                                 share_size, block_size,
393-                                 num_segments, num_share_hashes,
394-                                 storage_index,
395-                                 bucket_renewal_secret_hash(file_renewal_secret,
396-                                                            peerid),
397-                                 bucket_cancel_secret_hash(file_cancel_secret,
398+        def _make_trackers(peers):
399+           return [ PeerTracker(peerid, conn,
400+                                share_size, block_size,
401+                                num_segments, num_share_hashes,
402+                                storage_index,
403+                                bucket_renewal_secret_hash(file_renewal_secret,
404                                                            peerid),
405hunk ./src/allmydata/immutable/upload.py 248
406-                                 )
407-                     for (peerid, conn) in peers ]
408-        self.uncontacted_peers = trackers
409-
410-        d = defer.maybeDeferred(self._loop)
411+                                bucket_cancel_secret_hash(file_cancel_secret,
412+                                                          peerid))
413+                    for (peerid, conn) in peers]
414+        self.uncontacted_peers = _make_trackers(peers)
415+        self.readonly_peers = _make_trackers(old_peers)
416+        # Talk to the readonly servers to get an idea of what servers
417+        # have what shares (if any) for this storage index
418+        d = defer.maybeDeferred(self._existing_shares)
419+        d.addCallback(lambda ign: self._loop())
420         return d
421 
422hunk ./src/allmydata/immutable/upload.py 259
423+    def _existing_shares(self):
424+        if self.readonly_peers:
425+            peer = self.readonly_peers.pop()
426+            assert isinstance(peer, PeerTracker)
427+            d = peer.query_allocated()
428+            d.addCallback(self._handle_allocate_response)
429+            return d
430+
431+    def _handle_allocate_response(self, (peer, buckets)):
432+        for bucket in buckets:
433+            self.preexisting_shares[bucket] = peer
434+            if self.homeless_shares:
435+                self.homeless_shares.remove(bucket)
436+        return self._existing_shares()
437 
438     def _loop(self):
439         if not self.homeless_shares:
440}
441[Change stray "shares_of_happiness" to "servers_of_happiness"
442Kevan Carstensen <kevan@isnotajoke.com>**20091116212459
443 Ignore-this: 1c971ba8c3c4d2e7ba9f020577b28b73
444] {
445hunk ./docs/architecture.txt 183
446 place a quantity known as "shares of happiness", we'll do the upload anyways.
447 If we cannot place at least this many, the upload is declared a failure.
448 
449-The current defaults use k=3, shares_of_happiness=7, and N=10, meaning that
450+The current defaults use k=3, servers_of_happiness=7, and N=10, meaning that
451 we'll try to place 10 shares, we'll be happy if we can place 7, and we need
452 to get back any 3 to recover the file. This results in a 3.3x expansion
453 factor. In general, you should set N about equal to the number of nodes in
454hunk ./src/allmydata/immutable/upload.py 411
455                 pass
456             else:
457                 # No more peers, so this upload might fail (it depends upon
458-                # whether we've hit shares_of_happiness or not). Log the last
459+                # whether we've hit servers_of_happiness or not). Log the last
460                 # failure we got: if a coding error causes all peers to fail
461                 # in the same way, this allows the common failure to be seen
462                 # by the uploader and should help with debugging
463hunk ./src/allmydata/interfaces.py 809
464 
465 class NotEnoughSharesError(Exception):
466     """Download was unable to get enough shares, or upload was unable to
467-    place 'shares_of_happiness' shares."""
468+    place 'servers_of_happiness' shares."""
469 
470 class NoSharesError(Exception):
471     """Upload or Download was unable to get any shares at all."""
472hunk ./src/allmydata/interfaces.py 1308
473                          pushed.
474 
475         'share_counts': return a tuple describing how many shares are used:
476-                        (needed_shares, shares_of_happiness, total_shares)
477+                        (needed_shares, servers_of_happiness, total_shares)
478 
479         'num_segments': return an int with the number of segments that
480                         will be encoded.
481hunk ./src/allmydata/test/test_encode.py 768
482     def test_lost_one_shareholder(self):
483         # we have enough shareholders when we start, but one segment in we
484         # lose one of them. The upload should still succeed, as long as we
485-        # still have 'shares_of_happiness' peers left.
486+        # still have 'servers_of_happiness' peers left.
487         modemap = dict([(i, "good") for i in range(9)] +
488                        [(i, "lost") for i in range(9, 10)])
489         return self.send_and_recover((4,8,10), bucket_modes=modemap)
490hunk ./src/allmydata/test/test_encode.py 776
491     def test_lost_one_shareholder_early(self):
492         # we have enough shareholders when we choose peers, but just before
493         # we send the 'start' message, we lose one of them. The upload should
494-        # still succeed, as long as we still have 'shares_of_happiness' peers
495+        # still succeed, as long as we still have 'servers_of_happiness' peers
496         # left.
497         modemap = dict([(i, "good") for i in range(9)] +
498                        [(i, "lost-early") for i in range(9, 10)])
499}
500[Eliminate overcounting of servers_of_happiness in Tahoe2PeerSelector; also reorganize some things.
501Kevan Carstensen <kevan@isnotajoke.com>**20091118014542
502 Ignore-this: a6cb032cbff74f4f9d4238faebd99868
503] {
504hunk ./src/allmydata/immutable/upload.py 141
505         return (alreadygot, set(b.keys()))
506 
507 def servers_with_unique_shares(existing_shares, used_peers=None):
508+    """
509+    I accept a dict of shareid -> peerid mappings (and optionally a list
510+    of PeerTracker instances) and return a list of servers that have shares.
511+    """
512     servers = []
513hunk ./src/allmydata/immutable/upload.py 146
514+    existing_shares = existing_shares.copy()
515     if used_peers:
516hunk ./src/allmydata/immutable/upload.py 148
517+        peerdict = {}
518+        for peer in used_peers:
519+            peerdict.update(dict([(i, peer.peerid) for i in peer.buckets]))
520+        for k in peerdict.keys():
521+            if existing_shares.has_key(k):
522+                # Prevent overcounting; favor the bucket, and not the
523+                # prexisting share.
524+                del(existing_shares[k])
525         peers = list(used_peers.copy())
526         # We do this because the preexisting shares list goes by peerid.
527         peers = [x.peerid for x in peers]
528hunk ./src/allmydata/immutable/upload.py 164
529     return list(set(servers))
530 
531 def shares_by_server(existing_shares):
532+    """
533+    I accept a dict of shareid -> peerid mappings, and return a dict
534+    of peerid -> shareid mappings
535+    """
536     servers = {}
537     for server in set(existing_shares.values()):
538         servers[server] = set([x for x in existing_shares.keys()
539hunk ./src/allmydata/immutable/upload.py 174
540                                if existing_shares[x] == server])
541     return servers
542 
543+def should_add_server(existing_shares, server, bucket):
544+    """
545+    I tell my caller whether the servers_of_happiness number will be
546+    increased or decreased if a particular server is added as the peer
547+    already holding a particular share. I take a dictionary, a peerid,
548+    and a bucket as arguments, and return a boolean.
549+    """
550+    old_size = len(servers_with_unique_shares(existing_shares))
551+    new_candidate = existing_shares.copy()
552+    new_candidate[bucket] = server
553+    new_size = len(servers_with_unique_shares(new_candidate))
554+    return old_size < new_size
555+
556 class Tahoe2PeerSelector:
557 
558     def __init__(self, upload_id, logparent=None, upload_status=None):
559hunk ./src/allmydata/immutable/upload.py 294
560             peer = self.readonly_peers.pop()
561             assert isinstance(peer, PeerTracker)
562             d = peer.query_allocated()
563-            d.addCallback(self._handle_allocate_response)
564+            d.addCallback(self._handle_existing_response)
565             return d
566 
567hunk ./src/allmydata/immutable/upload.py 297
568-    def _handle_allocate_response(self, (peer, buckets)):
569+    def _handle_existing_response(self, (peer, buckets)):
570         for bucket in buckets:
571hunk ./src/allmydata/immutable/upload.py 299
572-            self.preexisting_shares[bucket] = peer
573-            if self.homeless_shares:
574-                self.homeless_shares.remove(bucket)
575+            if should_add_server(self.preexisting_shares, peer, bucket):
576+                self.preexisting_shares[bucket] = peer
577+                if self.homeless_shares and bucket in self.homeless_shares:
578+                    self.homeless_shares.remove(bucket)
579         return self._existing_shares()
580 
581     def _loop(self):
582hunk ./src/allmydata/immutable/upload.py 346
583                             items.append((servernum, sharelist))
584                     return self._loop()
585                 else:
586-                    raise NotEnoughSharesError("shares could only be placed on %d "
587-                                            "servers (%d were requested)" %
588-                                            (len(effective_happiness),
589-                                             self.servers_of_happiness))
590+                    raise NotEnoughSharesError("shares could only be placed "
591+                                   "on %d servers (%d were requested)" %
592+                                   (len(effective_happiness),
593+                                   self.servers_of_happiness))
594 
595         if self.uncontacted_peers:
596             peer = self.uncontacted_peers.pop(0)
597hunk ./src/allmydata/immutable/upload.py 425
598                 # we placed enough to be happy, so we're done
599                 if self._status:
600                     self._status.set_status("Placed all shares")
601-                return self.use_peers
602+                return (self.use_peers, self.preexisting_shares)
603 
604     def _got_response(self, res, peer, shares_to_ask, put_peer_here):
605         if isinstance(res, failure.Failure):
606hunk ./src/allmydata/immutable/upload.py 456
607                     level=log.NOISY, parent=self._log_parent)
608             progress = False
609             for s in alreadygot:
610-                if self.preexisting_shares.has_key(s):
611-                    old_size = len(servers_with_unique_shares(self.preexisting_shares))
612-                    new_candidate = self.preexisting_shares.copy()
613-                    new_candidate[s] = peer.peerid
614-                    new_size = len(servers_with_unique_shares(new_candidate))
615-                    if old_size >= new_size: continue
616-                self.preexisting_shares[s] = peer.peerid
617-                if s in self.homeless_shares:
618-                    self.homeless_shares.remove(s)
619-                    progress = True
620+                if should_add_server(self.preexisting_shares,
621+                                     peer.peerid, s):
622+                    self.preexisting_shares[s] = peer.peerid
623+                    if s in self.homeless_shares:
624+                        self.homeless_shares.remove(s)
625+                        progress = True
626 
627             # the PeerTracker will remember which shares were allocated on
628             # that peer. We just have to remember to use them.
629}
630[Alter the error message returned when peer selection fails
631Kevan Carstensen <kevan@isnotajoke.com>**20091123002405
632 Ignore-this: b2a7dc163edcab8d9613bfd6907e5166
633 
634 The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError
635 for a variety of error conditions that weren't informatively described by them.
636 This patch creates a new error, UploadHappinessError, replaces uses of
637 NoSharesError and NotEnoughSharesError with it, and alters the error message
638 raised with the errors to be more in line with the new servers_of_happiness
639 behavior. See ticket #834 for more information.
640] {
641hunk ./src/allmydata/immutable/encode.py 14
642 from allmydata.util.assertutil import _assert, precondition
643 from allmydata.codec import CRSEncoder
644 from allmydata.interfaces import IEncoder, IStorageBucketWriter, \
645-     IEncryptedUploadable, IUploadStatus, NotEnoughSharesError, NoSharesError
646+     IEncryptedUploadable, IUploadStatus, UploadHappinessError
647+
648 
649 """
650 The goal of the encoder is to turn the original file into a series of
651hunk ./src/allmydata/immutable/encode.py 498
652             msg = "lost too many servers during upload (still have %d, want %d): %s" % \
653                   (len(servers_left),
654                    self.servers_of_happiness, why)
655-            if servers_left:
656-                raise NotEnoughSharesError(msg)
657-            else:
658-                raise NoSharesError(msg)
659+            raise UploadHappinessError(msg)
660         self.log("but we can still continue with %s shares, we'll be happy "
661                  "with at least %s" % (len(servers_left),
662                                        self.servers_of_happiness),
663hunk ./src/allmydata/immutable/encode.py 508
664         d = defer.DeferredList(dl, fireOnOneErrback=True)
665         def _eatNotEnoughSharesError(f):
666             # all exceptions that occur while talking to a peer are handled
667-            # in _remove_shareholder. That might raise NotEnoughSharesError,
668+            # in _remove_shareholder. That might raise UploadHappinessError,
669             # which will cause the DeferredList to errback but which should
670hunk ./src/allmydata/immutable/encode.py 510
671-            # otherwise be consumed. Allow non-NotEnoughSharesError exceptions
672+            # otherwise be consumed. Allow non-UploadHappinessError exceptions
673             # to pass through as an unhandled errback. We use this in lieu of
674             # consumeErrors=True to allow coding errors to be logged.
675hunk ./src/allmydata/immutable/encode.py 513
676-            f.trap(NotEnoughSharesError, NoSharesError)
677+            f.trap(UploadHappinessError)
678             return None
679         for d0 in dl:
680             d0.addErrback(_eatNotEnoughSharesError)
681hunk ./src/allmydata/immutable/upload.py 20
682 from allmydata.util.rrefutil import add_version_to_remote_reference
683 from allmydata.interfaces import IUploadable, IUploader, IUploadResults, \
684      IEncryptedUploadable, RIEncryptedUploadable, IUploadStatus, \
685-     NotEnoughSharesError, NoSharesError, NoServersError, \
686-     InsufficientVersionError
687+     NoServersError, InsufficientVersionError, UploadHappinessError
688 from allmydata.immutable import layout
689 from pycryptopp.cipher.aes import AES
690 
691hunk ./src/allmydata/immutable/upload.py 119
692     def query_allocated(self):
693         d = self._storageserver.callRemote("get_buckets",
694                                            self.storage_index)
695-        d.addCallback(self._got_allocate_reply)
696         return d
697 
698hunk ./src/allmydata/immutable/upload.py 121
699-    def _got_allocate_reply(self, buckets):
700-        return (self.peerid, buckets)
701-
702     def _got_reply(self, (alreadygot, buckets)):
703         #log.msg("%s._got_reply(%s)" % (self, (alreadygot, buckets)))
704         b = {}
705hunk ./src/allmydata/immutable/upload.py 187
706     def __init__(self, upload_id, logparent=None, upload_status=None):
707         self.upload_id = upload_id
708         self.query_count, self.good_query_count, self.bad_query_count = 0,0,0
709+        # Peers that are working normally, but full.
710+        self.full_count = 0
711         self.error_count = 0
712         self.num_peers_contacted = 0
713         self.last_failure_msg = None
714hunk ./src/allmydata/immutable/upload.py 291
715             peer = self.readonly_peers.pop()
716             assert isinstance(peer, PeerTracker)
717             d = peer.query_allocated()
718-            d.addCallback(self._handle_existing_response)
719+            d.addBoth(self._handle_existing_response, peer.peerid)
720+            self.num_peers_contacted += 1
721+            self.query_count += 1
722+            log.msg("asking peer %s for any existing shares for upload id %s"
723+                    % (idlib.shortnodeid_b2a(peer.peerid), self.upload_id),
724+                    level=log.NOISY, parent=self._log_parent)
725+            if self._status:
726+                self._status.set_status("Contacting Peer %s to find "
727+                                        "any existing shares"
728+                                        % idlib.shortnodeid_b2a(peer.peerid))
729             return d
730 
731hunk ./src/allmydata/immutable/upload.py 303
732-    def _handle_existing_response(self, (peer, buckets)):
733-        for bucket in buckets:
734-            if should_add_server(self.preexisting_shares, peer, bucket):
735-                self.preexisting_shares[bucket] = peer
736-                if self.homeless_shares and bucket in self.homeless_shares:
737-                    self.homeless_shares.remove(bucket)
738+    def _handle_existing_response(self, res, peer):
739+        if isinstance(res, failure.Failure):
740+            log.msg("%s got error during existing shares check: %s"
741+                    % (idlib.shortnodeid_b2a(peer), res),
742+                    level=log.UNUSUAL, parent=self._log_parent)
743+            self.error_count += 1
744+            self.bad_query_count += 1
745+        else:
746+            buckets = res
747+            log.msg("response from peer %s: alreadygot=%s"
748+                    % (idlib.shortnodeid_b2a(peer), tuple(sorted(buckets))),
749+                    level=log.NOISY, parent=self._log_parent)
750+            for bucket in buckets:
751+                if should_add_server(self.preexisting_shares, peer, bucket):
752+                    self.preexisting_shares[bucket] = peer
753+                    if self.homeless_shares and bucket in self.homeless_shares:
754+                        self.homeless_shares.remove(bucket)
755+            self.full_count += 1
756+            self.bad_query_count += 1
757         return self._existing_shares()
758 
759     def _loop(self):
760hunk ./src/allmydata/immutable/upload.py 365
761                             items.append((servernum, sharelist))
762                     return self._loop()
763                 else:
764-                    raise NotEnoughSharesError("shares could only be placed "
765+                    raise UploadHappinessError("shares could only be placed "
766                                    "on %d servers (%d were requested)" %
767                                    (len(effective_happiness),
768                                    self.servers_of_happiness))
769hunk ./src/allmydata/immutable/upload.py 424
770                 msg = ("placed %d shares out of %d total (%d homeless), "
771                        "want to place on %d servers, "
772                        "sent %d queries to %d peers, "
773-                       "%d queries placed some shares, %d placed none, "
774-                       "got %d errors" %
775+                       "%d queries placed some shares, %d placed none "
776+                       "(of which %d placed none due to the server being"
777+                       " full and %d placed none due to an error)" %
778                        (self.total_shares - len(self.homeless_shares),
779                         self.total_shares, len(self.homeless_shares),
780                         self.servers_of_happiness,
781hunk ./src/allmydata/immutable/upload.py 432
782                         self.query_count, self.num_peers_contacted,
783                         self.good_query_count, self.bad_query_count,
784-                        self.error_count))
785+                        self.full_count, self.error_count))
786                 msg = "peer selection failed for %s: %s" % (self, msg)
787                 if self.last_failure_msg:
788                     msg += " (%s)" % (self.last_failure_msg,)
789hunk ./src/allmydata/immutable/upload.py 437
790                 log.msg(msg, level=log.UNUSUAL, parent=self._log_parent)
791-                if placed_shares:
792-                    raise NotEnoughSharesError(msg)
793-                else:
794-                    raise NoSharesError(msg)
795+                raise UploadHappinessError(msg)
796             else:
797                 # we placed enough to be happy, so we're done
798                 if self._status:
799hunk ./src/allmydata/immutable/upload.py 451
800             log.msg("%s got error during peer selection: %s" % (peer, res),
801                     level=log.UNUSUAL, parent=self._log_parent)
802             self.error_count += 1
803+            self.bad_query_count += 1
804             self.homeless_shares = list(shares_to_ask) + self.homeless_shares
805             if (self.uncontacted_peers
806                 or self.contacted_peers
807hunk ./src/allmydata/immutable/upload.py 479
808                     self.preexisting_shares[s] = peer.peerid
809                     if s in self.homeless_shares:
810                         self.homeless_shares.remove(s)
811-                        progress = True
812 
813             # the PeerTracker will remember which shares were allocated on
814             # that peer. We just have to remember to use them.
815hunk ./src/allmydata/immutable/upload.py 495
816                 self.good_query_count += 1
817             else:
818                 self.bad_query_count += 1
819+                self.full_count += 1
820 
821             if still_homeless:
822                 # In networks with lots of space, this is very unusual and
823hunk ./src/allmydata/interfaces.py 808
824         """
825 
826 class NotEnoughSharesError(Exception):
827-    """Download was unable to get enough shares, or upload was unable to
828-    place 'servers_of_happiness' shares."""
829+    """Download was unable to get enough shares"""
830 
831 class NoSharesError(Exception):
832hunk ./src/allmydata/interfaces.py 811
833-    """Upload or Download was unable to get any shares at all."""
834+    """Download was unable to get any shares at all."""
835+
836+class UploadHappinessError(Exception):
837+    """Upload was unable to satisfy 'servers_of_happiness'"""
838 
839 class UnableToFetchCriticalDownloadDataError(Exception):
840     """I was unable to fetch some piece of critical data which is supposed to
841}
842[Change "UploadHappinessError" to "UploadUnhappinessError"
843Kevan Carstensen <kevan@isnotajoke.com>**20091205043037
844 Ignore-this: 236b64ab19836854af4993bb5c1b221a
845] {
846replace ./src/allmydata/immutable/encode.py [A-Za-z_0-9] UploadHappinessError UploadUnhappinessError
847replace ./src/allmydata/immutable/upload.py [A-Za-z_0-9] UploadHappinessError UploadUnhappinessError
848replace ./src/allmydata/interfaces.py [A-Za-z_0-9] UploadHappinessError UploadUnhappinessError
849}
850[Alter the error message when an upload fails, per some comments in #778.
851Kevan Carstensen <kevan@isnotajoke.com>**20091230210344
852 Ignore-this: ba97422b2f9737c46abeb828727beb1
853 
854 When I first implemented #778, I just altered the error messages to refer to
855 servers where they referred to shares. The resulting error messages weren't
856 very good. These are a bit better.
857] {
858hunk ./src/allmydata/immutable/upload.py 200
859 
860     def get_shareholders(self, storage_broker, secret_holder,
861                          storage_index, share_size, block_size,
862-                         num_segments, total_shares, servers_of_happiness):
863+                         num_segments, total_shares, needed_shares,
864+                         servers_of_happiness):
865         """
866         @return: (used_peers, already_peers), where used_peers is a set of
867                  PeerTracker instances that have agreed to hold some shares
868hunk ./src/allmydata/immutable/upload.py 215
869 
870         self.total_shares = total_shares
871         self.servers_of_happiness = servers_of_happiness
872+        self.needed_shares = needed_shares
873 
874         self.homeless_shares = range(total_shares)
875         # self.uncontacted_peers = list() # peers we haven't asked yet
876hunk ./src/allmydata/immutable/upload.py 230
877         # existing shares for this storage index, which we want to know
878         # about for accurate servers_of_happiness accounting
879         self.readonly_peers = []
880+        # These peers have shares -- any shares -- for our SI. We keep track
881+        # of these to write an error message with them later.
882+        self.peers_with_shares = []
883 
884         peers = storage_broker.get_servers_for_index(storage_index)
885         if not peers:
886hunk ./src/allmydata/immutable/upload.py 317
887             self.bad_query_count += 1
888         else:
889             buckets = res
890+            if buckets:
891+                self.peers_with_shares.append(peer)
892             log.msg("response from peer %s: alreadygot=%s"
893                     % (idlib.shortnodeid_b2a(peer), tuple(sorted(buckets))),
894                     level=log.NOISY, parent=self._log_parent)
895hunk ./src/allmydata/immutable/upload.py 331
896             self.bad_query_count += 1
897         return self._existing_shares()
898 
899+    def _get_progress_message(self):
900+        if not self.homeless_shares:
901+            msg = "placed all %d shares, " % (self.total_shares)
902+        else:
903+            msg = ("placed %d shares out of %d total (%d homeless), " %
904+                   (self.total_shares - len(self.homeless_shares),
905+                    self.total_shares,
906+                    len(self.homeless_shares)))
907+        return (msg + "want to place shares on at least %d servers such that "
908+                      "any %d of them have enough shares to recover the file, "
909+                      "sent %d queries to %d peers, "
910+                      "%d queries placed some shares, %d placed none "
911+                      "(of which %d placed none due to the server being"
912+                      " full and %d placed none due to an error)" %
913+                        (self.servers_of_happiness, self.needed_shares,
914+                         self.query_count, self.num_peers_contacted,
915+                         self.good_query_count, self.bad_query_count,
916+                         self.full_count, self.error_count))
917+
918+
919     def _loop(self):
920         if not self.homeless_shares:
921             effective_happiness = servers_with_unique_shares(
922hunk ./src/allmydata/immutable/upload.py 357
923                                                    self.preexisting_shares,
924                                                    self.use_peers)
925             if self.servers_of_happiness <= len(effective_happiness):
926-                msg = ("placed all %d shares, "
927-                       "sent %d queries to %d peers, "
928-                       "%d queries placed some shares, %d placed none, "
929-                       "got %d errors" %
930-                       (self.total_shares,
931-                        self.query_count, self.num_peers_contacted,
932-                        self.good_query_count, self.bad_query_count,
933-                        self.error_count))
934-                log.msg("peer selection successful for %s: %s" % (self, msg),
935-                    parent=self._log_parent)
936+                msg = ("peer selection successful for %s: %s" % (self,
937+                            self._get_progress_message()))
938+                log.msg(msg, parent=self._log_parent)
939                 return (self.use_peers, self.preexisting_shares)
940             else:
941                 delta = self.servers_of_happiness - len(effective_happiness)
942hunk ./src/allmydata/immutable/upload.py 375
943                 if delta <= len(self.uncontacted_peers) and \
944                    shares_to_spread >= delta:
945                     # Loop through the allocated shares, removing
946+                    # one from each server that has more than one and putting
947+                    # it back into self.homeless_shares until we've done
948+                    # this delta times.
949                     items = shares.items()
950                     while len(self.homeless_shares) < delta:
951                         servernum, sharelist = items.pop()
952hunk ./src/allmydata/immutable/upload.py 388
953                             items.append((servernum, sharelist))
954                     return self._loop()
955                 else:
956-                    raise UploadUnhappinessError("shares could only be placed "
957-                                   "on %d servers (%d were requested)" %
958-                                   (len(effective_happiness),
959-                                   self.servers_of_happiness))
960+                    peer_count = len(list(set(self.peers_with_shares)))
961+                    # If peer_count < needed_shares, then the second error
962+                    # message is nonsensical, so we use this one.
963+                    if peer_count < self.needed_shares:
964+                        msg = ("shares could only be placed or found on %d "
965+                               "server(s). "
966+                               "We were asked to place shares on at least %d "
967+                               "server(s) such that any %d of them have "
968+                               "enough shares to recover the file." %
969+                               (peer_count,
970+                                self.servers_of_happiness,
971+                                self.needed_shares))
972+                    # Otherwise, if we've placed on at least needed_shares
973+                    # peers, but there isn't an x-happy subset of those peers
974+                    # for x < needed_shares, we use this error message.
975+                    elif len(effective_happiness) < self.needed_shares:
976+                        msg = ("shares could be placed or found on %d "
977+                               "server(s), but they are not spread out evenly "
978+                               "enough to ensure that any %d of these servers "
979+                               "would have enough shares to recover the file. "
980+                               "We were asked to place "
981+                               "shares on at least %d servers such that any "
982+                               "%d of them have enough shares to recover the "
983+                               "file." %
984+                               (peer_count,
985+                                self.needed_shares,
986+                                self.servers_of_happiness,
987+                                self.needed_shares))
988+                    # Otherwise, if there is an x-happy subset of peers where
989+                    # x >= needed_shares, but x < shares_of_happiness, then
990+                    # we use this message.
991+                    else:
992+                        msg = ("shares could only be placed on %d server(s) "
993+                               "such that any %d of them have enough shares "
994+                               "to recover the file, but we were asked to use "
995+                               "at least %d such servers." %
996+                                               (len(effective_happiness),
997+                                                self.needed_shares,
998+                                                self.servers_of_happiness))
999+                    raise UploadUnhappinessError(msg)
1000 
1001         if self.uncontacted_peers:
1002             peer = self.uncontacted_peers.pop(0)
1003hunk ./src/allmydata/immutable/upload.py 480
1004                                                    self.preexisting_shares,
1005                                                    self.use_peers)
1006             if len(effective_happiness) < self.servers_of_happiness:
1007-                msg = ("placed %d shares out of %d total (%d homeless), "
1008-                       "want to place on %d servers, "
1009-                       "sent %d queries to %d peers, "
1010-                       "%d queries placed some shares, %d placed none "
1011-                       "(of which %d placed none due to the server being"
1012-                       " full and %d placed none due to an error)" %
1013-                       (self.total_shares - len(self.homeless_shares),
1014-                        self.total_shares, len(self.homeless_shares),
1015-                        self.servers_of_happiness,
1016-                        self.query_count, self.num_peers_contacted,
1017-                        self.good_query_count, self.bad_query_count,
1018-                        self.full_count, self.error_count))
1019-                msg = "peer selection failed for %s: %s" % (self, msg)
1020+                msg = ("peer selection failed for %s: %s" % (self,
1021+                                self._get_progress_message()))
1022                 if self.last_failure_msg:
1023                     msg += " (%s)" % (self.last_failure_msg,)
1024                 log.msg(msg, level=log.UNUSUAL, parent=self._log_parent)
1025hunk ./src/allmydata/immutable/upload.py 534
1026                 self.use_peers.add(peer)
1027                 progress = True
1028 
1029+            if allocated or alreadygot:
1030+                self.peers_with_shares.append(peer.peerid)
1031+
1032             not_yet_present = set(shares_to_ask) - set(alreadygot)
1033             still_homeless = not_yet_present - set(allocated)
1034 
1035hunk ./src/allmydata/immutable/upload.py 931
1036         d = peer_selector.get_shareholders(storage_broker, secret_holder,
1037                                            storage_index,
1038                                            share_size, block_size,
1039-                                           num_segments, n, desired)
1040+                                           num_segments, n, k, desired)
1041         def _done(res):
1042             self._peer_selection_elapsed = time.time() - peer_selection_started
1043             return res
1044}
1045[Alter wording in 'interfaces.py' to be correct wrt #778
1046"Kevan Carstensen" <kevan@isnotajoke.com>**20091205034005
1047 Ignore-this: c9913c700ac14e7a63569458b06980e0
1048] hunk ./src/allmydata/interfaces.py 1277
1049     def set_params(params):
1050         """Override the default encoding parameters. 'params' is a tuple of
1051         (k,d,n), where 'k' is the number of required shares, 'd' is the
1052-        shares_of_happiness, and 'n' is the total number of shares that will
1053+        servers_of_happiness, and 'n' is the total number of shares that will
1054         be created.
1055 
1056         Encoding parameters can be set in three ways. 1: The Encoder class
1057[Fix up the behavior of #778, per reviewers' comments
1058Kevan Carstensen <kevan@isnotajoke.com>**20100507221147
1059 Ignore-this: a55aa984472284e4fd8bdbdf706c918a
1060 
1061   - Make some important utility functions clearer and more thoroughly
1062     documented.
1063   - Assert in upload.servers_of_happiness that the buckets attributes
1064     of PeerTrackers passed to it are mutually disjoint.
1065   - Get rid of some silly non-Pythonisms that I didn't see when I first
1066     wrote these patches.
1067   - Make sure that should_add_server returns true when queried about a
1068     shnum that it doesn't know about yet.
1069   - Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set
1070     of peerids, alter dependencies to deal with that.
1071   - Remove upload.should_add_servers, because it is no longer necessary
1072   - Move upload.shares_of_happiness and upload.shares_by_server to a utility
1073     file.
1074   - Change some points in Tahoe2PeerSelector.
1075   - Compute servers_of_happiness using a bipartite matching algorithm that
1076     we know is optimal instead of an ad-hoc greedy algorithm that isn't.
1077   - Change servers_of_happiness to just take a sharemap as an argument,
1078     change its callers to merge existing_shares and used_peers before
1079     calling it.
1080   - Change an error message in the encoder to be more appropriate for
1081     servers of happiness.
1082   - Clarify the wording of an error message in immutable/upload.py
1083   - Refactor a happiness failure message to happinessutil.py, and make
1084     immutable/upload.py and immutable/encode.py use it.
1085 
1086] {
1087hunk ./src/allmydata/immutable/encode.py 10
1088 from allmydata import uri
1089 from allmydata.storage.server import si_b2a
1090 from allmydata.hashtree import HashTree
1091-from allmydata.util import mathutil, hashutil, base32, log
1092+from allmydata.util import mathutil, hashutil, base32, log, happinessutil
1093 from allmydata.util.assertutil import _assert, precondition
1094 from allmydata.codec import CRSEncoder
1095 from allmydata.interfaces import IEncoder, IStorageBucketWriter, \
1096hunk ./src/allmydata/immutable/encode.py 201
1097             assert IStorageBucketWriter.providedBy(landlords[k])
1098         self.landlords = landlords.copy()
1099         assert isinstance(servermap, dict)
1100+        for v in servermap.itervalues():
1101+            assert isinstance(v, set)
1102         self.servermap = servermap.copy()
1103 
1104     def start(self):
1105hunk ./src/allmydata/immutable/encode.py 489
1106                       level=log.UNUSUAL, failure=why)
1107         if shareid in self.landlords:
1108             self.landlords[shareid].abort()
1109+            peerid = self.landlords[shareid].get_peerid()
1110+            assert peerid
1111             del self.landlords[shareid]
1112hunk ./src/allmydata/immutable/encode.py 492
1113+            self.servermap[shareid].remove(peerid)
1114+            if not self.servermap[shareid]:
1115+                del self.servermap[shareid]
1116         else:
1117             # even more UNUSUAL
1118             self.log("they weren't in our list of landlords", parent=ln,
1119hunk ./src/allmydata/immutable/encode.py 499
1120                      level=log.WEIRD, umid="TQGFRw")
1121-        del(self.servermap[shareid])
1122-        servers_left = list(set(self.servermap.values()))
1123-        if len(servers_left) < self.servers_of_happiness:
1124-            msg = "lost too many servers during upload (still have %d, want %d): %s" % \
1125-                  (len(servers_left),
1126-                   self.servers_of_happiness, why)
1127+        happiness = happinessutil.servers_of_happiness(self.servermap)
1128+        if happiness < self.servers_of_happiness:
1129+            peerids = set(happinessutil.shares_by_server(self.servermap).keys())
1130+            msg = happinessutil.failure_message(len(peerids),
1131+                                                self.required_shares,
1132+                                                self.servers_of_happiness,
1133+                                                happiness)
1134+            msg = "%s: %s" % (msg, why)
1135             raise UploadUnhappinessError(msg)
1136         self.log("but we can still continue with %s shares, we'll be happy "
1137hunk ./src/allmydata/immutable/encode.py 509
1138-                 "with at least %s" % (len(servers_left),
1139+                 "with at least %s" % (happiness,
1140                                        self.servers_of_happiness),
1141                  parent=ln)
1142 
1143hunk ./src/allmydata/immutable/encode.py 515
1144     def _gather_responses(self, dl):
1145         d = defer.DeferredList(dl, fireOnOneErrback=True)
1146-        def _eatNotEnoughSharesError(f):
1147+        def _eatUploadUnhappinessError(f):
1148             # all exceptions that occur while talking to a peer are handled
1149             # in _remove_shareholder. That might raise UploadUnhappinessError,
1150             # which will cause the DeferredList to errback but which should
1151hunk ./src/allmydata/immutable/encode.py 525
1152             f.trap(UploadUnhappinessError)
1153             return None
1154         for d0 in dl:
1155-            d0.addErrback(_eatNotEnoughSharesError)
1156+            d0.addErrback(_eatUploadUnhappinessError)
1157         return d
1158 
1159     def finish_hashing(self):
1160hunk ./src/allmydata/immutable/layout.py 245
1161     def abort(self):
1162         return self._rref.callRemoteOnly("abort")
1163 
1164+
1165+    def get_peerid(self):
1166+        if self._nodeid:
1167+            return self._nodeid
1168+        return None
1169+
1170 class WriteBucketProxy_v2(WriteBucketProxy):
1171     fieldsize = 8
1172     fieldstruct = ">Q"
1173hunk ./src/allmydata/immutable/upload.py 16
1174 from allmydata.storage.server import si_b2a
1175 from allmydata.immutable import encode
1176 from allmydata.util import base32, dictutil, idlib, log, mathutil
1177+from allmydata.util.happinessutil import servers_of_happiness, \
1178+                                         shares_by_server, merge_peers, \
1179+                                         failure_message
1180 from allmydata.util.assertutil import precondition
1181 from allmydata.util.rrefutil import add_version_to_remote_reference
1182 from allmydata.interfaces import IUploadable, IUploader, IUploadResults, \
1183hunk ./src/allmydata/immutable/upload.py 120
1184         return d
1185 
1186     def query_allocated(self):
1187-        d = self._storageserver.callRemote("get_buckets",
1188-                                           self.storage_index)
1189-        return d
1190+        return self._storageserver.callRemote("get_buckets",
1191+                                              self.storage_index)
1192 
1193     def _got_reply(self, (alreadygot, buckets)):
1194         #log.msg("%s._got_reply(%s)" % (self, (alreadygot, buckets)))
1195hunk ./src/allmydata/immutable/upload.py 137
1196         self.buckets.update(b)
1197         return (alreadygot, set(b.keys()))
1198 
1199-def servers_with_unique_shares(existing_shares, used_peers=None):
1200-    """
1201-    I accept a dict of shareid -> peerid mappings (and optionally a list
1202-    of PeerTracker instances) and return a list of servers that have shares.
1203-    """
1204-    servers = []
1205-    existing_shares = existing_shares.copy()
1206-    if used_peers:
1207-        peerdict = {}
1208-        for peer in used_peers:
1209-            peerdict.update(dict([(i, peer.peerid) for i in peer.buckets]))
1210-        for k in peerdict.keys():
1211-            if existing_shares.has_key(k):
1212-                # Prevent overcounting; favor the bucket, and not the
1213-                # prexisting share.
1214-                del(existing_shares[k])
1215-        peers = list(used_peers.copy())
1216-        # We do this because the preexisting shares list goes by peerid.
1217-        peers = [x.peerid for x in peers]
1218-        servers.extend(peers)
1219-    servers.extend(existing_shares.values())
1220-    return list(set(servers))
1221-
1222-def shares_by_server(existing_shares):
1223-    """
1224-    I accept a dict of shareid -> peerid mappings, and return a dict
1225-    of peerid -> shareid mappings
1226-    """
1227-    servers = {}
1228-    for server in set(existing_shares.values()):
1229-        servers[server] = set([x for x in existing_shares.keys()
1230-                               if existing_shares[x] == server])
1231-    return servers
1232-
1233-def should_add_server(existing_shares, server, bucket):
1234-    """
1235-    I tell my caller whether the servers_of_happiness number will be
1236-    increased or decreased if a particular server is added as the peer
1237-    already holding a particular share. I take a dictionary, a peerid,
1238-    and a bucket as arguments, and return a boolean.
1239-    """
1240-    old_size = len(servers_with_unique_shares(existing_shares))
1241-    new_candidate = existing_shares.copy()
1242-    new_candidate[bucket] = server
1243-    new_size = len(servers_with_unique_shares(new_candidate))
1244-    return old_size < new_size
1245 
1246 class Tahoe2PeerSelector:
1247 
1248hunk ./src/allmydata/immutable/upload.py 162
1249         @return: (used_peers, already_peers), where used_peers is a set of
1250                  PeerTracker instances that have agreed to hold some shares
1251                  for us (the shnum is stashed inside the PeerTracker),
1252-                 and already_peers is a dict mapping shnum to a peer
1253-                 which claims to already have the share.
1254+                 and already_peers is a dict mapping shnum to a set of peers
1255+                 which claim to already have the share.
1256         """
1257 
1258         if self._status:
1259hunk ./src/allmydata/immutable/upload.py 174
1260         self.needed_shares = needed_shares
1261 
1262         self.homeless_shares = range(total_shares)
1263-        # self.uncontacted_peers = list() # peers we haven't asked yet
1264         self.contacted_peers = [] # peers worth asking again
1265         self.contacted_peers2 = [] # peers that we have asked again
1266         self._started_second_pass = False
1267hunk ./src/allmydata/immutable/upload.py 178
1268         self.use_peers = set() # PeerTrackers that have shares assigned to them
1269-        self.preexisting_shares = {} # sharenum -> peerid holding the share
1270-        # We don't try to allocate shares to these servers, since they've
1271-        # said that they're incapable of storing shares of the size that
1272-        # we'd want to store. We keep them around because they may have
1273-        # existing shares for this storage index, which we want to know
1274-        # about for accurate servers_of_happiness accounting
1275-        self.readonly_peers = []
1276-        # These peers have shares -- any shares -- for our SI. We keep track
1277-        # of these to write an error message with them later.
1278-        self.peers_with_shares = []
1279+        self.preexisting_shares = {} # shareid => set(peerids) holding shareid
1280+        # We don't try to allocate shares to these servers, since they've said
1281+        # that they're incapable of storing shares of the size that we'd want
1282+        # to store. We keep them around because they may have existing shares
1283+        # for this storage index, which we want to know about for accurate
1284+        # servers_of_happiness accounting
1285+        # (this is eventually a list, but it is initialized later)
1286+        self.readonly_peers = None
1287+        # These peers have shares -- any shares -- for our SI. We keep
1288+        # track of these to write an error message with them later.
1289+        self.peers_with_shares = set([])
1290 
1291hunk ./src/allmydata/immutable/upload.py 190
1292-        peers = storage_broker.get_servers_for_index(storage_index)
1293-        if not peers:
1294-            raise NoServersError("client gave us zero peers")
1295-
1296         # this needed_hashes computation should mirror
1297         # Encoder.send_all_share_hash_trees. We use an IncompleteHashTree
1298         # (instead of a HashTree) because we don't require actual hashing
1299hunk ./src/allmydata/immutable/upload.py 202
1300                                              num_share_hashes, EXTENSION_SIZE,
1301                                              None)
1302         allocated_size = wbp.get_allocated_size()
1303+        all_peers = storage_broker.get_servers_for_index(storage_index)
1304+        if not all_peers:
1305+            raise NoServersError("client gave us zero peers")
1306 
1307         # filter the list of peers according to which ones can accommodate
1308         # this request. This excludes older peers (which used a 4-byte size
1309hunk ./src/allmydata/immutable/upload.py 214
1310             (peerid, conn) = peer
1311             v1 = conn.version["http://allmydata.org/tahoe/protocols/storage/v1"]
1312             return v1["maximum-immutable-share-size"]
1313-        new_peers = [peer for peer in peers
1314-                     if _get_maxsize(peer) >= allocated_size]
1315-        old_peers = list(set(peers).difference(set(new_peers)))
1316-        peers = new_peers
1317+        writable_peers = [peer for peer in all_peers
1318+                          if _get_maxsize(peer) >= allocated_size]
1319+        readonly_peers = set(all_peers) - set(writable_peers)
1320 
1321         # decide upon the renewal/cancel secrets, to include them in the
1322         # allocate_buckets query.
1323hunk ./src/allmydata/immutable/upload.py 228
1324         file_cancel_secret = file_cancel_secret_hash(client_cancel_secret,
1325                                                      storage_index)
1326         def _make_trackers(peers):
1327-           return [ PeerTracker(peerid, conn,
1328-                                share_size, block_size,
1329-                                num_segments, num_share_hashes,
1330-                                storage_index,
1331-                                bucket_renewal_secret_hash(file_renewal_secret,
1332-                                                           peerid),
1333-                                bucket_cancel_secret_hash(file_cancel_secret,
1334-                                                          peerid))
1335+           return [PeerTracker(peerid, conn,
1336+                               share_size, block_size,
1337+                               num_segments, num_share_hashes,
1338+                               storage_index,
1339+                               bucket_renewal_secret_hash(file_renewal_secret,
1340+                                                          peerid),
1341+                               bucket_cancel_secret_hash(file_cancel_secret,
1342+                                                         peerid))
1343                     for (peerid, conn) in peers]
1344hunk ./src/allmydata/immutable/upload.py 237
1345-        self.uncontacted_peers = _make_trackers(peers)
1346-        self.readonly_peers = _make_trackers(old_peers)
1347-        # Talk to the readonly servers to get an idea of what servers
1348-        # have what shares (if any) for this storage index
1349+        self.uncontacted_peers = _make_trackers(writable_peers)
1350+        self.readonly_peers = _make_trackers(readonly_peers)
1351+        # We now ask peers that can't hold any new shares about existing
1352+        # shares that they might have for our SI. Once this is done, we
1353+        # start placing the shares that we haven't already accounted
1354+        # for.
1355         d = defer.maybeDeferred(self._existing_shares)
1356         d.addCallback(lambda ign: self._loop())
1357         return d
1358hunk ./src/allmydata/immutable/upload.py 248
1359 
1360     def _existing_shares(self):
1361+        """
1362+        I loop through the list of peers that aren't accepting any new
1363+        shares for this upload, asking each of them to tell me about the
1364+        shares they already have for this upload's SI.
1365+        """
1366         if self.readonly_peers:
1367             peer = self.readonly_peers.pop()
1368             assert isinstance(peer, PeerTracker)
1369hunk ./src/allmydata/immutable/upload.py 270
1370             return d
1371 
1372     def _handle_existing_response(self, res, peer):
1373+        """
1374+        I handle responses to the queries sent by
1375+        Tahoe2PeerSelector._existing_shares.
1376+        """
1377         if isinstance(res, failure.Failure):
1378             log.msg("%s got error during existing shares check: %s"
1379                     % (idlib.shortnodeid_b2a(peer), res),
1380hunk ./src/allmydata/immutable/upload.py 283
1381         else:
1382             buckets = res
1383             if buckets:
1384-                self.peers_with_shares.append(peer)
1385+                self.peers_with_shares.add(peer)
1386             log.msg("response from peer %s: alreadygot=%s"
1387                     % (idlib.shortnodeid_b2a(peer), tuple(sorted(buckets))),
1388                     level=log.NOISY, parent=self._log_parent)
1389hunk ./src/allmydata/immutable/upload.py 288
1390             for bucket in buckets:
1391-                if should_add_server(self.preexisting_shares, peer, bucket):
1392-                    self.preexisting_shares[bucket] = peer
1393-                    if self.homeless_shares and bucket in self.homeless_shares:
1394-                        self.homeless_shares.remove(bucket)
1395+                self.preexisting_shares.setdefault(bucket, set()).add(peer)
1396+                if self.homeless_shares and bucket in self.homeless_shares:
1397+                    self.homeless_shares.remove(bucket)
1398             self.full_count += 1
1399             self.bad_query_count += 1
1400         return self._existing_shares()
1401hunk ./src/allmydata/immutable/upload.py 317
1402 
1403     def _loop(self):
1404         if not self.homeless_shares:
1405-            effective_happiness = servers_with_unique_shares(
1406-                                                   self.preexisting_shares,
1407-                                                   self.use_peers)
1408-            if self.servers_of_happiness <= len(effective_happiness):
1409+            merged = merge_peers(self.preexisting_shares, self.use_peers)
1410+            effective_happiness = servers_of_happiness(merged)
1411+            if self.servers_of_happiness <= effective_happiness:
1412                 msg = ("peer selection successful for %s: %s" % (self,
1413                             self._get_progress_message()))
1414                 log.msg(msg, parent=self._log_parent)
1415hunk ./src/allmydata/immutable/upload.py 325
1416                 return (self.use_peers, self.preexisting_shares)
1417             else:
1418-                delta = self.servers_of_happiness - len(effective_happiness)
1419+                # We're not okay right now, but maybe we can fix it by
1420+                # redistributing some shares. In cases where one or two
1421+                # servers have, before the upload, all or most of the
1422+                # shares for a given SI, this can work by giving _loop
1423+                # a chance to spread those out over the other peers.
1424+                delta = self.servers_of_happiness - effective_happiness
1425                 shares = shares_by_server(self.preexisting_shares)
1426                 # Each server in shares maps to a set of shares stored on it.
1427                 # Since we want to keep at least one share on each server
1428hunk ./src/allmydata/immutable/upload.py 342
1429                                         in shares.items()])
1430                 if delta <= len(self.uncontacted_peers) and \
1431                    shares_to_spread >= delta:
1432-                    # Loop through the allocated shares, removing
1433-                    # one from each server that has more than one and putting
1434-                    # it back into self.homeless_shares until we've done
1435-                    # this delta times.
1436                     items = shares.items()
1437                     while len(self.homeless_shares) < delta:
1438hunk ./src/allmydata/immutable/upload.py 344
1439-                        servernum, sharelist = items.pop()
1440+                        # Loop through the allocated shares, removing
1441+                        # one from each server that has more than one
1442+                        # and putting it back into self.homeless_shares
1443+                        # until we've done this delta times.
1444+                        server, sharelist = items.pop()
1445                         if len(sharelist) > 1:
1446                             share = sharelist.pop()
1447                             self.homeless_shares.append(share)
1448hunk ./src/allmydata/immutable/upload.py 352
1449-                            del(self.preexisting_shares[share])
1450-                            items.append((servernum, sharelist))
1451+                            self.preexisting_shares[share].remove(server)
1452+                            if not self.preexisting_shares[share]:
1453+                                del self.preexisting_shares[share]
1454+                            items.append((server, sharelist))
1455                     return self._loop()
1456                 else:
1457hunk ./src/allmydata/immutable/upload.py 358
1458-                    peer_count = len(list(set(self.peers_with_shares)))
1459+                    # Redistribution won't help us; fail.
1460+                    peer_count = len(self.peers_with_shares)
1461                     # If peer_count < needed_shares, then the second error
1462                     # message is nonsensical, so we use this one.
1463hunk ./src/allmydata/immutable/upload.py 362
1464-                    if peer_count < self.needed_shares:
1465-                        msg = ("shares could only be placed or found on %d "
1466-                               "server(s). "
1467-                               "We were asked to place shares on at least %d "
1468-                               "server(s) such that any %d of them have "
1469-                               "enough shares to recover the file." %
1470-                               (peer_count,
1471-                                self.servers_of_happiness,
1472-                                self.needed_shares))
1473-                    # Otherwise, if we've placed on at least needed_shares
1474-                    # peers, but there isn't an x-happy subset of those peers
1475-                    # for x < needed_shares, we use this error message.
1476-                    elif len(effective_happiness) < self.needed_shares:
1477-                        msg = ("shares could be placed or found on %d "
1478-                               "server(s), but they are not spread out evenly "
1479-                               "enough to ensure that any %d of these servers "
1480-                               "would have enough shares to recover the file. "
1481-                               "We were asked to place "
1482-                               "shares on at least %d servers such that any "
1483-                               "%d of them have enough shares to recover the "
1484-                               "file." %
1485-                               (peer_count,
1486-                                self.needed_shares,
1487-                                self.servers_of_happiness,
1488-                                self.needed_shares))
1489-                    # Otherwise, if there is an x-happy subset of peers where
1490-                    # x >= needed_shares, but x < shares_of_happiness, then
1491-                    # we use this message.
1492-                    else:
1493-                        msg = ("shares could only be placed on %d server(s) "
1494-                               "such that any %d of them have enough shares "
1495-                               "to recover the file, but we were asked to use "
1496-                               "at least %d such servers." %
1497-                                               (len(effective_happiness),
1498-                                                self.needed_shares,
1499-                                                self.servers_of_happiness))
1500+                    msg = failure_message(peer_count,
1501+                                          self.needed_shares,
1502+                                          self.servers_of_happiness,
1503+                                          effective_happiness)
1504                     raise UploadUnhappinessError(msg)
1505 
1506         if self.uncontacted_peers:
1507hunk ./src/allmydata/immutable/upload.py 415
1508         else:
1509             # no more peers. If we haven't placed enough shares, we fail.
1510             placed_shares = self.total_shares - len(self.homeless_shares)
1511-            effective_happiness = servers_with_unique_shares(
1512-                                                   self.preexisting_shares,
1513-                                                   self.use_peers)
1514-            if len(effective_happiness) < self.servers_of_happiness:
1515+            merged = merge_peers(self.preexisting_shares, self.use_peers)
1516+            effective_happiness = servers_of_happiness(merged)
1517+            if effective_happiness < self.servers_of_happiness:
1518                 msg = ("peer selection failed for %s: %s" % (self,
1519                                 self._get_progress_message()))
1520                 if self.last_failure_msg:
1521hunk ./src/allmydata/immutable/upload.py 460
1522                     level=log.NOISY, parent=self._log_parent)
1523             progress = False
1524             for s in alreadygot:
1525-                if should_add_server(self.preexisting_shares,
1526-                                     peer.peerid, s):
1527-                    self.preexisting_shares[s] = peer.peerid
1528-                    if s in self.homeless_shares:
1529-                        self.homeless_shares.remove(s)
1530+                self.preexisting_shares.setdefault(s, set()).add(peer.peerid)
1531+                if s in self.homeless_shares:
1532+                    self.homeless_shares.remove(s)
1533 
1534             # the PeerTracker will remember which shares were allocated on
1535             # that peer. We just have to remember to use them.
1536hunk ./src/allmydata/immutable/upload.py 471
1537                 progress = True
1538 
1539             if allocated or alreadygot:
1540-                self.peers_with_shares.append(peer.peerid)
1541+                self.peers_with_shares.add(peer.peerid)
1542 
1543             not_yet_present = set(shares_to_ask) - set(alreadygot)
1544             still_homeless = not_yet_present - set(allocated)
1545hunk ./src/allmydata/immutable/upload.py 877
1546     def set_shareholders(self, (used_peers, already_peers), encoder):
1547         """
1548         @param used_peers: a sequence of PeerTracker objects
1549-        @paran already_peers: a dict mapping sharenum to a peerid that
1550-                              claims to already have this share
1551+        @param already_peers: a dict mapping sharenum to a set of peerids
1552+                              that claim to already have this share
1553         """
1554         self.log("_send_shares, used_peers is %s" % (used_peers,))
1555         # record already-present shares in self._results
1556hunk ./src/allmydata/immutable/upload.py 893
1557             buckets.update(peer.buckets)
1558             for shnum in peer.buckets:
1559                 self._peer_trackers[shnum] = peer
1560-                servermap[shnum] = peer.peerid
1561+                servermap.setdefault(shnum, set()).add(peer.peerid)
1562         assert len(buckets) == sum([len(peer.buckets) for peer in used_peers])
1563         encoder.set_shareholders(buckets, servermap)
1564 
1565hunk ./src/allmydata/interfaces.py 1348
1566         must be a dictionary that maps share number (an integer ranging from
1567         0 to n-1) to an instance that provides IStorageBucketWriter.
1568         'servermap' is a dictionary that maps share number (as defined above)
1569-        to a peerid. This must be performed before start() can be called."""
1570+        to a set of peerids. This must be performed before start() can be
1571+        called."""
1572 
1573     def start():
1574         """Begin the encode/upload process. This involves reading encrypted
1575addfile ./src/allmydata/util/happinessutil.py
1576hunk ./src/allmydata/util/happinessutil.py 1
1577+"""
1578+I contain utilities useful for calculating servers_of_happiness, and for
1579+reporting it in messages.
1580+"""
1581+
1582+def failure_message(peer_count, k, happy, effective_happy):
1583+    # If peer_count < needed_shares, this error message makes more
1584+    # sense than any of the others, so use it.
1585+    if peer_count < k:
1586+        msg = ("shares could only be placed or found on %d "
1587+               "server(s). "
1588+               "We were asked to place shares on at least %d "
1589+               "server(s) such that any %d of them have "
1590+               "enough shares to recover the file." %
1591+                (peer_count, happy, k))
1592+    # Otherwise, if we've placed on at least needed_shares
1593+    # peers, but there isn't an x-happy subset of those peers
1594+    # for x >= needed_shares, we use this error message.
1595+    elif effective_happy < k:
1596+        msg = ("shares could be placed or found on %d "
1597+               "server(s), but they are not spread out evenly "
1598+               "enough to ensure that any %d of these servers "
1599+               "would have enough shares to recover the file. "
1600+               "We were asked to place "
1601+               "shares on at least %d servers such that any "
1602+               "%d of them have enough shares to recover the "
1603+               "file." %
1604+                (peer_count, k, happy, k))
1605+    # Otherwise, if there is an x-happy subset of peers where
1606+    # x >= needed_shares, but x < servers_of_happiness, then
1607+    # we use this message.
1608+    else:
1609+        msg = ("shares could only be placed on %d server(s) "
1610+               "such that any %d of them have enough shares "
1611+               "to recover the file, but we were asked to "
1612+               "place shares on at least %d such servers." %
1613+                (effective_happy, k, happy))
1614+    return msg
1615+
1616+
1617+def shares_by_server(servermap):
1618+    """
1619+    I accept a dict of shareid -> set(peerid) mappings, and return a
1620+    dict of peerid -> set(shareid) mappings. My argument is a dictionary
1621+    with sets of peers, indexed by shares, and I transform that into a
1622+    dictionary of sets of shares, indexed by peerids.
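+
+    A small doctest-style sketch, with made-up peerids:
+
+    >>> m = shares_by_server({0: set(["A"]), 1: set(["A", "B"])})
+    >>> sorted(m["A"]), sorted(m["B"])
+    ([0, 1], [1])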
1623+    """
1624+    ret = {}
1625+    for shareid, peers in servermap.iteritems():
1626+        assert isinstance(peers, set)
1627+        for peerid in peers:
1628+            ret.setdefault(peerid, set()).add(shareid)
1629+    return ret
1630+
1631+def merge_peers(servermap, used_peers=None):
1632+    """
1633+    I accept a dict of shareid -> set(peerid) mappings, and optionally a
1634+    set of PeerTrackers. If no set of PeerTrackers is provided, I return
1635+    my first argument unmodified. Otherwise, I update a copy of my first
1636+    argument to include the shareid -> peerid mappings implied in the
1637+    set of PeerTrackers, returning the resulting dict.
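+
+    As an illustrative sketch (IllustrativeTracker is a made-up stand-in
+    for PeerTracker, exposing only the buckets and peerid attributes that
+    I actually use):
+
+    >>> class IllustrativeTracker:
+    ...     def __init__(self, peerid, buckets):
+    ...         self.peerid = peerid
+    ...         self.buckets = buckets
+    >>> peer = IllustrativeTracker("b", {1: None, 2: None})
+    >>> merged = merge_peers({1: set(["a"])}, set([peer]))
+    >>> sorted(merged[1]), sorted(merged[2])
+    (['a', 'b'], ['b'])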
1638+    """
1639+    if not used_peers:
1640+        return servermap
1641+
1642+    assert(isinstance(servermap, dict))
1643+    assert(isinstance(used_peers, set))
1644+
1645+    # Since we mutate servermap, and are called outside of a
1646+    # context where it is okay to do that, make a copy of servermap and
1647+    # work with it.
1648+    servermap = servermap.copy()
1649+    for peer in used_peers:
1650+        for shnum in peer.buckets:
1651+            servermap.setdefault(shnum, set()).add(peer.peerid)
1652+    return servermap
1653+
1654+def servers_of_happiness(sharemap):
1655+    """
1656+    I accept 'sharemap', a dict of shareid -> set(peerid) mappings. I
1657+    return the 'servers_of_happiness' number that sharemap results in.
1658+
1659+    To calculate the 'servers_of_happiness' number for the sharemap, I
1660+    construct a bipartite graph with servers in one partition of vertices
1661+    and shares in the other, and with an edge between a server s and a share t
1662+    if s is to store t. I then compute the size of a maximum matching in
1663+    the resulting graph; this is then returned as the 'servers_of_happiness'
1664+    for my arguments.
1665+
1666+    For example, consider the following layout:
1667+
1668+      server 1: shares 1, 2, 3, 4
1669+      server 2: share 6
1670+      server 3: share 3
1671+      server 4: share 4
1672+      server 5: share 2
1673+
1674+    From this, we can construct the following graph:
1675+
1676+      L = {server 1, server 2, server 3, server 4, server 5}
1677+      R = {share 1, share 2, share 3, share 4, share 6}
1678+      V = L U R
1679+      E = {(server 1, share 1), (server 1, share 2), (server 1, share 3),
1680+           (server 1, share 4), (server 2, share 6), (server 3, share 3),
1681+           (server 4, share 4), (server 5, share 2)}
1682+      G = (V, E)
1683+
1684+    Note that G is bipartite since every edge in E has one endpoint in L
1685+    and one endpoint in R.
1686+
1687+    A matching in a graph G is a subset M of E such that, for any vertex
1688+    v in V, v is incident to at most one edge of M. A maximum matching
1689+    in G is a matching that is no smaller than any other matching. For
1690+    this graph, a matching of cardinality 5 is:
1691+
1692+      M = {(server 1, share 1), (server 2, share 6),
1693+           (server 3, share 3), (server 4, share 4),
1694+           (server 5, share 2)}
1695+
1696+    Since G is bipartite, and since |L| = 5, we cannot have an M' such
1697+    that |M'| > |M|. Then M is a maximum matching in G. Intuitively, and
1698+    as long as k <= 5, we can see that the layout above has
1699+    servers_of_happiness = 5, which matches the results here.
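+
+    As a doctest-style sketch of the layout above (server ids written as
+    strings for brevity):
+
+    >>> sharemap = {1: set(["server1"]),
+    ...             2: set(["server1", "server5"]),
+    ...             3: set(["server1", "server3"]),
+    ...             4: set(["server1", "server4"]),
+    ...             6: set(["server2"])}
+    >>> servers_of_happiness(sharemap)
+    5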
1700+    """
1701+    if sharemap == {}:
1702+        return 0
1703+    sharemap = shares_by_server(sharemap)
1704+    graph = flow_network_for(sharemap)
1705+    # This is an implementation of the Ford-Fulkerson method for finding
1706+    # a maximum flow in a flow network applied to a bipartite graph.
1707+    # Specifically, it is the Edmonds-Karp algorithm, since it uses a
1708+    # BFS to find the shortest augmenting path at each iteration, if one
1709+    # exists.
1710+    #
1711+    # The implementation here is an adaptation of an algorithm described in
1712+    # "Introduction to Algorithms", Cormen et al, 2nd ed., pp 658-662.
1713+    dim = len(graph)
1714+    flow_function = [[0 for sh in xrange(dim)] for s in xrange(dim)]
1715+    residual_graph, residual_function = residual_network(graph, flow_function)
1716+    while augmenting_path_for(residual_graph):
1717+        path = augmenting_path_for(residual_graph)
1718+        # Delta is the largest amount that we can increase flow across
1719+        # all of the edges in path. Because of the way that the residual
1720+        # function is constructed, residual_function[u][v] for an edge (u, v)
1721+        # is the amount of unused capacity on that edge. Taking the
1722+        # minimum of a list of those values for each edge in the
1723+        # augmenting path gives us our delta.
1724+        delta = min(map(lambda (u, v): residual_function[u][v], path))
1725+        for (u, v) in path:
1726+            flow_function[u][v] += delta
1727+            flow_function[v][u] -= delta
1728+        residual_graph, residual_function = residual_network(graph,
1729+                                                             flow_function)
1730+    num_servers = len(sharemap)
1731+    # The value of a flow is the total flow out of the source vertex
1732+    # (vertex 0, in our graph). We could just as well sum across all of
1733+    # f[0], but we know that vertex 0 only has edges to the servers in
1734+    # our graph, so we can stop after summing flow across those. The
1735+    # value of a flow computed in this way is the size of a maximum
1736+    # matching on the bipartite graph described above.
1737+    return sum([flow_function[0][v] for v in xrange(1, num_servers+1)])
1738+
1739+def flow_network_for(sharemap):
1740+    """
1741+    I take my argument, a dict of peerid -> set(shareid) mappings, and
1742+    turn it into a flow network suitable for use with Edmonds-Karp. I
1743+    then return the adjacency list representation of that network.
1744+
1745+    Specifically, I build G = (V, E), where:
1746+      V = { peerid in sharemap } U { shareid in sharemap } U {s, t}
1747+      E = {(s, peerid) for each peerid}
1748+          U {(peerid, shareid) if peerid is to store shareid }
1749+          U {(shareid, t) for each shareid}
1750+
1751+    s and t will be source and sink nodes when my caller starts treating
1752+    the graph I return like a flow network. Without s and t, the
1753+    returned graph is bipartite.
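+
+    A minimal sketch with a single made-up peerid holding a single share;
+    vertex 0 is the source, 1 the server, 2 the share, and 3 the sink:
+
+    >>> flow_network_for({"server1": set([0])})
+    [[1], [2], [3], []]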
1754+    """
1755+    # Servers don't have integral identifiers, and we can't make any
1756+    # assumptions about the way shares are indexed -- it's possible that
1757+    # there are missing shares, for example. So before making a graph,
1758+    # we re-index so that all of our vertices have integral indices, and
1759+    # that there aren't any holes. We start indexing at 1, so that we
1760+    # can add a source node at index 0.
1761+    sharemap, num_shares = reindex(sharemap, base_index=1)
1762+    num_servers = len(sharemap)
1763+    graph = [] # index -> [index], an adjacency list
1764+    # Add an entry at the top (index 0) that has an edge to every server
1765+    # in sharemap
1766+    graph.append(sharemap.keys())
1767+    # For each server, add an entry that has an edge to every share that it
1768+    # contains (or will contain).
1769+    for k in sharemap:
1770+        graph.append(sharemap[k])
1771+    # For each share, add an entry that has an edge to the sink.
1772+    sink_num = num_servers + num_shares + 1
1773+    for i in xrange(num_shares):
1774+        graph.append([sink_num])
1775+    # Add an empty entry for the sink, which has no outbound edges.
1776+    graph.append([])
1777+    return graph
1778+
1779+def reindex(sharemap, base_index):
1780+    """
1781+    Given sharemap, I map peerids and shareids to integers that don't
1782+    conflict with each other, so they're useful as indices in a graph. I
1783+    return a sharemap that is reindexed appropriately, and also the
1784+    number of distinct shares in the resulting sharemap as a convenience
1785+    for my caller. base_index tells me where to start indexing.
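+
+    A minimal sketch with one made-up peerid holding one share:
+
+    >>> reindex({"server1": set([7])}, base_index=1)
+    ({1: [2]}, 1)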
1786+    """
1787+    shares  = {} # shareid  -> vertex index
1788+    num = base_index
1789+    ret = {} # peerid -> [shareid], a reindexed sharemap.
1790+    # Number the servers first
1791+    for k in sharemap:
1792+        ret[num] = sharemap[k]
1793+        num += 1
1794+    # Number the shares
1795+    for k in ret:
1796+        for shnum in ret[k]:
1797+            if not shares.has_key(shnum):
1798+                shares[shnum] = num
1799+                num += 1
1800+        ret[k] = map(lambda x: shares[x], ret[k])
1801+    return (ret, len(shares))
1802+
1803+def residual_network(graph, f):
1804+    """
1805+    I return the residual network and residual capacity function of the
1806+    flow network represented by my graph and f arguments. graph is a
1807+    flow network in adjacency-list form, and f is a flow in graph.
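+
+    A minimal sketch on a two-vertex graph with no flow on its one edge,
+    so that edge appears unchanged in the residual network with capacity 1:
+
+    >>> residual_network([[1], []], [[0, 0], [0, 0]])
+    ([[1], []], [[0, 1], [-1, 0]])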
1808+    """
1809+    new_graph = [[] for i in xrange(len(graph))]
1810+    cf = [[0 for s in xrange(len(graph))] for sh in xrange(len(graph))]
1811+    for i in xrange(len(graph)):
1812+        for v in graph[i]:
1813+            if f[i][v] == 1:
1814+                # We add an edge (v, i) with cf[v,i] = 1. This means
1815+                # that we can remove 1 unit of flow from the edge (i, v)
1816+                new_graph[v].append(i)
1817+                cf[v][i] = 1
1818+                cf[i][v] = -1
1819+            else:
1820+                # We add the edge (i, v), since we're not using it right
1821+                # now.
1822+                new_graph[i].append(v)
1823+                cf[i][v] = 1
1824+                cf[v][i] = -1
1825+    return (new_graph, cf)
1826+
1827+def augmenting_path_for(graph):
1828+    """
1829+    I return an augmenting path, if there is one, from the source node
1830+    to the sink node in the flow network represented by my graph argument.
1831+    If there is no augmenting path, I return False. I assume that the
1832+    source node is at index 0 of graph, and the sink node is at the last
1833+    index. I also assume that graph is a flow network in adjacency list
1834+    form.
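+
+    A minimal sketch on a three-vertex network (source 0, sink 2):
+
+    >>> augmenting_path_for([[1], [2], []])
+    [(0, 1), (1, 2)]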
1835+    """
1836+    bfs_tree = bfs(graph, 0)
1837+    if bfs_tree[len(graph) - 1]:
1838+        n = len(graph) - 1
1839+        path = [] # [(u, v)], where u and v are vertices in the graph
1840+        while n != 0:
1841+            path.insert(0, (bfs_tree[n], n))
1842+            n = bfs_tree[n]
1843+        return path
1844+    return False
1845+
1846+def bfs(graph, s):
1847+    """
1848+    Perform a BFS on graph starting at s, where graph is a graph in
1849+    adjacency list form, and s is a node in graph. I return the
1850+    predecessor table that the BFS generates.
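+
+    A minimal sketch on a three-vertex adjacency list:
+
+    >>> bfs([[1], [2], []], 0)
+    [None, 0, 1]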
1851+    """
1852+    # This is an adaptation of the BFS described in "Introduction to
1853+    # Algorithms", Cormen et al, 2nd ed., p. 532.
1854+    # WHITE vertices are those that we haven't seen or explored yet.
1855+    WHITE = 0
1856+    # GRAY vertices are those we have seen, but haven't explored yet
1857+    GRAY  = 1
1858+    # BLACK vertices are those we have seen and explored
1859+    BLACK = 2
1860+    color        = [WHITE for i in xrange(len(graph))]
1861+    predecessor  = [None for i in xrange(len(graph))]
1862+    distance     = [-1 for i in xrange(len(graph))]
1863+    queue = [s] # vertices that we haven't explored yet.
1864+    color[s] = GRAY
1865+    distance[s] = 0
1866+    while queue:
1867+        n = queue.pop(0)
1868+        for v in graph[n]:
1869+            if color[v] == WHITE:
1870+                color[v] = GRAY
1871+                distance[v] = distance[n] + 1
1872+                predecessor[v] = n
1873+                queue.append(v)
1874+        color[n] = BLACK
1875+    return predecessor
1876}
1877
1878Context:
1879
1880[Dependency on Windmill test framework is not needed yet.
1881david-sarah@jacaranda.org**20100504161043
1882 Ignore-this: be088712bec650d4ef24766c0026ebc8
1883]
1884[tests: pass z to tar so that BSD tar will know to ungzip
1885zooko@zooko.com**20100504090628
1886 Ignore-this: 1339e493f255e8fc0b01b70478f23a09
1887]
1888[setup: update comments and URLs in setup.cfg
1889zooko@zooko.com**20100504061653
1890 Ignore-this: f97692807c74bcab56d33100c899f829
1891]
1892[setup: reorder and extend the show-tool-versions script, the better to glean information about our new buildslaves
1893zooko@zooko.com**20100504045643
1894 Ignore-this: 836084b56b8d4ee8f1de1f4efb706d36
1895]
1896[CLI: Support for https url in option --node-url
1897Francois Deppierraz <francois@ctrlaltdel.ch>**20100430185609
1898 Ignore-this: 1717176b4d27c877e6bc67a944d9bf34
1899 
1900 This patch modifies the regular expression used for verifying of '--node-url'
1901 parameter.  Support for accessing a Tahoe gateway over HTTPS was already
1902 present, thanks to Python's urllib.
1903 
1904]
1905[backupdb.did_create_directory: use REPLACE INTO, not INSERT INTO + ignore error
1906Brian Warner <warner@lothar.com>**20100428050803
1907 Ignore-this: 1fca7b8f364a21ae413be8767161e32f
1908 
1909 This handles the case where we upload a new tahoe directory for a
1910 previously-processed local directory, possibly creating a new dircap (if the
1911 metadata had changed). Now we replace the old dirhash->dircap record. The
1912 previous behavior left the old record in place (with the old dircap and
1913 timestamps), so we'd never stop creating new directories and never converge
1914 on a null backup.
1915]
1916["tahoe webopen": add --info flag, to get ?t=info
1917Brian Warner <warner@lothar.com>**20100424233003
1918 Ignore-this: 126b0bb6db340fabacb623d295eb45fa
1919 
1920 Also fix some trailing whitespace.
1921]
1922[docs: install.html http-equiv refresh to quickstart.html
1923zooko@zooko.com**20100421165708
1924 Ignore-this: 52b4b619f9dde5886ae2cd7f1f3b734b
1925]
1926[docs: install.html -> quickstart.html
1927zooko@zooko.com**20100421155757
1928 Ignore-this: 6084e203909306bed93efb09d0e6181d
1929 It is not called "installing" because that implies that it is going to change the configuration of your operating system. It is not called "building" because that implies that you need developer tools like a compiler. Also I added a stern warning against looking at the "InstallDetails" wiki page, which I have renamed to "AdvancedInstall".
1930]
1931[Fix another typo in tahoe_storagespace munin plugin
1932david-sarah@jacaranda.org**20100416220935
1933 Ignore-this: ad1f7aa66b554174f91dfb2b7a3ea5f3
1934]
1935[Add dependency on windmill >= 1.3
1936david-sarah@jacaranda.org**20100416190404
1937 Ignore-this: 4437a7a464e92d6c9012926b18676211
1938]
1939[licensing: phrase the OpenSSL-exemption in the vocabulary of copyright instead of computer technology, and replicate the exemption from the GPL to the TGPPL
1940zooko@zooko.com**20100414232521
1941 Ignore-this: a5494b2f582a295544c6cad3f245e91
1942]
1943[munin-tahoe_storagespace
1944freestorm77@gmail.com**20100221203626
1945 Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa
1946 
1947 Plugin configuration rename
1948 
1949]
1950[setup: add licensing declaration for setuptools (noticed by the FSF compliance folks)
1951zooko@zooko.com**20100309184415
1952 Ignore-this: 2dfa7d812d65fec7c72ddbf0de609ccb
1953]
1954[setup: fix error in licensing declaration from Shawn Willden, as noted by the FSF compliance division
1955zooko@zooko.com**20100309163736
1956 Ignore-this: c0623d27e469799d86cabf67921a13f8
1957]
1958[CREDITS to Jacob Appelbaum
1959zooko@zooko.com**20100304015616
1960 Ignore-this: 70db493abbc23968fcc8db93f386ea54
1961]
1962[desert-island-build-with-proper-versions
1963jacob@appelbaum.net**20100304013858]
1964[docs: a few small edits to try to guide newcomers through the docs
1965zooko@zooko.com**20100303231902
1966 Ignore-this: a6aab44f5bf5ad97ea73e6976bc4042d
1967 These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months.
1968]
1969[TAG allmydata-tahoe-1.6.1
1970david-sarah@jacaranda.org**20100228062314
1971 Ignore-this: eb5f03ada8ea953ee7780e7fe068539
1972]
1973Patch bundle hash:
197476de9ddd55631daa395a88a9dabc3c46ebc43997