Opened at 2009-07-14T04:19:24Z
Closed at 2009-07-17T05:13:14Z
#758 closed defect (fixed)
maximum recursion depth exceeded in Tahoe2PeerSelector
Reported by: zooko
Owned by:
Priority: major
Milestone: 1.5.0
Component: code-peerselection
Version: 1.4.1
Keywords:
Cc:
Launchpad Bug:
Description
I just got this traceback from a node using the volunteergrid:
/usr/local/lib/python2.6/dist-packages/Twisted-8.2.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py, line 328 in _runCallbacks
    326     self._runningCallbacks = True
    327     try:
    328         self.result = callback(self.result, *args, **kw)
    329     finally:
  Locals
    callback  <bound method Tahoe2PeerSelector._got_response of <Tahoe2PeerSelector for upload nztp5>>
    self      <Deferred at 0x4d93a70 current result: None>
    args      (<PeerTracker for peer xjy2clbq and SI nztp5>, set([19, 20]), [<PeerTracker for peer gapnio7p and SI nztp5>])
    kw        {}
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 384 in _got_response
    382
    383     # now loop
    384     return self._loop()
    385
  Locals
    self      <Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
    282         self.contacted_peers.extend(self.contacted_peers2)
    283         self.contacted_peers[:] = []
    284         return self._loop()
    285     else:
  Locals
    self      <Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
    282         self.contacted_peers.extend(self.contacted_peers2)
    283         self.contacted_peers[:] = []
    284         return self._loop()
    285     else:
  Locals
    self      <Tahoe2PeerSelector for upload nztp5>
(And so forth until maximum recursion depth exceeded.)
There are only 15 servers on the volunteergrid right now. The clause shown in the traceback, around line 279 of upload.py, handles the case where every server has already been asked to hold one share and then a second share; it starts another pass to ask them to hold a third (or later) share.
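The pass structure described above can be modelled in a few lines. This is a hypothetical sketch, not code from upload.py: `plan_passes` is an illustrative name, it assumes every server accepts every share it is asked to hold, and it asks each server for exactly one share per pass as the description puts it (the real upload.py may batch its requests differently).

```python
def plan_passes(servers, num_shares):
    """Hypothetical model: ask each server for one share per pass.

    Assumes every server accepts every request. Returns a list of
    passes, each a list of (share_number, server) pairs.
    """
    if not servers:
        raise ValueError("need at least one server")
    passes = []
    homeless = list(range(num_shares))
    while homeless:
        this_pass = []
        for server in servers:
            if not homeless:
                break
            # the next homeless share goes to this server
            this_pass.append((homeless.pop(0), server))
        passes.append(this_pass)
    return passes


# In this model, 15 servers only need a third pass for more than 30 shares.
servers = ["server%d" % i for i in range(15)]
print(len(plan_passes(servers, 30)))  # prints 2
print(len(plan_passes(servers, 31)))  # prints 3
```

Under these assumptions, reaching the third-or-later clause on a 15-server grid requires either more than 30 shares or servers that refuse some requests.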
It appears that this loop never terminated before the recursion depth was exceeded. We have tests of this case, but... Hey waitaminute! That code in upload.py says:
    elif self.contacted_peers2:
        # we've finished the second-or-later pass. Move all the remaining
        # peers back into self.contacted_peers for the next pass
        self.contacted_peers.extend(self.contacted_peers2)
        self.contacted_peers[:] = []
        return self._loop()
That can't be right. It probably means to say:
    self.contacted_peers.extend(self.contacted_peers2)
    del self.contacted_peers2[:]
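Isolating just the list shuffling shows why the original clause recurses forever. The two functions below are a hypothetical reduction: the names `rotate_original` and `rotate_fixed` are illustrative, plain lists of peer IDs stand in for PeerTracker objects, and each function copies the two statements from the corresponding version quoted in this ticket.

```python
def rotate_original(contacted_peers, contacted_peers2):
    # The two statements from the elif clause currently in upload.py.
    contacted_peers.extend(contacted_peers2)
    contacted_peers[:] = []          # empties the list that was just filled


def rotate_fixed(contacted_peers, contacted_peers2):
    # The proposed replacement: move the peers over, then empty the source.
    contacted_peers.extend(contacted_peers2)
    del contacted_peers2[:]


peers, peers2 = [], ["xjy2clbq", "gapnio7p"]
rotate_original(peers, peers2)
print(peers, peers2)   # prints [] ['xjy2clbq', 'gapnio7p']

peers, peers2 = [], ["xjy2clbq", "gapnio7p"]
rotate_fixed(peers, peers2)
print(peers, peers2)   # prints ['xjy2clbq', 'gapnio7p'] []
```

After the original statements, contacted_peers is still empty and contacted_peers2 is still non-empty, so (assuming shares are still homeless) the next `_loop()` call takes the same `elif self.contacted_peers2` branch again, which matches the repeating line-284 frames in the traceback.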
Why doesn't that test catch this bug?
But it is too late at night for me to be messing with such stuff.
If someone in a different timezone or on a different sleep schedule wants to fix the test to catch this bug while I sleep, that would be great! :-)
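A test that reaches this clause needs more homeless shares than two passes can place. The sketch below is hypothetical: `ToySelector` and `RotationTest` are illustrative names, and `ToySelector` is a simplified, synchronous stand-in for Tahoe2PeerSelector (no Deferreds, every peer accepts exactly one share per request). In this model the rotation clause only runs once a third pass is needed, so a test whose shares all fit in two passes never executes it. That is one plausible way a test of this case could miss the bug, not a confirmed diagnosis of Tahoe's own tests.

```python
import unittest


class ToySelector:
    """Hypothetical, synchronous stand-in for Tahoe2PeerSelector.

    Every peer accepts exactly one share per request. Like the _loop in
    the traceback, this _loop calls itself recursively.
    """

    def __init__(self, peers, num_shares, buggy=False):
        self.uncontacted_peers = list(peers)
        self.contacted_peers = []    # peers to ask in the current pass
        self.contacted_peers2 = []   # peers to ask again in the next pass
        self.homeless_shares = list(range(num_shares))
        self.placements = {}         # share number -> peer
        self.buggy = buggy           # True: use the clause from this ticket

    def _ask(self, peer, keep_in):
        # The peer accepts one share and is remembered for a later pass.
        self.placements[self.homeless_shares.pop(0)] = peer
        keep_in.append(peer)

    def _loop(self):
        if not self.homeless_shares:
            return self.placements
        elif self.uncontacted_peers:
            # first pass: one share for each peer not yet asked
            self._ask(self.uncontacted_peers.pop(0), self.contacted_peers)
            return self._loop()
        elif self.contacted_peers:
            # second-or-later pass
            self._ask(self.contacted_peers.pop(0), self.contacted_peers2)
            return self._loop()
        elif self.contacted_peers2:
            # finished a second-or-later pass: set up the next one
            self.contacted_peers.extend(self.contacted_peers2)
            if self.buggy:
                self.contacted_peers[:] = []
            else:
                del self.contacted_peers2[:]
            return self._loop()
        else:
            raise RuntimeError("no peers left to ask")


class RotationTest(unittest.TestCase):
    def test_two_passes_never_reach_the_clause(self):
        # 3 peers, 6 shares: two passes place everything, so even the
        # buggy clause is never executed and this test passes.
        placements = ToySelector(["a", "b", "c"], 6, buggy=True)._loop()
        self.assertEqual(sorted(placements), list(range(6)))

    def test_third_pass_places_every_share(self):
        placements = ToySelector(["a", "b", "c"], 10)._loop()
        self.assertEqual(sorted(placements), list(range(10)))

    def test_buggy_clause_recurses_forever(self):
        # 7 shares need a third pass, which the buggy clause never starts.
        with self.assertRaises(RecursionError):
            ToySelector(["a", "b", "c"], 7, buggy=True)._loop()
```

Saved as a module, this runs under Python 3's `unittest` runner. Only the third test distinguishes the two versions of the clause; the first shows how a two-pass test passes regardless.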
Change History (3)
comment:1 Changed at 2009-07-15T03:45:54Z by terrell
- Summary changed from maxmimum recursion depth exceeded in Tahoe2PeerSelector to maximum recursion depth exceeded in Tahoe2PeerSelector
comment:2 Changed at 2009-07-15T07:15:50Z by warner
Huh, yeah, that code is odd. Your analysis feels right, but I'm too jetlagged to understand this code right now either. I want to rewrite the uploader anyway, but that's not going to happen for 1.5.
comment:3 Changed at 2009-07-17T05:13:14Z by warner
- Resolution set to fixed
- Status changed from new to closed
This should be fixed by 1192b61dfed62a49.