Opened at 2010-01-14T21:02:31Z
Closed at 2010-05-16T06:19:45Z
#902 closed defect (fixed)
network failure => internal TypeError
Reported by: | zooko | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 1.7.0 |
Component: | code-peerselection | Version: | 1.5.0 |
Keywords: | reliability easy upload | Cc: | francois@… |
Launchpad Bug: | | |
Description
I was uploading a file when the local telco monopoly decided to turn off my phone and DSL for a few minutes. My tahoe cp command then reported ValueError: too many values to unpack on this line of code:
def set_shareholders(self, (used_peers, already_peers), encoder):
It reports that this line is on line 753 of immutable/upload.py.
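For illustration, here is a minimal sketch of that failure mode. It is written with Python 3 unpacking, since the Python 2 tuple-parameter syntax in the signature above is no longer valid; the names mirror the ticket but the bodies and values are invented, not the real Tahoe code.

```python
def set_shareholders(peer_selection_result, encoder):
    # The caller expects a (used_peers, already_peers) 2-tuple here;
    # this is the Python 3 spelling of the tuple-parameter above.
    used_peers, already_peers = peer_selection_result
    return used_peers, already_peers

# Normal path: the peer selector hands back a 2-tuple.
set_shareholders((set(), {}), encoder=None)  # fine

# Buggy path: it hands back a bare set of three peer trackers instead.
set_shareholders({"tracker1", "tracker2", "tracker3"}, encoder=None)
# ValueError: too many values to unpack (expected 2)
```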
The version is: allmydata-tahoe: 1.5.0-r4054, foolscap: 0.4.2-zsetupztime, pycryptopp: 0.5.15, zfec: 1.4.5, Twisted: 8.2.0, Nevow: 0.9.31-r15675, zope.interface: 3.1.0c1, python: 2.5.4, platform: Darwin-8.11.1-i386-32bit, sqlite: 3.1.3, simplejson: 2.0.9, argparse: 0.8.0, pyOpenSSL: 0.9, pyutil: 1.5.1, zbase32: 1.1.0, setuptools: 0.6c12dev, pysqlite: 2.3.2
Line 753 of immutable/upload.py at version 4054 is src/allmydata/immutable/upload.py@4054#L753. That method is called from only one place, line 720, which means that locate_all_shareholders() must have returned something containing more than two values rather than the expected 2-tuple. Looking at locate_all_shareholders(), I see that what it returns is whatever Tahoe2PeerSelector.get_shareholders() returned. (This fact may not be obvious if you aren't familiar with Twisted Deferreds, but it should be obvious if you are.)
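To make that Deferred point concrete, here is a hedged sketch, not the real Tahoe code, of why a function that simply returns another function's Deferred passes that Deferred's result straight through to its own callers:

```python
from twisted.internet import defer

def get_shareholders():
    d = defer.Deferred()
    # Imagine _loop() eventually fires this Deferred. On the buggy
    # path it fires with a bare set rather than a 2-tuple.
    d.callback({"tracker1", "tracker2", "tracker3"})
    return d

def locate_all_shareholders():
    # No callback is added here, so callers receive exactly whatever
    # value get_shareholders()'s Deferred fired with.
    return get_shareholders()

locate_all_shareholders().addCallback(
    lambda result: print("caller received:", result))
```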
It looks like get_shareholders() returns whatever is returned by Tahoe2PeerSelector._loop(). There are three (non-recursive) return statements in _loop(). The first one returns a 2-tuple, the second one returns the return value of Tahoe2PeerSelector._got_response(), and the third one returns self.use_peers. Wait a second, that can't be right -- self.use_peers is a set of servers, not a 2-tuple.
When does _loop() reach this return statement? Reading the control flow, it does so when:

1. there are no uncontacted servers, and
2. there are no servers in the ask-again set, and
3. there are no servers in the and-then-ask-yet-again set, and
4. we have already placed enough shares to be happy.

So this is a rare situation, in which we've placed enough shares to be happy at the same moment that all of our servers disappeared (see the sketch below). (Also, the status message "Placed all shares" seems slightly wrong.) I guess the next step is to write a unit test of this situation. I assume that there isn't one, since if there were it would fail. :-)
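Here is a hedged sketch of that control flow. The attribute names follow the discussion above, but everything else is invented for illustration; this is not the real Tahoe2PeerSelector.

```python
class PeerSelectorSketch:
    def __init__(self):
        self.uncontacted_peers = []   # servers we have not asked yet
        self.contacted_peers = []     # the ask-again set
        self.contacted_peers2 = []    # the and-then-ask-yet-again set
        self.use_peers = set()        # peer trackers holding new shares
        self.preexisting_shares = {}  # shares found already in place
        self.shares_placed = 0
        self.shares_needed_for_happiness = 0

    def _loop(self):
        if self.uncontacted_peers:
            return self._ask_next_peer()   # re-enters _loop() via a Deferred
        if self.contacted_peers or self.contacted_peers2:
            return self._ask_next_peer()
        if self.shares_placed >= self.shares_needed_for_happiness:
            # The path traced above: every server list is empty AND we
            # are already happy. Pre-fix this returned a bare set:
            return self.use_peers          # BUG: not a 2-tuple
            # Post-fix it would return something like:
            # return (self.use_peers, self.preexisting_shares)
        raise RuntimeError("not happy and nobody left to ask")

    def _ask_next_peer(self):
        ...  # placeholder; the real code sends a remote allocate request
```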
But I'm going to stop here and work on other priorities (#778) now, so if anyone else wants to fix this then please go ahead!
Change History (4)
comment:1 Changed at 2010-02-15T20:25:54Z by davidsarah
- Milestone changed from undecided to 1.6.1
comment:2 Changed at 2010-02-16T05:28:18Z by zooko
- Milestone changed from 1.6.1 to 1.7.0
comment:3 Changed at 2010-04-16T16:16:38Z by francois
- Cc francois@… added
comment:4 Changed at 2010-05-16T06:19:45Z by zooko
- Resolution set to fixed
- Status changed from new to closed
Kevan's patches for #778 fixed this bug and also added a unit test that exercises this case.
The bug only bites in a particular race condition, when you achieve share-placement happiness at the same time as you lose connections to all your storage servers (as far as I can tell). It was not important enough to prioritize for v1.6.1, because there were other tickets we could focus on for that release.
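This is not Kevan's actual test, but for the record, a hedged sketch of the shape such a regression test can take: whichever path the selector's _loop() exits through, the result must be a 2-tuple that set_shareholders() can unpack. The helper name is hypothetical.

```python
import unittest

def unpack_shareholders(result):
    # Mirrors what set_shareholders() does with the selector's result.
    used_peers, already_peers = result
    return used_peers, already_peers

class ShareholderResultShape(unittest.TestCase):
    def test_happy_but_all_servers_gone(self):
        # Simulate the race: happiness was reached just as every server
        # connection was lost. Pre-fix the selector returned a bare set
        # here; post-fix it returns a (use_peers, preexisting_shares)
        # 2-tuple, which unpacks cleanly.
        used, already = unpack_shareholders((set(), {}))
        self.assertEqual(used, set())
        self.assertEqual(already, {})

if __name__ == "__main__":
    unittest.main()
```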