Opened at 2013-11-26T04:32:53Z
Closed at 2014-04-15T00:49:57Z
#2116 closed defect (fixed)
xml parse error from S4 service
Reported by: | zooko | Owned by: | daira |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code-storage | Version: | 1.9.0-s3branch |
Keywords: | s3 error txaws retry LeastAuthority | Cc: | |
Launchpad Bug: |
Description (last modified by daira)
zooko@spark ~/research/papers $ tahoe backup . $WRITABLE_DIRCAP: Traceback (most recent call last): File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 156, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 141, in runner rc = cli.dispatch[command](so) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/cli.py", line 574, in backup rc = tahoe_backup.backup(options) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 325, in backup return bu.run() File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 118, in run new_backup_dircap = self.process(options.from_dir) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 194, in process childcap, metadata = self.upload(childpath) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 305, in upload raise HTTPError("Error during file PUT", resp) HTTPError: Error during file PUT: 500 Internal Server Error "Traceback (most recent call last): File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 753, in receiveClose self.request.fail(f) File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 95, in fail self.deferred.errback(why) File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 421, in errback self._startRunCallbacks(fail) File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 488, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File 
\"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 575, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 604, in _got_response return self._loop() File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 516, in _loop return self._failed(msg) File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 617, in _failed raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: server selection failed for <Tahoe2ServerSelector for upload qd7vq>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file. (placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed none due to the server being full and 1 placed none due to an error)) (last failure (from <ServerTracker for server 6dwd2q and SI qd7vq>) was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 287, in connectionLost protocol.connectionLost(reason) File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 193, in connectionLost self.factory._disconnectedDeferred.callback(None) File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 368, in callback 
self._startRunCallbacks(result) File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 464, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 551, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/s3/client.py\", line 153, in _parse_get_bucket root = XML(xml_bytes) File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/util.py\", line 61, in XML return parser.close() File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1655, in close self._raiseerror(v) File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1507, in _raiseerror raise err xml.etree.ElementTree.ParseError: no element found: line 1, column 0 ]'> ]) "
Change History (29)
comment:1 Changed at 2013-11-26T04:34:59Z by zooko
comment:2 Changed at 2013-11-26T17:00:40Z by daira
comment:3 Changed at 2013-11-26T17:01:10Z by daira
Actually the response from S3 seems to have been empty.
comment:4 follow-up: ↓ 6 Changed at 2013-11-26T17:08:41Z by daira
- Keywords s3 error txaws retry LeastAuthority added
- Owner set to daira
- Status changed from new to assigned
- Version changed from 1.10.0 to 1.9.0-s3branch
Hmm, maybe an empty S3 response should trigger a retry?
comment:5 Changed at 2013-11-26T17:13:13Z by daira
Sigh, I wish we could reduce that error report to something more concise -- although I suppose we should be grateful that the actual error was in there somewhere.
comment:6 in reply to: ↑ 4 Changed at 2013-11-27T19:59:27Z by zooko
Replying to daira:
Hmm, maybe an empty S3 response should trigger a retry?
I guess. :-/
This is what I think of as "snowshoes". Having our upper layer perform some redundancy or retrying just to work-around what appear to be bugs or unreliability from the lower layer. Ick.
comment:7 follow-up: ↓ 8 Changed at 2013-11-27T20:01:57Z by daira
Well, I'm philosophically resistant to that as well, but I have no confidence that we can get the underlying problem with S3 fixed.
comment:8 in reply to: ↑ 7 Changed at 2013-11-27T20:03:38Z by zooko
Replying to daira:
Well, I'm philosophically resistant to that as well, but I have no confidence that we can get the underlying problem with S3 fixed.
Yeah, I agree that we should go ahead and add a limited number of exponential-backoff-retries for this.
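A minimal sketch of the limited exponential-backoff retry agreed on above. The names `EmptyResponseError` and `retry_on_empty_response` and the backoff schedule are hypothetical, not the actual Tahoe-LAFS/txAWS API; the real code is Twisted-based and would schedule retries with `task.deferLater` rather than blocking in `time.sleep`.

```python
import time

BACKOFF_SECONDS = [1, 2, 4]  # bounded: at most len(BACKOFF_SECONDS) retries


class EmptyResponseError(Exception):
    """Raised when S3 answers with an empty (unparseable) body."""


def retry_on_empty_response(op, backoff=BACKOFF_SECONDS, sleep=time.sleep):
    """Call op(); on an empty-response failure, wait and retry with
    exponentially growing delays, then make one final attempt."""
    for delay in backoff:
        try:
            return op()
        except EmptyResponseError:
            sleep(delay)
    return op()  # final attempt: let any remaining failure propagate
```

The point of the bounded schedule is exactly the "snowshoes" compromise: a few retries paper over rare S3 flakiness without masking a persistent outage.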
comment:9 Changed at 2013-11-27T20:04:42Z by daira
Note that this isn't the first time I've seen an empty response from S3. They are rare, but they've come up before - we just haven't recorded that problem until now.
comment:10 Changed at 2013-11-28T12:52:35Z by zooko
I just hit a similar failure again:
Traceback (most recent call last): File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 156, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 141, in runner rc = cli.dispatch[command](so) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/cli.py", line 574, in backup rc = tahoe_backup.backup(options) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 325, in backup return bu.run() File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 118, in run new_backup_dircap = self.process(options.from_dir) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 194, in process childcap, metadata = self.upload(childpath) File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 305, in upload raise HTTPError("Error during file PUT", resp) HTTPError: Error during file PUT: 500 Internal Server Error "Traceback (most recent call last): File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 753, in receiveClose self.request.fail(f) File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 95, in fail self.deferred.errback(why) File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 421, in errback self._startRunCallbacks(fail) File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 488, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 575, in 
_runCallbacks current.result = callback(current.result, *args, **kw) File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 604, in _got_response return self._loop() File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 516, in _loop return self._failed(msg) File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 617, in _failed raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: server selection failed for <Tahoe2ServerSelector for upload s7uqw>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file. (placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed none due to the server being full and 1 placed none due to an error)) (last failure (from <ServerTracker for server 6dwd2q and SI s7uqw>) was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 287, in connectionLost protocol.connectionLost(reason) File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 193, in connectionLost self.factory._disconnectedDeferred.callback(None) File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 368, in callback self._startRunCallbacks(result) File 
\"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 464, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 551, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/s3/client.py\", line 153, in _parse_get_bucket root = XML(xml_bytes) File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/util.py\", line 61, in XML return parser.close() File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1655, in close self._raiseerror(v) File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1507, in _raiseerror raise err xml.etree.ElementTree.ParseError: no element found: line 1, column 0 ]'> ]) "
comment:11 Changed at 2013-11-29T05:42:21Z by zooko
Hm, I seem to get this every time that I run a full tahoe backup on my data now. ☹ It seems like the LeastAuthority S4 service is suffering badly from this.
  File "/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/s3/client.py", line 153, in _parse_get_bucket
    root = XML(xml_bytes)
  File "/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/util.py", line 61, in XML
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1655, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
comment:12 Changed at 2013-11-30T20:55:30Z by daira
- Priority changed from normal to major
comment:13 Changed at 2013-12-01T16:21:22Z by zooko
I just got this same error again, but this time on read instead of on upload. Manually retrying a couple of times (by hitting C-r in my web browser) got it to work.
comment:14 Changed at 2014-03-04T19:29:52Z by daira
CyberAxe just saw the same error: http://sebsauvage.net/paste/?e301277924eb38dc#6W5WPJkf1HgW3+rdFVvQ/2BqzYIZ0ZwQyPUb08+kY0A=
comment:15 Changed at 2014-03-04T19:30:43Z by daira
I'm going to make it retry.
comment:16 follow-up: ↓ 17 Changed at 2014-03-10T21:45:28Z by daira
Note that the error occurs in txAWS when it tries to parse an empty response body for a 500 error as XML. The fix might need to be in txAWS as well.
It's a bit difficult to test this, because the tests intentionally do not depend on txAWS (currently they only test the mock cloud backend, not the service container implementations).
comment:17 in reply to: ↑ 16 Changed at 2014-03-11T15:55:09Z by daira
Replying to daira:
Note that the error occurs in txAWS when it tries to parse an empty response body for a 500 error as XML.
No, this is a wrong inference. The error occurs in txaws.s3.client._parse_get_bucket, which should not be reached if the S3 server responded with a 500 error. (The 500 error in the stack trace is from the Tahoe-LAFS gateway, not S3.) So it looks like S3 returned success with an empty body.
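Given that diagnosis, one fix (a hedged sketch, not the actual txAWS patch; `EmptyBodyError` and `parse_bucket_listing` are illustrative names) is to guard the bucket-listing parser so an empty success body is reported as a distinct, retryable error instead of reaching ElementTree and producing the opaque "no element found: line 1, column 0" ParseError seen above:

```python
import xml.etree.ElementTree as ET


class EmptyBodyError(Exception):
    """S3 returned a success status but an empty response body."""


def parse_bucket_listing(xml_bytes):
    # Surface the empty-body condition explicitly so the caller can retry,
    # rather than handing "" to the XML parser.
    if not xml_bytes or not xml_bytes.strip():
        raise EmptyBodyError("empty response body from S3")
    return ET.fromstring(xml_bytes)
```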
comment:18 Changed at 2014-03-17T22:03:07Z by daira
- Description modified (diff)
comment:19 Changed at 2014-03-17T22:23:31Z by daira
$ python
Python 2.7.5+ (default, Jun  2 2013, 13:26:34)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.ElementTree import XMLTreeBuilder
>>> def XML(text):
...     parser = NamespaceFixXmlTreeBuilder()
...     parser.feed(text)
...     return parser.close()
...
>>> class NamespaceFixXmlTreeBuilder(XMLTreeBuilder):
...     def _fixname(self, key):
...         if "}" in key:
...             key = key.split("}", 1)[1]
...         return key
...
>>> XML("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in XML
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
comment:20 Changed at 2014-03-17T22:25:24Z by daira
It looks like only an empty string will give the xml.etree.ElementTree.ParseError: no element found: line 1, column 0 error.
comment:21 Changed at 2014-03-18T17:00:13Z by CyberAxe
Got this last night. Did Za do the update on my backend? If so, it didn't work.
:: --------------------- START of LOG ------------ :: START Date: 20141803 Time: 24638 :: Backup-Now assuming tahoe: as alias and BackedUp as Target Location in Grid :: c:\Python27\ must be in system PATH: see tahoe-lafs Windows wiki for help :: Will Backup everything in folder c:\BackMeUp\ :: v0040 :: --------------------- Start Backup ------------ :: --------------------- Errors Backup ----------- Traceback (most recent call last): File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner rc = cli.dispatch[command](so) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup rc = tahoe_backup.backup(options) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup return bu.run() File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run new_backup_dircap = self.process(options.from_dir) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 194, in process childcap, metadata = self.upload(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 305, in upload raise HTTPError("Error during file PUT", resp) HTTPError: Error during file PUT: 500 Internal Server Error "Traceback (most recent call last): File 
\"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\util\\pipeline.py\", line 65, in _ebDeferred self.checkForFinished() File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\util\\pipeline.py\", line 53, in checkForFinished self.errback(self.failure) File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback self._startRunCallbacks(fail) File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\encode.py\", line 501, in _remove_shareholder raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file.: [Failure instance: Traceback (failure with no frames): <class 'allmydata.util.pipeline.PipelineError'>: <PipelineError error=([Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion. ]'> ])> ] " :: --------------------- End Backup --------------
comment:22 follow-up: ↓ 28 Changed at 2014-03-21T18:21:35Z by CyberAxe
:: --------------------- START of LOG ------------
:: START Date: 20142103 Time: 21327
:: Backup-Now assuming tahoe: as alias and BackedUp as Target Location in Grid
:: c:\Python27\ must be in system PATH: see tahoe-lafs Windows wiki for help
:: Will Backup everything in folder c:\BackMeUp\
:: v0040
:: --------------------- Start Backup ------------
:: --------------------- Errors Backup -----------
Traceback (most recent call last): File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner rc = cli.dispatch[command](so) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup rc = tahoe_backup.backup(options) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup return bu.run() File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run new_backup_dircap = self.process(options.from_dir) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process childcap = self.process(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 194, in process childcap, metadata = self.upload(childpath) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 305, in upload raise HTTPError("Error during file PUT", resp) HTTPError: Error during file PUT: 500 Internal Server Error "Traceback (most recent call last): File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 753, in receiveClose self.request.fail(f) File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 95, in fail self.deferred.errback(why) File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", 
line 422, in errback self._startRunCallbacks(fail) File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 604, in _got_response return self._loop() File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 516, in _loop return self._failed(msg) File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 617, in _failed raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: server selection failed for <Tahoe2ServerSelector for upload qtquq>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file. 
(placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed none due to the server being full and 1 placed none due to an error)) (last failure (from <ServerTracker for server xfia4afr and SI qtquq>) was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 299, in connectionLost protocol.connectionLost(reason) File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 204, in connectionLost self.factory._disconnectedDeferred.callback(None) File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback self._startRunCallbacks(result) File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 370, in _retry d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs) File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 417, in _handle_error d = task.deferLater(self._reactor, BACKOFF_SECONDS_BEFORE_RETRY[trynum-1], operation, *args, **kwargs) exceptions.AttributeError: S3Container instance has no attribute '_reactor' ]'> 
]) "
:: --------------------- End Backup --------------
:: --------------------- Start View Latest -------
:: --------------------- End View Latest ---------
:: --------------------- Start Deep Check -------
:: SET TO RUN
:: --------------------- End Deep Check ---------
:: --------------------- Start Grid Stats --------
:: SET TO RUN
:: --------------------- End Grid Stats ---------
:: --------------------- Start Log Backup -------
:: Copy local log files to future backups to GRID
:: --------------------- END Log Backup -------
:: END Date: Fri 03/21/2014 Time: 2:49:27.69
:: --------------------- END of LOG --------------
comment:23 follow-up: ↓ 26 Changed at 2014-03-21T21:13:23Z by CyberAxe
(3:10:26 PM) CyberAxe: Seems like instead of the 500 error now I'm getting it just stuck over and over; maybe I'm not waiting long enough for the 500 error to generate, but I've waited hours. http://sebsauvage.net/paste/?f38e2b33f87e1422#A8iAnduHoydJrE6XejuwYX6CHoRx/iWBxUb+OLx1mvM= I have to terminate the process; here's what it says when I do that.
(3:12:20 PM) CyberAxe: the last line it's stuck on in this example is a create folder action; when I restart backup it does use that folder, so it is creating it re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\' <----- where it got stuck.
comment:24 Changed at 2014-03-22T02:23:07Z by CyberAxe
Here's another traceback error I got after the backend update to fix the 500 error:
File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback self._startRunCallbacks(fail) File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\encode.py\", line 501, in _remove_shareholder raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file.: [Failure instance: Traceback (failure with no frames): <class 'allmydata.util.pipeline.PipelineError'>: <PipelineError error=([Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 299, in connectionLost protocol.connectionLost(reason) File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 204, in connectionLost self.factory._disconnectedDeferred.callback(None) File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback self._startRunCallbacks(result) File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks self._runCallbacks() --- 
<exception caught here> --- File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 370, in _retry d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs) File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 417, in _handle_error d = task.deferLater(self._reactor, BACKOFF_SECONDS_BEFORE_RETRY[trynum-1], operation, *args, **kwargs) exceptions.AttributeError: S3Container instance has no attribute '_reactor' ]'> ])>
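The `AttributeError: S3Container instance has no attribute '_reactor'` above is a classic error-path bug: the attribute is only touched on the rarely-exercised retry branch, so a construction path that never assigned it went unnoticed until an error actually occurred. A minimal illustration of the defensive fix (hypothetical names, not the actual cloud_common.py code) is to default the attribute at construction time:

```python
class ImmediateScheduler:
    """Stand-in scheduler for this sketch: runs callbacks at once. Real code
    would pass twisted.internet.reactor and use task.deferLater."""

    def call_later(self, delay, fn, *args):
        return fn(*args)


class Container:
    def __init__(self, scheduler=None):
        # Assign eagerly so every construction path has the attribute;
        # the retry/error branch can then never hit AttributeError.
        self._reactor = scheduler if scheduler is not None else ImmediateScheduler()

    def retry(self, delay, operation, *args):
        return self._reactor.call_later(delay, operation, *args)
```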
comment:25 Changed at 2014-03-22T02:27:43Z by CyberAxe
Here's another error after the last one
:: --------------------- START of LOG ------------ :: START Date: 20142103 Time: 174729 :: Backup-Now assuming tahoe: as alias and BackedUp as Target Location in Grid :: c:\Python27\ must be in system PATH: see tahoe-lafs Windows wiki for help :: Will Backup everything in folder c:\BackMeUp\ :: v0040 :: --------------------- Start Backup ------------ :: --------------------- Errors Backup ----------- Traceback (most recent call last): File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run rc = runner(sys.argv[1:], install_node_control=install_node_control) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner rc = cli.dispatch[command](so) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup rc = tahoe_backup.backup(options) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup return bu.run() File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run put_child(archives_url, now, new_backup_dircap) File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child raise HTTPError("Error during put_child", resp) HTTPError: Error during put_child: 500 Internal Server Error "Traceback (most recent call last): Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):\\n File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n cb(*args, **kwargs)\\n File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n op(res)\\n File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n 
self._startRunCallbacks(result)\\n File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks\\n self._runCallbacks()\\n--- <exception caught here> ---\\n File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n current.result = callback(current.result, *args, **kw)\\n File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n account.add_or_renew_default_lease(self.storage_index, shnum)\\n File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in add_or_renew_default_lease\\n return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n renewal_time, expiration_time)\\n File \"/home/customer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n raise NonExistentShareError(si_s, shnum)\\nallmydata.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'o2ofk2psfahdhyf5uq3nqdmrwe\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None)
comment:26 in reply to: ↑ 23 Changed at 2014-03-22T02:39:33Z by CyberAxe
I found that my script wasn't showing me that it had moved on to a deep-check and was done.
Replying to CyberAxe:
(3:10:26 PM) CyberAxe: Seems like instead of the 500 error now I'm getting it just stuck over and over; maybe I'm not waiting long enough for the 500 error to generate, but I've waited hours. http://sebsauvage.net/paste/?f38e2b33f87e1422#A8iAnduHoydJrE6XejuwYX6CHoRx/iWBxUb+OLx1mvM= I have to terminate the process; here's what it says when I do that.
(3:12:20 PM) CyberAxe: the last line it's stuck on in this example is a create-folder action; when I restart backup it does use that folder, so it is creating it
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\' <----- where it got stuck.
comment:27 Changed at 2014-03-22T17:27:39Z by CyberAxe
Here's what I'm getting now:
processing 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
 skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_0.jpg'..
 skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_1.jpg'..
 skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_0.jpg'..
 skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'..
re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
re-using old directory for 'c:\BackMeUp\WesternColorado_files'
creating directory for 'c:\BackMeUp\'
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run
    put_child(archives_url, now, new_backup_dircap)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/foolscap/eventual.py", line 26, in _turn
    cb(*args, **kwargs)
  File "/home/customer/LAFS_source/src/allmydata/util/deferredutil.py", line 55, in _with_log
    op(res)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/customer/LAFS_source/src/allmydata/storage/backends/base.py", line 170, in _update_lease
    account.add_or_renew_default_lease(self.storage_index, shnum)
  File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 58, in add_or_renew_default_lease
    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)
  File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 63, in add_or_renew_lease
    renewal_time, expiration_time)
  File "/home/customer/LAFS_source/src/allmydata/storage/leasedb.py", line 267, in add_or_renew_leases
    raise NonExistentShareError(si_s, shnum)
allmydata.storage.leasedb.NonExistentShareError: can't find SI='o2ofk2psfahdhyf5uq3nqdmrwe' shnum=0 in `shares` table
]'>
]', None)"
comment:28 in reply to: ↑ 22 Changed at 2014-03-25T03:10:49Z by daira
Replying to CyberAxe:
> File "/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py", line 370, in _retry
>     d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs)
> File "/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py", line 417, in _handle_error
>     d = task.deferLater(self._reactor, BACKOFF_SECONDS_BEFORE_RETRY[trynum-1], operation, *args, **kwargs)
> exceptions.AttributeError: S3Container instance has no attribute '_reactor'
This is a separate bug in the S3 retry logic. Filed as #2206.
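For reference, the failure quoted above is the retry path calling `task.deferLater(self._reactor, ...)` on an object that never assigned `_reactor`, so the backoff-and-retry never runs. A hedged sketch of that pattern and the obvious defensive fix (assign the attribute at construction time); the class and scheduler here are illustrative stand-ins, not the actual cloud_common.py code:

```python
# Backoff intervals mirroring BACKOFF_SECONDS_BEFORE_RETRY from the traceback
# (the actual values in cloud_common.py may differ).
BACKOFF_SECONDS_BEFORE_RETRY = (2, 5, 10)

class FakeReactor(object):
    """Minimal stand-in for Twisted's reactor: records scheduled calls."""
    def __init__(self):
        self.calls = []
    def callLater(self, delay, fn, *args, **kwargs):
        self.calls.append((delay, fn, args, kwargs))

class S3Container(object):
    def __init__(self, reactor=None):
        # The #2206 bug: _reactor was never assigned, so the retry path
        # raised AttributeError instead of retrying. Always assigning it
        # here (with an injectable reactor for tests) avoids that.
        self._reactor = reactor if reactor is not None else FakeReactor()

    def _handle_error(self, operation, trynum, *args, **kwargs):
        # Schedule a retry of the cloud operation after the backoff
        # interval for this attempt number.
        delay = BACKOFF_SECONDS_BEFORE_RETRY[trynum - 1]
        self._reactor.callLater(delay, operation, *args, **kwargs)

container = S3Container()
container._handle_error(lambda: "retried PUT", 1)
print(container._reactor.calls[0][0])  # first retry scheduled after 2 seconds
```

The design point is that the scheduling dependency (`_reactor`) must be established in `__init__` rather than assumed to exist only on the error path, which is exactly where it was first touched in the failing code.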
comment:29 Changed at 2014-04-15T00:49:57Z by daira
- Resolution set to fixed
- Status changed from assigned to closed
Fixed in https://github.com/tahoe-lafs/tahoe-lafs/commit/ea3111d24a97bf9873964383430a9f2e9ff5eb70 on the 1819-cloud-merge branch.
I tried the same command-line again, and this time it worked. I had started my tahoe-lafs gateway a few seconds before I got this xml parse error, so I suspect that this is a case of #719.