#1676 closed defect (fixed)

ERROR: AssertionError(), apparently on mutable publish during check/repair (current git)

Reported by: killyourtv
Owned by: killyourtv
Priority: critical
Milestone: 1.9.2
Component: code-mutable
Version: 1.9.1
Keywords: i2p mutable repair error assertion regression
Cc: killyourtv@…
Launchpad Bug:

Description (last modified by killyourtv)

Running a deep-check on one of my shares on the I2P grid yielded the following:

'debian/dists/squeeze/main/binary-i386/Release': healthy
'debian/dists/squeeze/main/binary-powerpc': healthy
'debian/dists/squeeze/main/binary-powerpc/Packages': healthy
'debian/dists/squeeze/main/binary-powerpc/Packages.bz2': not healthy
 repair successful
'debian/dists/squeeze/main/binary-powerpc/Packages.gz': healthy
'debian/dists/squeeze/main/binary-powerpc/Release': healthy
'debian/dists/squeeze/main/source': healthy
'debian/dists/squeeze/main/source/Release': healthy
'debian/dists/squeeze/main/source/Sources.gz': not healthy
 repair failed
'debian/pool': healthy
'debian/pool/dists': healthy
'debian/pool/dists/squeeze': healthy
'debian/pool/dists/squeeze/main': healthy
'debian/pool/dists/squeeze/main/binary-amd64': healthy
ERROR: AssertionError()
"[Failure instance: Traceback: <type 'exceptions.AssertionError'>: "
$INSTALL_DIR/src/allmydata/mutable/filenode.py:563:upload
$INSTALL_DIR/src/allmydata/mutable/filenode.py:661:_do_serialized
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:298:addCallback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:287:addCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:545:_runCallbacks
$INSTALL_DIR/src/allmydata/mutable/filenode.py:661:<lambda>
$INSTALL_DIR/src/allmydata/mutable/filenode.py:689:_upload
$INSTALL_DIR/src/allmydata/mutable/publish.py:404:publish

In twistd.log (I don't know if it's related; I was AFK), I see:

2012-02-14 08:53:30+0000 [-] Unhandled error in Deferred:
2012-02-14 08:53:30+0000 [-] Unhandled Error
	Traceback (most recent call last):
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
	    call.func(*call.args, **call.kw)
	  File "/usr/lib/pymodules/python2.7/foolscap/eventual.py", line 26, in _turn
	    cb(*args, **kwargs)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
	    self._startRunCallbacks(result)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
	    self._runCallbacks()
	--- <exception caught here> ---
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	  File "/usr/lib/python2.7/dist-packages/nevow/appserver.py", line 181, in _cbFinishRender
	    self.finishRequest(  True )
	  File "/usr/lib/python2.7/dist-packages/nevow/appserver.py", line 176, in finishRequest
	    server.Request.finish(self)
	  File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 866, in finish
	    "Request.finish called on a request after its connection was lost; "
	exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
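
The RuntimeError above appears to be a side effect rather than the root cause: the web gateway tried to finish rendering a request whose client connection had already gone away. For illustration only (this is not the Nevow/Tahoe code path involved here, and some_long_running_operation is a hypothetical stand-in), the pattern the error message itself recommends looks roughly like this in plain Twisted Web:

from twisted.web import resource, server

class SlowResource(resource.Resource):
    # A rough sketch, not Tahoe code: track client disconnects with
    # Request.notifyFinish() so we never call request.finish() on a dead
    # connection, which is exactly the RuntimeError shown above.
    isLeaf = True

    def render_GET(self, request):
        state = {"lost": False}

        def connection_lost(reason):
            # Fires if the client goes away before rendering completes.
            state["lost"] = True

        request.notifyFinish().addErrback(connection_lost)

        d = some_long_running_operation()  # hypothetical Deferred-returning call

        def done(result):
            if not state["lost"]:
                request.write(result)
                request.finish()

        d.addCallback(done)
        return server.NOT_DONE_YET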

Attachments (1)

stillactive.png (27.0 KB) - added by killyourtv at 2012-02-14T11:18:49Z.
active operations

Change History (15)

Changed at 2012-02-14T11:18:49Z by killyourtv

active operations

comment:1 Changed at 2012-02-14T11:40:41Z by killyourtv

As mentioned in previous tickets, tasks remain in the Active Operations section long after failures occur; see the attached screenshot (stillactive.png, "active operations").

The top two jobs failed around 12 hours earlier. The bottom one is the AssertionError that I've reported in this ticket.

Version 1, edited at 2012-02-14T11:53:48Z by killyourtv

comment:2 Changed at 2012-02-14T11:41:28Z by killyourtv

  • Description modified (diff)

comment:3 Changed at 2012-02-14T20:32:54Z by killyourtv

Update: For what it's worth, with the current git revision the deep-check consistently fails at the same point.

$ tahoe deep-check -v --repair --add-lease tahoe:debian/pool/dists/squeeze
'<root>': healthy
'main': healthy
'main/binary-amd64': healthy
ERROR: AssertionError()
"[Failure instance: Traceback: <type 'exceptions.AssertionError'>: "
$INSTALL_DIR/src/allmydata/mutable/filenode.py:563:upload
$INSTALL_DIR/src/allmydata/mutable/filenode.py:661:_do_serialized
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:298:addCallback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:287:addCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:545:_runCallbacks
$INSTALL_DIR/src/allmydata/mutable/filenode.py:661:<lambda>
$INSTALL_DIR/src/allmydata/mutable/filenode.py:689:_upload
$INSTALL_DIR/src/allmydata/mutable/publish.py:404:publish

Reverting to 1.8.3:

$ tahoe deep-check -v --repair --add-lease tahoe:debian/dists/squeeze
'<root>': healthy
'Contents-amd64': healthy
'Contents-amd64.bz2': healthy
'Contents-amd64.gz': healthy
'Contents-armel': healthy
'Contents-armel.bz2': healthy
'Contents-armel.gz': healthy
'Contents-i386': healthy
'Contents-i386.bz2': healthy
'Contents-i386.gz': healthy
'Contents-powerpc': healthy
'Contents-powerpc.bz2': healthy
'Contents-powerpc.gz': healthy
'InRelease': healthy
'Release': healthy
'Release.gpg': healthy
'main': healthy
'main/binary-amd64': healthy
'main/binary-amd64/Packages': healthy
'main/binary-amd64/Packages.bz2': healthy
'main/binary-amd64/Packages.gz': healthy
'main/binary-amd64/Release': healthy
'main/binary-armel': healthy
'main/binary-armel/Packages': healthy
'main/binary-armel/Packages.bz2': healthy
'main/binary-armel/Packages.gz': healthy
'main/binary-armel/Release': healthy
'main/binary-i386': healthy
'main/binary-i386/Packages': healthy
'main/binary-i386/Packages.bz2': healthy
'main/binary-i386/Packages.gz': healthy
'main/binary-i386/Release': healthy
'main/binary-powerpc': healthy
'main/binary-powerpc/Packages': not healthy
 repair successful
'main/binary-powerpc/Packages.bz2': healthy
'main/binary-powerpc/Packages.gz': healthy
'main/binary-powerpc/Release': healthy
'main/source': healthy
'main/source/Release': healthy
'main/source/Sources.gz': healthy
done: 40 objects checked
 pre-repair: 39 healthy, 1 unhealthy
 1 repairs attempted, 1 successful, 0 failed
 post-repair: 40 healthy, 0 unhealthy

comment:4 Changed at 2012-02-14T20:35:44Z by davidsarah

  • Keywords mutable repair error assertion added
  • Milestone changed from undecided to 1.10.0
  • Summary changed from ERROR: AssertionError() (current git) to ERROR: AssertionError(), apparently on mutable publish during check/repair (current git)

comment:5 Changed at 2012-02-17T17:08:07Z by zooko

  • Priority changed from major to critical

comment:6 Changed at 2012-02-18T20:23:54Z by kevan

  • Owner changed from nobody to kevan
  • Status changed from new to assigned

comment:7 Changed at 2012-02-19T03:19:11Z by kevan

If it's convenient, could you attach an incident report corresponding to this bug? I think I know what's causing this -- the Retrieve object doesn't keep track of whether it has successfully fetched, validated, and stored the private key when asked to do so -- but having logs would help to confirm that.
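
To make that hypothesis concrete, here is a hypothetical sketch of the missing bookkeeping; it is not the actual allmydata.mutable.retrieve.Retrieve class, and every helper name in it is invented:

class RetrieveSketch(object):
    # Hypothetical illustration of the state-tracking described above, not
    # the real Retrieve implementation.
    def __init__(self, filenode):
        self._node = filenode
        self._need_privkey = True   # a publish needs the key to sign new shares

    def _try_to_store_privkey(self, encrypted_privkey):
        # Only clear the flag once the key has really been fetched,
        # validated, and stored on the filenode.
        privkey = validate_and_decrypt(encrypted_privkey)   # hypothetical helper
        if privkey is None:
            return
        self._node._populate_privkey(privkey)               # hypothetical setter
        self._need_privkey = False

    def still_needs_privkey(self):
        # A check/repair publish could consult this instead of assuming the
        # key arrived, which is the assumption the AssertionError reflects.
        return self._need_privkey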

comment:8 Changed at 2012-03-05T22:35:49Z by zooko

  • Keywords regression added
  • Milestone changed from 1.10.0 to 1.9.2

This is a regression in 1.8→1.9, isn't it? I propose we put it into the "1.9.2" Milestone.

comment:9 follow-up: Changed at 2012-04-07T20:48:44Z by warner

hey, is this maybe a dupe of #1689?

comment:10 in reply to: ↑ 9 Changed at 2012-04-08T04:04:10Z by davidsarah

  • Component changed from unknown to code-mutable
  • Owner changed from kevan to killyourtv
  • Status changed from assigned to new

Replying to warner:

hey, is this maybe a dupe of #1689?

The traceback is the same, so probably.

killyourtv, can you reproduce this with trunk?

comment:11 Changed at 2012-04-14T18:52:47Z by killyourtv

Sorry for the delayed response.

Trying with trunk and the patchset from #68 applied (assuming I refactored it properly) yielded the following:

'debian/dists/lenny/main/binary-i386/Packages.gz': healthy
'debian/dists/lenny/main/binary-i386/Release': healthy
'debian/dists/lenny/main/source': not healthy
 repair successful
'debian/dists/lenny/main/source/Release': healthy
'debian/dists/lenny/main/source/Sources.gz': healthy
"ERROR: AttributeError('NoneType' object has no attribute 'callRemote')"
"[Failure instance: Traceback: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'callRemote'"
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:545:_runCallbacks
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:796:_cbDeferred
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:362:callback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:458:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:545:_runCallbacks
$SRC/src/allmydata/mutable/servermap.py:1178:_check_for_done
$SRC/src/allmydata/mutable/servermap.py:1206:_send_more_queries
$SRC/src/allmydata/mutable/servermap.py:580:_do_query
$SRC/src/allmydata/mutable/servermap.py:601:_do_read

I'm currently trying again with just #1007 and #1010 applied.
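
For context on the new traceback: the AttributeError suggests the servermap update is still querying a storage server whose remote reference has become None (for example, after the connection dropped mid-operation). A guard of the following shape illustrates the check implied by the traceback; the accessor, the exception, and the assumption that the remote method is "slot_readv" are illustrative, not the actual servermap.py code:

class ServerConnectionLost(Exception):
    pass

def do_read_sketch(server, storage_index, shnums, readv):
    # Hypothetical illustration, not the real servermap fix: fail cleanly
    # when the server's remote reference is gone instead of calling
    # callRemote on None.
    rref = server.get_rref()   # hypothetical accessor; assumed None after disconnect
    if rref is None:
        raise ServerConnectionLost("lost connection to %r" % (server,))
    return rref.callRemote("slot_readv", storage_index, shnums, readv)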

comment:12 Changed at 2012-04-15T01:13:20Z by killyourtv

I re-refactored the patch from #68 (I think I made a mistake when I did it the first time earlier tonight).

The repair process on my rather large share has been going for a bit over an hour now. It's certainly going farther than it has at any other point with 1.9.x.

I'll post another update once it completes but so far it's looking very good!

comment:13 Changed at 2012-04-15T20:03:45Z by killyourtv

Several hours later:

done: 664 objects checked
 pre-repair: 587 healthy, 77 unhealthy
 77 repairs attempted, 77 successful, 0 failed
 post-repair: 664 healthy, 0 unhealthy

So far, so good. I'll stay on trunk to test a bit more.

comment:14 Changed at 2012-04-17T21:41:36Z by killyourtv

  • Cc killyourtv@… added
  • Resolution set to fixed
  • Status changed from new to closed

I think this problem has been resolved. I'll close this ticket. Thanks, all!
