Opened at 2007-12-29T06:17:15Z
Closed at 2009-04-08T02:18:08Z
#250 closed defect (fixed)
memcheck-64 fails sporadically
Reported by: | zooko | Owned by: | warner |
---|---|---|---|
Priority: | major | Milestone: | 1.4.1 |
Component: | operational | Version: | 0.7.0 |
Keywords: | Cc: | ||
Launchpad Bug: |
Description
Brian knows a little bit more about this. There's some sort of race condition in shutting down old test runs and starting new ones, or something like that.
Change History (8)
comment:1 Changed at 2008-05-09T01:22:00Z by warner
comment:2 Changed at 2008-05-31T01:21:24Z by zooko
- Resolution set to fixed
- Status changed from new to closed
I think this has been fixed.
comment:3 Changed at 2008-05-31T01:21:29Z by zooko
- Milestone changed from undecided to 1.1.0
comment:4 Changed at 2008-07-14T16:09:21Z by zooko
- Resolution fixed deleted
- Status changed from closed to reopened
This just happened on a different builder:
http://allmydata.org/buildbot/builders/feisty2.5/builds/1557/steps/test/logs/stdio
allmydata.test.test_client.Run.test_reloadable ...
Traceback (most recent call last):
  File "/home/buildslave/tahoe/feisty2.5/build/src/allmydata/test/test_client.py", line 194, in _restart
    c2.setServiceParent(self.sparent)
  File "/usr/lib/python2.5/site-packages/twisted/application/service.py", line 148, in setServiceParent
    self.parent.addService(self)
  File "/usr/lib/python2.5/site-packages/twisted/application/service.py", line 259, in addService
    service.privilegedStartService()
  File "/usr/lib/python2.5/site-packages/twisted/application/service.py", line 228, in privilegedStartService
    service.privilegedStartService()
  File "/usr/lib/python2.5/site-packages/twisted/application/service.py", line 228, in privilegedStartService
    service.privilegedStartService()
  File "/usr/lib/python2.5/site-packages/twisted/application/internet.py", line 68, in privilegedStartService
    self._port = self._getPort()
  File "/usr/lib/python2.5/site-packages/twisted/application/internet.py", line 86, in _getPort
    return getattr(reactor, 'listen'+self.method)(*self.args, **self.kwargs)
  File "/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py", line 467, in listenTCP
    p.startListening()
  File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 733, in startListening
    raise CannotListenError, (self.interface, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on any:43755: (98, 'Address already in use').
Is it possible that this fault happens whenever the same port number is chosen at random by two successive tests?
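A minimal stdlib sketch (not Tahoe code) of the suspected failure mode: if a second listener tries to bind a port while the first still holds it, the kernel refuses with EADDRINUSE, which is exactly the errno 98 "Address already in use" in the traceback above. The helper name is illustrative.

```python
import errno
import socket

def bind_twice_same_port():
    # First socket grabs a free port (port 0 lets the kernel choose).
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))
    port = first.getsockname()[1]
    first.listen(1)
    # Second socket tries the same port while the first is still open,
    # mimicking a new test run starting before the old one shut down.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))
        return None  # no collision (would not match the observed failure)
    except socket.error as e:
        return e.errno  # expected: errno.EADDRINUSE
    finally:
        second.close()
        first.close()
```

So yes: any time a test reuses a port number that a not-yet-shut-down service still holds, this is the error you get, whether the number was chosen at random or was a fixed default.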
comment:5 Changed at 2008-07-14T19:12:47Z by warner
There are comments in the test with more detail: the issue involves some absolute timeouts that were not easy to get rid of. The problem is most likely the old instance not completely shutting down before the new one is started up.
source:/src/allmydata/test/test_client.py@2712#L166 has details:
def test_reloadable(self):
    basedir = "test_client.Run.test_reloadable"
    os.mkdir(basedir)
    dummy = "pb://wl74cyahejagspqgy4x5ukrvfnevlknt@127.0.0.1:58889/bogus"
    open(os.path.join(basedir, "introducer.furl"), "w").write(dummy)
    c1 = client.Client(basedir)
    c1.setServiceParent(self.sparent)
    # delay to let the service start up completely. I'm not entirely sure
    # this is necessary.
    d = self.stall(delay=2.0)
    d.addCallback(lambda res: c1.disownServiceParent())
    # the cygwin buildslave seems to need more time to let the old
    # service completely shut down. When delay=0.1, I saw this test fail,
    # probably due to the logport trying to reclaim the old socket
    # number. This suggests that either we're dropping a Deferred
    # somewhere in the shutdown sequence, or that cygwin is just cranky.
    d.addCallback(self.stall, delay=2.0)
    def _restart(res):
        # TODO: pause for slightly over one second, to let
        # Client._check_hotline poll the file once. That will exercise
        # another few lines. Then add another test in which we don't
        # update the file at all, and watch to see the node shutdown. (to
        # do this, use a modified node which overrides Node.shutdown(),
        # also change _check_hotline to use it instead of a raw
        # reactor.stop, also instrument the shutdown event in an
        # attribute that we can check)
        c2 = client.Client(basedir)
        c2.setServiceParent(self.sparent)
        return c2.disownServiceParent()
    d.addCallback(_restart)
    return d
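One alternative to the fixed `self.stall(delay=2.0)` waits would be to poll until the old socket is actually released before restarting. This is a hypothetical stdlib helper, not a proposed patch to the test; `wait_port_free` and its parameters are made up for illustration.

```python
import errno
import socket
import time

def wait_port_free(port, timeout=5.0, interval=0.1):
    # Poll until `port` can be bound again, instead of sleeping for a
    # fixed delay and hoping the old service has released it by then.
    deadline = time.time() + timeout
    while True:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", port))
            return True  # the port is free again
        except socket.error as e:
            if e.errno != errno.EADDRINUSE:
                raise
            if time.time() >= deadline:
                return False  # still held after `timeout` seconds
        finally:
            s.close()
        time.sleep(interval)
```

This turns the cygwin guesswork ("is 2.0 seconds enough?") into an explicit condition, though the cleaner fix is still to chain the restart on the Deferred returned by the shutdown itself.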
comment:6 Changed at 2008-07-14T22:30:58Z by warner
- Milestone changed from 1.1.0 to 1.1.1
comment:7 Changed at 2009-03-28T19:43:06Z by zooko
Hm.. This ticket was last touched 9 months ago. I haven't been seeing this failure in practice recently, as far as I recall. Close this as fixed?
comment:8 Changed at 2009-04-08T02:18:08Z by warner
- Resolution set to fixed
- Status changed from reopened to closed
I don't remember seeing this failure for a while either. I think it's safe to close.. feel free to reopen if it appears again.
We fixed one possible source of failures: the pre-determined webport. This should fix failures that say "address already in use" in the node log. Let's watch the buildbot and see if any new failures show up.
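The shape of that fix, sketched with the stdlib (the actual change is in the Tahoe source; `pick_webport` is an illustrative helper, not the real code): rather than hard-coding a webport that overlapping test runs can collide on, bind port 0 and let the kernel assign a free one. In Twisted the equivalent is to pass port 0 to `reactor.listenTCP` and read the assignment back from `port.getHost().port`.

```python
import socket

def pick_webport(interface="127.0.0.1"):
    # Port 0 means "any free port"; getsockname() reveals which one
    # the kernel chose. Note a small race remains: the port could be
    # taken by another process between close() and reuse, which is why
    # listening on 0 directly (as Twisted allows) is preferable.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((interface, 0))
    port = s.getsockname()[1]
    s.close()
    return port
```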