Opened at 2012-11-10T01:45:55Z
Closed at 2020-01-13T20:26:42Z
#1852 closed defect (wontfix)
test suite: find a way to prevent unclean reactor errors causing subsequent test failures
Reported by: | davidsarah | Owned by: | daira |
---|---|---|---|
Priority: | normal | Milestone: | soon |
Component: | code | Version: | 1.9.2 |
Keywords: | test twisted-trial unclean reactor timeout hang | Cc: | zooko |
Launchpad Bug: |
Description (last modified by zooko)
For example, in the 666-accounting branch, if test_mutable.Filenode.test_modify_backoffer is reenabled, it will fail with an unclean reactor error and subsequent tests will time out.
Probably this is due to some non-obvious sharing of global state between tests (it doesn't always happen that an unclean reactor error causes subsequent timeouts).
In any case, it makes it difficult to run the test suite the whole way through when some tests cause this problem. One rather brute-force solution would be to exit on an unclean reactor error and restart the test process from the next test. I can't see a way to do that without forking trial, though. Another possibility is to try to use twisted.test.proto_helpers.MemoryReactor.
Change History (17)
comment:1 Changed at 2012-11-10T01:52:57Z by davidsarah
comment:2 Changed at 2012-11-10T02:36:53Z by davidsarah
Yes, it is:
import time from twisted.internet import reactor from allmydata.util import log from allmydata.util.pollmixin import PollMixin class WaitForDelayedCallsMixin(PollMixin): def _delayed_calls_done(self): # We're done when the only remaining DelayedCalls fire after threshold. # (These will be associated with the test timeout.) threshold = time.time() + 60 for delayed in reactor._pendingTimedCalls + reactor._newTimedCalls: if delayed.getTime() < threshold: return False return True def wait_for_delayed_calls(self, res): d = self.poll(self._delayed_calls_done) d.addErrback(log.err, "error while waiting for delayed calls") d.addBoth(lambda ign: res) return d
comment:3 Changed at 2012-11-11T14:54:44Z by davidsarah
A variation of the above code is now in allmydata.util.deferredutil on my 666-accounting branch. It works well to avoid some spurious unclean reactor errors in tests that are known to fail for that reason, but it doesn't solve the general problem that such errors cause timeouts in later tests.
comment:4 Changed at 2012-11-22T23:25:02Z by zooko
See also clean_pending(), which I think accomplishes a similar result without the delay. I've used it successfully in the past, but not recently.
comment:5 Changed at 2012-11-22T23:29:23Z by zooko
We should figure out whether WaitForDelayedCallsMixin and pyutil's clean_pending() are each sufficient to solve our problem with errors in unit tests causing later unit tests to spurious fail-or-error, and if so which approach is better, and we should deprecate the other one.
comment:6 Changed at 2012-11-22T23:29:42Z by zooko
- Cc zooko added
comment:7 follow-up: ↓ 8 Changed at 2012-11-22T23:43:25Z by davidsarah
Ah, I did not know about the reactor.getDelayedCalls method used by clean_pending. I think clean_pending does not quite do the right thing, though:
- I want the delay, because that's more likely to avoid problems with subsequent tests. Cancelling the delayed calls might still leave other reactor resources unclean.
- I think clean_pending will cancel the Deferred that implements the test timeout, or any others that are internal to Twisted Trial. (There weren't any others when I tested, but that's an implementation detail.)
I'll see whether reactor.getDelatedCalls can be used to replace the use of internal attributes of reactor in WaitForDelayedCalls.
comment:8 in reply to: ↑ 7 Changed at 2012-11-22T23:56:58Z by davidsarah
Replying to davidsarah:
I'll see whether reactor.getDelayedCalls can be used to replace the use of internal attributes of reactor in WaitForDelayedCalls.
It can; in fact its implementation does exactly the right thing:
def getDelayedCalls(self): """Return all the outstanding delayed calls in the system. They are returned in no particular order. This method is not efficient -- it is really only meant for test cases.""" return [x for x in (self._pendingTimedCalls + self._newTimedCalls) if not x.cancelled]
comment:9 Changed at 2012-11-23T00:59:14Z by davidsarah
https://github.com/daira/tahoe-lafs/commit/7671b3e33f0c91e764fb869175ff85f0389c38d1 now uses getDelayedCalls.
comment:10 follow-up: ↓ 14 Changed at 2012-12-02T01:36:19Z by davidsarah
clean_pending is not just in pyutil; it's also in src/allmydata/test/common_util.py's TestMixin. So we should probably merge WaitForDelayedCalls into that, but TextMixin also contains code I don't understand relating to repeatable_random, so I'm not going to attempt that merge now.
Assigning to zooko to explain what TestMixin.setUp is trying to do (I can't decipher the comment, which appears to be missing some words).
comment:11 Changed at 2012-12-02T01:37:19Z by davidsarah
- Milestone changed from undecided to 1.11.0
- Owner changed from davidsarah to zooko
comment:12 Changed at 2012-12-02T01:41:39Z by davidsarah
Note that TestMixin and WaitForDelayedCalls also differ in that inheriting TestMixin causes all test methods in the class to cancel delayed calls, whereas WaitForDelayedCalls only waits in methods that use d.addBoth(self.wait_for_delayed_calls).
comment:13 Changed at 2014-06-25T19:46:43Z by zooko
- Description modified (diff)
comment:14 in reply to: ↑ 10 ; follow-up: ↓ 16 Changed at 2014-06-25T19:47:16Z by zooko
Replying to davidsarah:
Assigning to zooko to explain what TestMixin.setUp is trying to do (I can't decipher the comment, which appears to be missing some words).
daira: the other thing the TestMixin does replaces os.urandom with a deterministic RNG in order to make tests more reproducible. It isn't currently used, so let's remove that functionality.
Also, yes, the doc is missing a word.
comment:15 Changed at 2014-06-25T19:47:23Z by zooko
- Owner changed from zooko to daira
comment:16 in reply to: ↑ 14 Changed at 2014-07-24T00:18:59Z by daira
comment:17 Changed at 2020-01-13T20:26:42Z by exarkun
- Resolution set to wontfix
- Status changed from new to closed
Instead, fix tests that fail with UncleanReactorError?. And stop using trial's TestCase? and use testools' instead.
It seems like the unclean reactor errors in our test suite are most often due to DelayedCalls. I wonder whether it's possible to automatically wait (until a timeout) for all DelayedCalls to have completed.