Opened at 2022-01-28T16:10:58Z
Last modified at 2022-01-31T13:44:10Z
#3869 new defect
Intermittent allmydata.test.test_storage_http.GenericHTTPAPITests.test_bad_authentication failure
Reported by: | exarkun | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | unknown | Version: | n/a |
Keywords: | | Cc: | |
Launchpad Bug: | | | |
Description (last modified by exarkun)
Sometimes when running the full test suite, especially with high test concurrency (-j8 or higher), test_bad_authentication fails like this:
[ERROR]
Traceback (most recent call last):
Failure: testtools.testresult.real._StringException: Empty attachments:
  twisted-log

Traceback (most recent call last):
  File "/nix/store/23igmvfrawyi9hzlhhx3sja6jzdxwwgq-python3-3.7.11-env/lib/python3.7/site-packages/testtools/twistedsupport/_runtest.py", line 386, in _log_user_exception
    raise e
testtools.twistedsupport._runtest.UncleanReactorError: The reactor still thinks it needs to do things. Close all connections, kill all processes and make sure all delayed calls have either fired or been cancelled:
  <DelayedCall 0x7f339af67710 [-0.39075684547424316s] called=0 cancelled=1>

allmydata.test.test_storage_http.GenericHTTPAPITests.test_bad_authentication
Change History (3)
comment:1 Changed at 2022-01-28T16:11:10Z by exarkun
- Description modified (diff)
comment:2 Changed at 2022-01-28T18:16:09Z by exarkun
comment:3 Changed at 2022-01-31T13:44:10Z by exarkun
Some observations:
- These seem to happen on two newly configured CI jobs (the new NixOS jobs that replaced the old ones)
- These seem to happen when concurrency is high (the old CI jobs limited trial to 3 workers, the new CI jobs limit trial to 8 workers)
I tried to investigate on Friday, but I ran into a lot of bugs and missing features in trial's concurrent runner ("disttrial") that sucked up all the time I put in, along with other unrelated-but-blocking problems in Twisted's test suite.
It would be great to be able to reproduce the problem outside of CI. In principle this should be doable, since CI runs in a Docker image and uses reproducible-build Nix expressions. In practice, the problem may depend on timing that comes from the particular hardware or load on the real CI runner environment ...
I think that's worth trying, at least. Failing that, we could try just cranking concurrency down on these jobs (back to 3, I guess) and see if that helps.
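A rough sketch of what a local reproduction loop could look like, assuming the failure can be provoked by re-running the suite at the CI worker count until one run fails. The helper names (`trial_command`, `run_until_failure`) and the run limit are made up for illustration; the only things taken from this ticket are the `-j8` worker count and the test package name. It is not known whether this actually triggers the failure away from the CI hardware.

```python
import subprocess
import sys


def trial_command(test_id, jobs):
    """Build a trial invocation using `jobs` local workers (trial's -j option)."""
    return [sys.executable, "-m", "twisted.trial", f"-j{jobs}", test_id]


def run_until_failure(command, max_runs, run=subprocess.run):
    """Run `command` up to `max_runs` times.

    Returns the 1-based number of the first run that exits non-zero,
    or None if every run passed. `run` is injectable so the loop can be
    exercised without actually spawning trial.
    """
    for attempt in range(1, max_runs + 1):
        result = run(command)
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    # Assumption: the full suite at 8 workers approximates the new CI jobs.
    # Change jobs to 3 to compare against the old CI configuration.
    command = trial_command("allmydata", jobs=8)
    failed_on = run_until_failure(command, max_runs=20)
    if failed_on is None:
        print("no failure observed")
    else:
        print(f"first failure on run {failed_on}")
```

Running the same loop with `jobs=3` would give a crude comparison against the old CI setting, though a clean result at either setting only says the failure was not seen in that many runs.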
The same failure has also been observed in allmydata.test.test_storage_http.GenericHTTPAPITests.test_version