#1381 closed defect

EINTR from communication with subprocess in allmydata/util/iputil.py _query — at Version 8

Reported by: davidsarah Owned by: davidsarah
Priority: major Milestone: 1.10.1
Component: code-network Version: 1.8.2
Keywords: iputil heisenbug review-needed Cc:
Launchpad Bug:

Description (last modified by zooko)

Reported by 'sickness' on irc:

#   Run
#     test_loadable ...                                                      [OK]
#     test_reloadable ... Node._startService failed, aborting
# [Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 4] Interrupted system call
# /usr/lib/python2.6/threading.py:497:__bootstrap
# /usr/lib/python2.6/threading.py:525:__bootstrap_inner
# /usr/lib/python2.6/threading.py:477:run
# --- <exception caught here> ---
# /usr/lib/python2.6/vendor-packages/twisted/python/threadpool.py:210:_worker
# /usr/lib/python2.6/vendor-packages/twisted/python/context.py:59:callWithContext
# /usr/lib/python2.6/vendor-packages/twisted/python/context.py:37:callWithContext
# /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:222:_synchronously_find_addresses_via_config
# /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:237:_query
# /usr/lib/python2.6/subprocess.py:689:communicate
# /usr/lib/python2.6/subprocess.py:1233:_communicate
# /usr/lib/python2.6/subprocess.py:1157:wait
# ]
# calling os.abort()

Possibly related: http://bugs.python.org/issue1068268. It may be that the patch for that bug wasn't complete. EINTR failures are usually hard to reproduce, but the fix is simply to repeat the query until it succeeds (or fails with a different error).

Change History (8)

comment:1 follow-up: Changed at 2011-03-22T21:28:16Z by sickness

The OS is opensolaris snv134 64bit

$ uname -a

SunOS MYWORKPC 5.11 snv_134 i86pc i386 i86pc Solaris

$ psrinfo -pv

The physical processor has 2 virtual processors (0 1)

x86 (GenuineIntel 1067A family 6 model 23 step 10 clock 2800 MHz)

Pentium(r) Dual-Core CPU E6300 @ 2.80GHz

$ isainfo -x

amd64: ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu

i386: ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu

And this is the tahoe version:

$ allmydata-tahoe-1.8.2/bin/tahoe --version

allmydata-tahoe: 1.8.2,

foolscap: 0.6.1,

pycryptopp: 0.5.29,

zfec: 1.4.22,

Twisted: 8.2.0,

Nevow: 0.10.0,

zope.interface: unknown,

python: 2.6.4,

platform: SunOS-5.11-i86pc-i386-32bit-ELF,

pyOpenSSL: 0.11,

simplejson: 2.0.9,

pycrypto: 2.3,

pyasn1: unknown,

mock: 0.7.0,

sqlite3: 2.4.1 [sqlite 3.6.17],

setuptools: 0.6c16dev3

comment:2 in reply to: ↑ 1 Changed at 2011-03-23T01:26:42Z by davidsarah

Replying to sickness:

python: 2.6.4,

Hmm, that should have had the backported fix for http://bugs.python.org/issue1068268. Oh well, we would need to work around it for earlier Pythons anyway.

comment:3 Changed at 2011-05-28T22:09:17Z by davidsarah

  • Keywords heisenbug added

comment:4 follow-up: Changed at 2011-05-29T04:32:59Z by zooko

Should we work around this by catching OSError with errno==4 and retrying the subprocess?

comment:5 in reply to: ↑ 4 Changed at 2011-05-29T15:33:32Z by davidsarah

Replying to zooko:

Should we work around this by catching OSError with errno==4 and retrying the subprocess?

Yes, I believe so. We probably shouldn't retry forever, so let's retry 10 times. The try/except should cover lines 236 and 237 of iputil.py.

BTW, rather than 4 we should use errno.EINTR (I think this is defined on all platforms, even though EINTR is only really relevant on Unix).

Should _query return [] (i.e. no addresses) if the subprocess fails? Oh, I see that issue is #854 ('what to do when you can't find any IP address for yourself').
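
For illustration, the workaround discussed in this comment might look roughly like the sketch below. This is not the actual patch: `retry_on_eintr`, `run_query`, and `MAX_ATTEMPTS` are hypothetical names, `run_query` is only a stand-in for `_query`, and reading "retry 10 times" as ten attempts in total is an assumption. Once the attempts are exhausted the EINTR error is re-raised, which leaves the question of returning [] to #854.

```python
import errno
import subprocess

# Hypothetical constant: the comment suggests retrying 10 times; this sketch
# assumes that means at most 10 attempts in total.
MAX_ATTEMPTS = 10

def retry_on_eintr(func, max_attempts=MAX_ATTEMPTS):
    """Call func(), trying again when it raises OSError with errno.EINTR."""
    for attempt in range(max_attempts):
        try:
            return func()
        except OSError as e:
            # Errors other than EINTR propagate immediately; EINTR propagates
            # only once the final attempt has failed.
            if e.errno != errno.EINTR or attempt == max_attempts - 1:
                raise

def run_query(args):
    """Hypothetical stand-in for iputil._query: run a command, return stdout."""
    def attempt():
        # Both the Popen call and communicate() sit inside the retried
        # function, so an interrupted attempt spawns a fresh subprocess.
        p = subprocess.Popen(args, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        output, _ = p.communicate()
        return output
    return retry_on_eintr(attempt)
```

A caller would then use something like `run_query(["ifconfig", "-a"])` (the command is only an example). Comparing against `errno.EINTR` rather than the literal 4 keeps the check readable and independent of the platform's numbering.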

comment:6 Changed at 2011-08-14T00:09:40Z by davidsarah

  • Milestone changed from 1.9.0 to 1.10.0

comment:7 Changed at 2011-08-14T00:09:58Z by davidsarah

  • Status changed from new to assigned

comment:8 Changed at 2013-05-27T17:29:35Z by zooko

  • Description modified (diff)

See #1988
