Changes between Initial Version and Version 1 of Ticket #2787, comment 1


Timestamp:
2018-05-23T14:14:27Z
Author:
exarkun
  • Ticket #2787, comment 1

    initial v1  
    1 1  It's not possible to fix this inside `allocate_tcp_port` itself.  So I'm planning to close this ticket.  Instead, we'll have a ticket for each test which can fail this way and they'll have to be fixed one by one.
    2 2
    3    The reason we cannot fix this inside `allocate_tcp_port` is that the approach it is a component of is suffers from an unavoidable race condition.  `allocate_tcp_port` tries to figure out a specific TCP port number which _will not be in use at a later point in time_.  Since there is no part of the system which allows the port number to be reserved or otherwise kept out of use *except by the one piece of code we intend*, it cannot actually know whether any port number it selects will satisfy this requirement.
      3  The reason we cannot fix this inside `allocate_tcp_port` is that the approach it is a component of suffers from an unavoidable race condition.  `allocate_tcp_port` tries to figure out a specific TCP port number which _will not be in use at a later point in time_.  Since there is no part of the system which allows the port number to be reserved or otherwise kept out of use *except by the one piece of code we intend*, it cannot actually know whether any port number it selects will satisfy this requirement.
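
To make the race concrete, here is a minimal sketch. It is not the actual `allocate_tcp_port` implementation; `guess_free_port`, `listen_on_free_port` and `race_is_possible` are hypothetical names, and the "ask the OS for port 0, then close the socket" pattern is only an assumption about how such a helper typically works. It shows that a port number obtained this way is unreserved once the probing socket is closed, while keeping the bound socket avoids the gap entirely.

```python
import socket


def guess_free_port():
    """Racy pattern (assumed): let the OS pick a port, then release it.

    Once the socket is closed nothing reserves the port, so any other
    process or test may bind it before the intended user does.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", 0))      # port 0 asks the OS for any free port
        return s.getsockname()[1]
    finally:
        s.close()


def listen_on_free_port():
    """Race-free alternative: return the socket that still owns the port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s, s.getsockname()[1]


def race_is_possible():
    """Simulate an unrelated assignment landing between guess and use."""
    port = guess_free_port()
    intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    intended = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        intruder.bind(("127.0.0.1", port))      # someone else takes the port
        try:
            intended.bind(("127.0.0.1", port))  # the intended user is too late
        except OSError:
            return True
        return False
    finally:
        intended.close()
        intruder.close()
```

The second function only helps when the code under test can accept an already-bound socket (or a port it binds itself), which is why each affected test has to be fixed individually.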
    4 4
    5 5  In practice, it does succeed with high probability.  However, due to the large number of cases in which it is used (many times per test suite run and the test suite itself is run many times), even this high probability of success is not good enough.  I will make an incredibly naive estimate that there are 2^15^ ports available for "random" assignment and that the chance of an unrelated intermediate assignment being made is about 1 in 2 (I suspect some tests themselves trigger an unrelated intermediate port assignment).  The chance of a collision is therefore 1 in 2^16^ (around a thousandth of a percent).  If there are 100 users of `allocate_tcp_port` in the test suite then the chance of a collision anywhere in the test suite is 100 in 2^16^.  There are about 15 different CI runners of the test suite.  So the chance of a failure on any of them for one build set is 15 * 100 in 2^16^.  The test suite is run for every pull request and every master revision.  If there is one PR merged a day, the chance of a failure in a week is at least 14 * 15 * 100 in 2^16^ which reduces to around 32%.  Quite easily high enough to be disruptive to development.
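
The arithmetic in this estimate can be reproduced with a short sketch. All inputs are the naive figures from the comment; the function name and the second result (which treats every call as an independent trial) are additions for illustration, not part of the original estimate.

```python
def weekly_failure_estimate(
    ports=2 ** 15,           # ports assumed available for "random" assignment
    p_intermediate=0.5,      # assumed chance of an unrelated intermediate assignment
    users=100,               # assumed callers of allocate_tcp_port per suite run
    runners=15,              # CI runners per build set
    build_sets_per_week=14,  # one PR run plus one master run per day
):
    """Return (linear_sum, independent_trials) weekly failure probabilities."""
    p_single = p_intermediate / ports            # 1 in 2**16 per call
    trials = users * runners * build_sets_per_week
    linear_sum = trials * p_single               # the sum used in the comment
    # Assumption: calls are independent, so at least one collision occurs
    # with probability 1 - (1 - p)**n.
    independent_trials = 1 - (1 - p_single) ** trials
    return linear_sum, independent_trials


linear, independent = weekly_failure_estimate()
print(f"linear sum:         {linear:.1%}")
print(f"independent trials: {independent:.1%}")
```

The linear sum gives the roughly 32% quoted above. Under the independence assumption the figure is somewhat lower, about 27%, which is still high enough to be disruptive.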