Opened at 2022-11-03T17:17:46Z
Closed at 2022-12-05T19:06:01Z
#3939 closed defect (fixed)
HTTP protocol is significantly slower than Foolscap protocol
Reported by: | itamarst | Owned by: | GitHub <noreply@…>
---|---|---|---
Priority: | normal | Milestone: | HTTP Storage Protocol
Component: | unknown | Version: | n/a
Keywords: | | Cc: |
Launchpad Bug: | | |
Description
Specifically, in https://github.com/tahoe-lafs/tahoe-lafs/pull/1225, test_filesystem (from test_system.py) takes 40 seconds with HTTP, compared to 20 with Foolscap.
Of those extra 20 seconds:
- 7 seconds are from repeated calls to get_spki_hash() in the HTTP client validation code. Or rather, the _real_ bottleneck is the call to get_public_bytes(), which is a little surprising but OK.
- Another 7 seconds or so are just the TLS handshake?!
- http_client's getContext() method, which creates a TLS context, is also quite expensive, and there's some expense in creating context objects on the server as well (see the sketch after this list).
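Since the certificate doesn't change between connections, the SPKI hash only needs to be computed once per certificate. A minimal sketch of that memoization, assuming a SHA-256 over the DER-encoded SubjectPublicKeyInfo; `cached_spki_hash` is a hypothetical name, not the actual tahoe-lafs API:

```python
# Hypothetical memoization sketch, not the actual tahoe-lafs code.
# The idea: public_bytes() is the expensive call, so cache its result
# (and the derived hash) per certificate object.
from functools import lru_cache

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat


@lru_cache(maxsize=64)
def cached_spki_hash(certificate: x509.Certificate) -> bytes:
    """SHA-256 of the certificate's SubjectPublicKeyInfo, computed once."""
    spki_bytes = certificate.public_key().public_bytes(
        Encoding.DER, PublicFormat.SubjectPublicKeyInfo
    )
    digest = hashes.Hash(hashes.SHA256())
    digest.update(spki_bytes)
    return digest.finalize()
```

The same trick would apply to getContext(): build the context object once per certificate and reuse it, rather than building one per connection.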
The test setup doesn't use persistent connections, so this is arguably unrealistic, and maybe the tests should use persistent connections. Even so, this seems... excessive. Perhaps we're doing way too many HTTP requests, e.g. maybe chunk sizes that are OK for Foolscap are too small for HTTP.
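For reference, persistent connections in Twisted are just a pooled Agent; `HTTPConnectionPool` and `Agent` are real Twisted APIs, but wiring this into the storage client is only a sketch:

```python
# Sketch: one persistent connection pool shared across requests, so
# connection setup and the TLS handshake are paid once per server
# rather than once per request.
from twisted.internet import reactor
from twisted.web.client import Agent, HTTPConnectionPool

pool = HTTPConnectionPool(reactor, persistent=True)
agent = Agent(reactor, pool=pool)

# In tests, the pool must be closed explicitly or trial reports a
# dirty reactor from the still-open connections:
#     yield pool.closeCachedConnections()
```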
Change History (7)
comment:1 Changed at 2022-11-03T17:19:34Z by itamarst
If I switch back to persistent HTTP connections, the test takes 20 seconds. So this is perhaps not a blocker, if I can figure out how to keep the tests from getting a dirty reactor with persistent HTTP.
Still seems worth fixing, though: it suggests the HTTP protocol is doing far too many requests.
comment:2 Changed at 2022-11-22T18:09:49Z by itamarst
Looking at an immutable, here are some writes for a (presumably small) object, as recorded via layout.py: 36 bytes, 1400, 32, 32, 32, 170, 320. These will get batched via pipeline.py (see #3787), and for Foolscap maybe that's fine. But I suspect HTTP/1.1 has higher per-request overhead than Foolscap, and even with HTTP/2 I imagine it's still noticeably higher.
So one possible strategy is an alternative to pipeline.py whose logic isn't a generic "batch API queries", but instead relies on the fact that we're doing writes without any holes, so we can coalesce writes semantically. (Previously the writes _would_ have holes, so this would have been harder.)
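A minimal sketch of what that could look like, assuming strictly contiguous, in-order writes; the class name and the 64 KiB threshold are made up for illustration:

```python
# Hypothetical write coalescer, not the actual tahoe-lafs code: merges
# small sequential writes into fewer, larger HTTP requests.


class WriteCoalescer:
    def __init__(self, flush_write, threshold=65536):
        # flush_write(offset, data) sends a single write request.
        self._flush_write = flush_write
        self._buffer = bytearray()
        self._offset = 0  # offset of the first buffered byte
        self._threshold = threshold

    def write(self, offset, data):
        # Relies on hole-free, in-order writes: each write must start
        # exactly where the previous one ended.
        if offset != self._offset + len(self._buffer):
            raise ValueError("writes must be contiguous and in order")
        self._buffer += data
        if len(self._buffer) >= self._threshold:
            self.flush()

    def flush(self):
        if self._buffer:
            self._flush_write(self._offset, bytes(self._buffer))
            self._offset += len(self._buffer)
            self._buffer.clear()
```

With the write sizes above (36, 1400, 32, 32, 32, 170, 320 bytes), this turns seven requests into a single flush at close.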
comment:3 Changed at 2022-11-22T18:14:31Z by itamarst
There is the problem that the API currently doesn't force the code to do writes _in order_, but that's solvable.
comment:4 Changed at 2022-11-22T18:17:44Z by itamarst
In practice the immutables path does have the writes happening in the correct order (which a contiguity check like the one in the sketch above would enforce), so this should possibly just work.
comment:5 Changed at 2022-11-22T20:23:41Z by itamarst
TODO for branch-in-progress:
- Unit tests for the new data structure (one is sketched after this list)
- Validate there is still backpressure
- Update #3787 to note new place that needs tuning
- See if equivalent improvements are possible for mutables
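As a sketch of the first item, assuming the hypothetical WriteCoalescer above is importable, a unit test could look like:

```python
# Hypothetical unit tests for the coalescing sketch above.
import unittest


class WriteCoalescerTests(unittest.TestCase):
    def test_small_writes_merge_into_one_flush(self):
        sent = []
        c = WriteCoalescer(lambda off, data: sent.append((off, data)),
                           threshold=100)
        c.write(0, b"a" * 36)
        c.write(36, b"b" * 32)
        c.write(68, b"c" * 32)  # reaches the threshold, triggers a flush
        self.assertEqual(sent, [(0, b"a" * 36 + b"b" * 32 + b"c" * 32)])

    def test_out_of_order_write_is_rejected(self):
        c = WriteCoalescer(lambda off, data: None)
        with self.assertRaises(ValueError):
            c.write(10, b"x")  # doesn't start at offset 0
```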
comment:6 Changed at 2022-11-23T15:50:51Z by itamarst
Did all the above (mutable uploads seem more reasonable a priori? They at least don't do a write per tiny bit of metadata). Some quantitative results: for allmydata.test.test_system.HTTPSystemTest.test_filesystem, the number of writes during an immutable upload goes from 530 to 60. So that's good! It does not, however, make a meaningful dent in run time... so there are likely other overly chatty interactions.
Initial thought is that downloads may be using too small a chunk size, so I will investigate that next.
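Roughly, the chunk size just determines the number of HTTP round trips per share; the 64 KiB constant and the read_share_chunk callable here are hypothetical stand-ins for the client's actual ranged-read call:

```python
# Hypothetical download loop: fewer, larger range reads instead of
# many small ones, so each HTTP round trip fetches more data.
CHUNK_SIZE = 65536  # 64 KiB per request; tiny chunks mean one round trip each


def download_share(read_share_chunk, share_size):
    offset = 0
    pieces = []
    while offset < share_size:
        length = min(CHUNK_SIZE, share_size - offset)
        pieces.append(read_share_chunk(offset, length))
        offset += length
    return b"".join(pieces)
```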
comment:7 Changed at 2022-12-05T19:06:01Z by GitHub <noreply@…>
- Owner set to GitHub <noreply@…>
- Resolution set to fixed
- Status changed from new to closed
In 1eba202c/trunk: