Opened at 2021-09-02T19:28:19Z
Last modified at 2022-11-28T16:15:33Z
#3787 new task
Is the use of Pipeline for write actually necessary? — at Version 4
Reported by: | itamarst | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | HTTP Storage Protocol v2 |
Component: | unknown | Version: | n/a |
Keywords: | | Cc: | |
Launchpad Bug: | | | |
Description (last modified by itamarst)
Updated issue description: there is a single hardcoded value for batching (formerly known as pipelining) immutable uploads, and it might be better to make it dynamic, or at least higher.
Initial issue description:
The Pipeline class was added in #392, but I really don't understand the reasoning.
It makes a bit more sense if you replace the word "pipeline" with "batcher" when reading the code, but I still don't understand why round-trip-time is improved by this approach.
Change History (4)
comment:1 Changed at 2021-10-04T13:27:33Z by itamarst
comment:2 Changed at 2021-10-04T14:01:02Z by itamarst
From the above we can extract two problems:
- A need for backpressure.
- _write() waits for a Deferred to fire before continuing. If there were no need for backpressure, this would be bad. Given that backpressure is necessary, this might be OK, or there may be a better mechanism.
So step 1 is probably to figure out how to implement backpressure.
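One possible shape for backpressure, as a minimal sketch only: cap the number of unacknowledged bytes and make the writer wait when the cap is reached. This uses the standard library's asyncio rather than Twisted Deferreds so that it runs standalone. BoundedWriter, max_in_flight, and the send callable are hypothetical names for illustration, not anything in the Tahoe-LAFS code base.

```python
import asyncio


class BoundedWriter:
    """Hypothetical writer that caps the number of unacknowledged bytes.

    Must be constructed inside a running event loop.
    """

    def __init__(self, send, max_in_flight):
        self._send = send                    # coroutine function: send(data)
        self._max_in_flight = max_in_flight  # assumed cap, in bytes
        self._in_flight = 0
        self._cond = asyncio.Condition()
        self._tasks = set()
        self.peak_in_flight = 0              # recorded only for inspection

    async def write(self, data):
        # Backpressure: the caller is suspended here until the chunk fits.
        # A chunk larger than the cap is still allowed when nothing is pending.
        async with self._cond:
            await self._cond.wait_for(
                lambda: self._in_flight == 0
                or self._in_flight + len(data) <= self._max_in_flight
            )
            self._in_flight += len(data)
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        # The send itself runs in the background, so several writes can overlap.
        task = asyncio.ensure_future(self._send_and_release(data))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _send_and_release(self, data):
        try:
            await self._send(data)
        finally:
            # Free the budget and wake any writer waiting for room.
            async with self._cond:
                self._in_flight -= len(data)
                self._cond.notify_all()

    async def flush(self):
        # Wait for every send that is still outstanding.
        if self._tasks:
            await asyncio.gather(*list(self._tasks))
```

With this shape, write() only blocks when the budget is exhausted, so the caller is not forced to wait for each individual write to be acknowledged.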
comment:3 Changed at 2021-10-04T14:13:42Z by itamarst
Instead of hardcoding buffer size, we could...
- Figure out latency by sending HTTP echo to server.
- Start with some reasonable batch buffer size.
- Keep increasing the buffer size until the latency of sending a batch is higher than the minimal expected latency measured by the echo. This implies that we've hit the bandwidth limit.
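The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: measure_echo and measure_batch are hypothetical callables that return elapsed seconds, and the doubling growth, the size limits, and the slack factor are arbitrary choices the ticket does not specify.

```python
def tune_batch_size(measure_echo, measure_batch,
                    start_size=64 * 1024,
                    max_size=16 * 1024 * 1024,
                    slack=2.0):
    """Grow the batch size until a batch takes noticeably longer than an echo.

    measure_echo():      hypothetical, returns seconds for a tiny round trip.
    measure_batch(size): hypothetical, returns seconds to send `size` bytes
                         and receive the acknowledgement.
    slack:               assumed factor above baseline that counts as
                         "higher than the expected latency".
    """
    # Estimate the baseline latency with an echo.
    baseline = measure_echo()

    # Start with a reasonable size and keep increasing it.
    size = start_size
    while size < max_size:
        if measure_batch(size) > baseline * slack:
            # Transfer time now dominates the round trip, which suggests
            # the bandwidth limit has been reached.
            break
        size = min(size * 2, max_size)
    return size
```

A real client would presumably time actual upload batches rather than dedicated probes, and would need to re-measure as network conditions change.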
comment:4 Changed at 2022-11-23T15:19:43Z by itamarst
- Description modified (diff)
- Milestone changed from HTTP Storage Protocol to HTTP Storage Protocol v2
Once #3939 is fixed, the Pipeline class will no longer be used. However, there will still be a batching mechanism via allmydata.immutable.layout._WriteBuffer, which suffers from basically the same issue of having a single hardcoded number that isn't necessarily adapted to network conditions.
So this should still be thought about based on the discussion above, but I am changing the summary and description.
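To make the concern concrete, here is a minimal sketch of a fixed-threshold batching buffer. It is an illustrative stand-in and not the actual allmydata.immutable.layout._WriteBuffer; the class name FixedBatchBuffer and its methods are hypothetical. The point is that batch_size is chosen once and never reacts to latency or bandwidth.

```python
class FixedBatchBuffer:
    """Illustrative stand-in for a batching write buffer (hypothetical API)."""

    def __init__(self, batch_size):
        # A single number fixed at construction time, independent of
        # the network conditions the uploads will actually see.
        self._batch_size = batch_size
        self._chunks = []
        self._pending = 0

    def queue(self, data):
        """Buffer data; return True once enough is queued to send a batch."""
        self._chunks.append(data)
        self._pending += len(data)
        return self._pending >= self._batch_size

    def flush(self):
        """Return everything buffered so far as one batch and reset."""
        batch = b"".join(self._chunks)
        self._chunks = []
        self._pending = 0
        return batch
```

An adaptive variant would replace the constant with a value that is adjusted from measured latency.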
Brian provided this highly detailed explanation: