Opened at 2008-04-15T01:30:24Z
Closed at 2008-05-08T18:07:50Z
#384 closed enhancement (fixed)
t=deep-size needs rate-limiting
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 1.1.0 |
Component: | code-performance | Version: | 1.0.0 |
Keywords: | web | Cc: | |
Launchpad Bug: | | | |
Description
The webapi "?t=deep-size" feature (as well as the t=manifest feature from which it is derived) needs to be rate-limited. I saw the prodnet webapi machine fire off about 300 directory retrievals in a single tick, which is enough of a load spike to stall the node for a few dozen seconds.
It might be useful to rebuild something like the old slowjob, but in a form that's easier to use this time around: perhaps an object which accepts a (callable, args, kwargs) tuple and returns a Deferred that fires with the result. The callable would not be invoked right away; instead the object would enforce a limit on the number of simultaneous requests outstanding, or perhaps a maximum rate at which requests are released. A rough sketch of this idea follows.
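As an illustration only (not the code that eventually landed), a Deferred-based limiter along those lines might look like the sketch below; the class name ConcurrencyLimiter and its method names are hypothetical.

```python
from twisted.internet import defer

class ConcurrencyLimiter:
    """Queue up (callable, args, kwargs) requests and run at most `limit`
    of them simultaneously. add() returns a Deferred that fires with the
    callable's result (or failure) once the call has actually been run."""
    def __init__(self, limit=10):
        self.limit = limit
        self.active = 0
        self.pending = []

    def add(self, cb, *args, **kwargs):
        d = defer.Deferred()
        self.pending.append((cb, args, kwargs, d))
        self._maybe_start()
        return d

    def _maybe_start(self):
        # release queued calls until we hit the concurrency limit
        while self.pending and self.active < self.limit:
            cb, args, kwargs, d = self.pending.pop(0)
            self.active += 1
            d2 = defer.maybeDeferred(cb, *args, **kwargs)
            def _finished(result):
                self.active -= 1
                self._maybe_start()
                return result
            d2.addBoth(_finished)
            d2.chainDeferred(d)  # hand the result (or failure) to the caller
```

Twisted's own DeferredSemaphore offers similar functionality, but the queue-of-tuples form above matches the description in this ticket more directly.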
Change History (2)
comment:1 Changed at 2008-04-16T00:23:09Z by warner
comment:2 Changed at 2008-05-08T18:07:50Z by warner
- Milestone changed from undecided to 1.1.0
- Resolution set to fixed
- Status changed from new to closed
I implemented this, in 3cb361e233054121. I did some experiments to decide upon a reasonable value for the default limit, and settled upon allowing 10 simultaneous requests per call to deep-size.
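For illustration only, here is roughly how a deep-size traversal could route its directory retrievals through a limiter like the one sketched in the description, using the default of 10 simultaneous requests; the dirnode accessors (list, is_directory, get_size) are hypothetical stand-ins, not the actual traversal code.

```python
from twisted.internet import defer

def deep_size(dirnode, limiter):
    """Return a Deferred that fires with the total size of all files
    reachable from dirnode, with at most limiter.limit directory
    retrievals outstanding at any one time."""
    d = limiter.add(dirnode.list)  # hypothetical: Deferred -> {name: childnode}
    def _sum_children(children):
        total = 0
        subtree_sizes = []
        for child in children.values():
            if child.is_directory():            # hypothetical predicate
                subtree_sizes.append(deep_size(child, limiter))
            else:
                total += child.get_size()       # hypothetical accessor
        dl = defer.gatherResults(subtree_sizes)
        dl.addCallback(lambda sizes: total + sum(sizes))
        return dl
    d.addCallback(_sum_children)
    return d

# usage sketch:
#   limiter = ConcurrencyLimiter(limit=10)
#   d = deep_size(root_dirnode, limiter)
```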
From my desktop machine (fluxx, Athlon 64 3500+ in 32bit mode), which has a pretty fast pipe to the storage servers in our colo, t=deep-size on a rather large directory tree (~1700 directories, including one that has at least 300 children) takes:
- limit=2: 2m25s (13 directories per second)
- limit=5: 2m15s (14.7 dps)
- limit=10: 2m10s/2m13s/2m14s (15 dps)
- limit=30: 2m13s/2m14s (15 dps)
- limit=60: 2m13s (15 dps)
- limit=120: 2m12s (15.7 dps)
- limit=9999: 2m06s (16.6 dps)
The same test run from a machine in the colo (tahoecs2, P4 3.4GHz), which probably gets lower latency to the storage servers but might have a slower CPU, gives:
- limit=2: 2m35s/2m32s, peak memory 67MB vmsize / 42MB rss
- limit=10: 2m37s/2m29s, peak memory 68MB vmsize / 43MB rss
- limit=9999: 2m28s/2m52s, peak memory 122MB vmsize / 100MB rss
So increasing the concurrency limit causes:
- marginal speedups in retrieval time (<25%), probably because it's filling the pipe better
- significant increases in memory (2x), because there are lots of dirnode retrievals happening at the same time
Therefore I think limit=10 is a reasonable choice.
Note that the CPU was pegged at 100% for all trials: the current bottleneck is the CPU, not the network. I suspect the mostly-Python unpacking of dirnodes accounts for most of that time.
Mike says he saw similar problems on the Windows client before changing it to offload the t=deep-size queries to the prodnet webapi server. The trouble is that that machine gets overloaded by them too, so managing the parallelism would help both issues.
He saw a request use 50% of the local CPU for about 60 seconds. The same deep-size request took about four minutes when using a remote server, if I'm understanding his message correctly.
One important point to take away is that deep-size should not be called on every modification: we should really be caching the size of the filesystem and applying deltas as we add and remove files, then only doing a full deep-size every once in a while (maybe once a day) to correct the value.
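A minimal sketch of that caching idea, assuming a synchronous full_recount callable for simplicity; the class name and its hooks are hypothetical, not an existing API.

```python
import time

class CachedDeepSize:
    """Keep a running total of the filesystem size by applying deltas on
    add/remove, and only fall back to a full deep-size pass occasionally
    (e.g. once a day) to correct any accumulated drift."""
    def __init__(self, full_recount, recount_interval=24*60*60):
        self._full_recount = full_recount        # callable doing a real deep-size pass
        self._interval = recount_interval
        self._total = full_recount()
        self._last_recount = time.time()

    def file_added(self, size):
        self._total += size

    def file_removed(self, size):
        self._total -= size

    def get_size(self):
        # periodic correction via a full recount
        if time.time() - self._last_recount > self._interval:
            self._total = self._full_recount()
            self._last_recount = time.time()
        return self._total
```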