Opened at 2008-02-27T20:16:01Z
Closed at 2008-03-13T03:43:51Z
#321 closed defect (fixed)
poor performance with large number of files via windows FUSE?
Reported by: | warner | Owned by: | booker |
---|---|---|---|
Priority: | major | Milestone: | 0.9.0 (Allmydata 3.0 final) |
Component: | code-frontend | Version: | 0.8.0 |
Keywords: | fuse performance | Cc: | |
Launchpad Bug: |
Description
Peter and Fabrice have reported problems with dragging a large folder into the windows FUSE frontend. We're still collecting data, but the implication is that there is a super-linear slowdown somewhere, maybe in the FUSE plugin, maybe in the local Tahoe node that it connects to. We expect to spend roughly one second per file right now: our automated perfnet tests show 600ms per immutable file upload and 300ms per directory update; prodnet has a different number of servers but I'd expect the values to be fairly close. Peter says that this is not sufficient to explain the slowdowns.
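As a rough sanity check of the linear expectation (my arithmetic, using only the perfnet figures quoted above; the 350-file folder size is an assumption for illustration):

    # Back-of-the-envelope: if per-file cost is constant, total time is linear.
    upload_s = 0.600          # perfnet: one immutable file upload
    dir_update_s = 0.300      # perfnet: one directory update
    n_files = 350             # assumed folder size, for illustration only
    total_s = n_files * (upload_s + dir_update_s)
    print(total_s / 60)       # ~5.2 minutes; anything much worse is super-linear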
We are currently running tests with additional instrumentation to figure out where this time is being spent.
Change History (10)
comment:1 Changed at 2008-02-28T01:59:59Z by zooko
comment:2 Changed at 2008-02-28T02:00:24Z by zooko
Err, I mean #173 -- "How does tahoe handle lots of simultaneous file-upload tasks?".
comment:3 Changed at 2008-02-28T19:47:53Z by warner
Unfortunately no: the FUSE plugin only gives the tahoe node one task at a time, so there is no parallelism here.
comment:4 Changed at 2008-02-28T20:10:25Z by zooko
Fine then -- let us add an automated performance measurement that asks "How does tahoe handle lots of sequential file-upload tasks?".
comment:5 Changed at 2008-02-28T22:26:22Z by zooko
#327 -- "performance measurement of directories"
comment:6 Changed at 2008-02-29T04:07:10Z by warner
We've performed some log analysis and identified that the problem is simply the dirnodes becoming too large. A directory with 353 children consumes 114305 bytes, which at 3-of-10 encoding requires about 400kB to be written on each update. A 1Mbps SDSL line can push about 100kB/s, so it takes about 4 seconds to send out all the shares. The Retrieve that precedes the Publish takes a third of this time, so it needs 1 or 2 seconds. The total time to update a dirnode of this size is about 10 seconds; small directories take about 2 seconds.
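For reference, here is the arithmetic behind those numbers as a short sketch (nothing new, just the figures from the paragraph above):

    # 3-of-10 encoding expands the dirnode contents by 10/3; a ~100kB/s
    # uplink then needs roughly 4 seconds to push all ten shares.
    contents_bytes = 114305                # the 353-child dirnode
    wire_bytes = contents_bytes * 10 / 3   # ~381kB, i.e. "about 400kB"
    print(wire_bytes / 100_000)            # ~3.8 seconds to send the shares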
One thing that surprised me was that dirnodes are twice as large as I'd thought: 324 bytes per child. I guess my previous estimates (of 100-150) were based on a design that we haven't yet implemented, in which we store binary child caps instead of ASCII ones. So the contents of a dirnode are large enough to take a non-trivial amount of time to upload. Also note that this means our 1MB limit on SDMF files imposes a roughly 3000-child limit on dirnodes (though this could easily be raised by allowing larger segments).
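The 3000-child figure falls directly out of those sizes (a one-line check, using the numbers above):

    # 1MB SDMF size limit divided by ~324 bytes per ASCII child cap.
    print(1_000_000 // 324)   # ~3086 children before a dirnode hits the limit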
There are four things we can do about this.
- The most significant is to do fewer dirnode updates. A FUSE plugin (with a POSIX-like API) doesn't give us any advance notice of how many child entries are going to be added, so the best we can do is a Nagle-like algorithm that batches writes together for efficiency (see the sketch after this list). The basic idea: when a dirnode update request comes in, start a timer (perhaps five seconds), merge in any other update requests that arrive during that window, and perform the actual update when the timer expires. This will help the lots-of-small-files case, as long as the files upload quickly enough for several requests to land within one timer window. In the test we ran (with 1024-byte files), this would probably have reduced the number of dirnode updates by a factor of 5.
The biggest problem is that this can't be done completely safely: it requires lying to the close() call, pretending that the child has been added when it actually hasn't. We could recover some safety by adding a flush() or sync() call of some sort, and not returning from it until all the Nagle timers have been accelerated and their updates completed.
- Make dirnodes smaller. DSA-mutable files (#217) and packing binary caps into dirnodes (no ticket yet) would cut the per-child size in half (assuming I'm remembering my numbers correctly). Once dirnodes get large enough to exceed the size of the overhead (2kB overhead, so roughly 6 entries), this will cut about 50% off the large dirnode update time.
- We discovered an unnecessary retrieve during the directory update process. We need to update the API (#328) to remove it and provide the safe-update semantics that were intended. Fixing this would shave about 10%-15% off the time needed to do a dirnode update (both large and small).
- Serializing the directory contents (including encrypting the writecaps) took 500ms for 353 entries. The dirnode could cache and reuse the encrypted strings instead of generating new ones each time. This might save about 5% of the large-dirnode update time. Ticket #329 describes this.
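To make the batching idea in the first bullet concrete, here is a minimal sketch (my own illustration, not the planned winFUSE code; the class and the publish_children callable are invented names) of a Nagle-like write coalescer, including the flush() escape hatch discussed above:

    import threading

    class BatchingDirnodeWriter:
        # Coalesce add-child requests for one dirnode into a single update.
        # publish_children(children) stands in for whatever call actually
        # writes a batch of children to the dirnode.
        def __init__(self, publish_children, delay=5.0):
            self._publish = publish_children
            self._delay = delay
            self._pending = {}            # name -> child cap
            self._timer = None
            self._lock = threading.Lock()

        def add_child(self, name, cap):
            # Merge the request; start the timer on the first pending entry.
            with self._lock:
                self._pending[name] = cap
                if self._timer is None:
                    self._timer = threading.Timer(self._delay, self._fire)
                    self._timer.start()

        def _fire(self):
            # Swap the batch out under the lock so a racing flush() and
            # timer can never publish the same children twice.
            with self._lock:
                batch, self._pending = self._pending, {}
                self._timer = None
            if batch:
                self._publish(batch)      # one dirnode update for N children

        def flush(self):
            # The sync() escape hatch: accelerate the timer and publish now.
            with self._lock:
                if self._timer is not None:
                    self._timer.cancel()
            self._fire()

A real implementation would also have to propagate publish failures back through a later flush(), since close() has already claimed success by the time the batch is written.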
Zooko has started work on reducing the dirnode updates, by adding an HTTP interface to IDirectoryNode.set_uris() (allowing the HTTP client to add multiple children at once). Mike is going to make the winFUSE plugin split the upload process into separate upload-file-get-URI and dirnode-add-child phases, which will make it possible for him to implement the Nagle-like timer and batch the updates.
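As a hypothetical sketch of what that two-phase flow might look like from the plugin's side (the PUT /uri unlinked-upload call matches the webapi as I understand it, but the t=set_children query and the JSON child format are my assumptions about the interface being added, not a documented API):

    import json
    import urllib.parse
    import urllib.request

    NODE = "http://127.0.0.1:8123"   # assumed local tahoe node webport

    def upload_unlinked(data: bytes) -> str:
        # Phase 1: upload the file body without touching any directory;
        # the response body is the new file's URI (cap).
        req = urllib.request.Request(NODE + "/uri", data=data, method="PUT")
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("ascii").strip()

    def add_children(dircap: str, children: dict) -> None:
        # Phase 2: attach all the collected caps in one dirnode update.
        body = json.dumps({name: ["filenode", {"ro_uri": cap}]
                           for name, cap in children.items()}).encode("utf-8")
        url = (NODE + "/uri/" + urllib.parse.quote(dircap, safe="")
               + "?t=set_children")
        req = urllib.request.Request(url, data=body, method="POST")
        urllib.request.urlopen(req).close()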
comment:7 Changed at 2008-02-29T04:13:43Z by warner
Oh, we also noticed a large number of t=json queries being submitted by the winFUSE plugin. At the beginning of the test (when the directory had only a few entries and updates took about 3 seconds), we saw about 5 such queries per child entry. All of these queries required a directory fetch, and most resulted in a 404 because the target filename wasn't present in the directory yet. When dirnode updates started taking longer (10 seconds), we saw fewer of these per update (maybe 1).
Early in the test these queries took 210ms each; by the end they took one or two seconds each. This might represent 15%-30% of the time spent doing the dirnode updates.
The plugin should do fewer of these queries: they are consuming network bandwidth and slowing down the directory update. If it is doing them to see if the file has been added to the directory yet, then it would be far more efficient to simply wait for the response to the PUT call. If they are being done for some other reason, then we should consider some sort of read cache to reduce their impact.
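If a read cache turns out to be the right fix, something as small as this would do (illustrative only; the names are invented, and the real plugin would need to invalidate entries after its own writes):

    import time

    class DirListingCache:
        # Tiny TTL cache for t=json directory listings.
        def __init__(self, fetch_listing, ttl=5.0):
            self._fetch = fetch_listing   # callable: dircap -> parsed listing
            self._ttl = ttl
            self._entries = {}            # dircap -> (expires_at, listing)

        def get(self, dircap):
            now = time.monotonic()
            hit = self._entries.get(dircap)
            if hit is not None and hit[0] > now:
                return hit[1]             # serve from cache, skip the query
            listing = self._fetch(dircap)
            self._entries[dircap] = (now + self._ttl, listing)
            return listing

        def invalidate(self, dircap):
            # Call after any local write so we never serve stale entries.
            self._entries.pop(dircap, None)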
comment:8 Changed at 2008-03-10T19:42:17Z by zooko
- Owner set to booker
MikeB: is this issue handled Well Enough for v0.9.0 now?
comment:9 Changed at 2008-03-13T03:43:51Z by zooko
- Resolution set to fixed
- Status changed from new to closed
This issue is somewhat improved, and is hereby considered Good Enough for allmydata.org "Tahoe" v0.9.0.
(Further performance tuning might be applied before the Allmydata.com 3.0 product release, but that can be done after the allmydata.org "Tahoe" v0.9.0 release.)
It's too bad we didn't implement #173 -- "How does tahoe handle lots of simultaneous file-upload tasks?" -- before now. If we had, we would already know how the Tahoe node itself handles this load.