[tahoe-dev] Tahoe-LAFS + sshfs latency issues

Linus Lüssing linus.luessing at web.de
Sun Jul 24 20:12:42 PDT 2011


Hi again,

Thanks for all the feedback so far, it was really helpful for
debugging and narrowing things down further. Currently I'm a
little stuck with the testing again, so I think it's time to
share some of the results I'm getting :). Rather than just
describing the point where I'm stuck, I'll be a little more
verbose about the things I managed to fix and try along the
way. Maybe some people will find those intermediate steps
useful later, too.


The first issue is (as already stated in the IRC channel) that
scp is actually not working / is not supposed to work. scp never
gave me any error messages, and since I was always using the
same destination for the downloaded file, I kept assuming that
the download via scp had worked - but in fact it did nothing at
all. The sftp command, however, worked fine - except for showing
the same latency issues.
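
(For anyone wanting to reproduce the sftp transfers: something
like the following should work, going through Tahoe's SFTP
frontend; the port and account name here are just examples and
depend on the [sftpd] section in tahoe.cfg.)
---
# Fetch a single file via Tahoe's SFTP frontend instead of scp;
# 8022 and "linus" are placeholders for whatever tahoe.cfg and
# the accounts file actually define.
sftp -P 8022 linus@localhost:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif
---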

I then discovered that the "tahoe put" itself, which saturates
my whole upload capacity of about 800-900 kbit/s, introduced
quite a lot of latency: about 1 s on average, and highly
variable. So it looked very similar to the bufferbloat
phenomenon [1].
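
(In case it's useful to others, this is roughly how I watch the
effect: start a big upload in the background and look at the
ICMP round-trip times while the uplink is saturated; the file
name and peer address are of course just examples from my
setup.)
---
# Start a large upload in the background...
~/allmydata-tahoe-1.8.2/bin/tahoe put /tmp/bigfile root:bigfile &
# ...and watch the round-trip time climb while it runs.
ping 192.168.145.3
---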

To rule out that the problem is an interaction between "tahoe
get" and "tahoe put", I switched to creating the background
traffic via netcat6 instead of "tahoe put"
(nc6 192.168.145.3 12345 < /dev/urandom &). And I still had the
same awful latency issues of 20 s and more for tahoe-lafs
itself.
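
(The receiving end of that netcat test just discards the stream,
so only the sender's uplink is exercised - assuming netcat6 is
installed on the other host, something like:)
---
# On the receiving host: accept the test stream, throw it away.
nc6 -l -p 12345 > /dev/null
---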

I've then added some tc/qdisc rules via wondershaper on my Linux
router, limiting the upload rate to 800 kbit/s, which helped
partially: the big upload from the VM on a dedicated PC running
tahoe no longer had an impact on the latency of the other
machines in my network (e.g. the internet latency was normal
again for my laptop). However, the latency within the tinc VPN
(that's where tahoe is currently running; originally it was
running within the tinc VPN layer on top of the BATMAN mesh
layer, but I removed the mesh layer for testing for now) was
still at the highly variable 1 s.
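
(For reference, the wondershaper setup on the router boils down
to a single call - this assumes the Debian wondershaper script,
which takes the interface plus downlink and uplink rates in
kbit/s; the downlink value below is only an example:)
---
# Shape the router's uplink to 800 kbit/s so the modem's own
# buffer never fills up; the 8000 kbit/s downlink is an example.
wondershaper ppp0 8000 800
---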

To further verify that the tahoe-lafs and network latency issue
was somewhere on my local network, I set wondershaper to shape
the upload rate down to 400 kbit/s (and verified via vnstat that
really only about 400 kbit/s leave my internet ppp connection).
And yes, the 1 s latency was still present within the VM and the
tahoe-lafs latency was still 20 s+. So the issue should be
somewhere at my place, at home, as I'm no longer saturating my
ISP's routers or anything on the path to the storage node on the
other side.
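
(The vnstat check is just a short live sample of the transfer
rate on the ppp interface, e.g.:)
---
# Sample the current transfer rate for 30 seconds to confirm
# that the shaped 400 kbit/s upload limit actually holds.
vnstat -i ppp0 -tr 30
---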

I first tried using wondershaper on the VPN interface as well,
to always limit its upload rate to something slightly below the
rate set on the router's ppp interface, so that any buffering
happens as close to the sender as possible. But I had to
discover that, for one thing, wondershaper does not work for
IPv6, seemingly because the tc filters it uses only deal with
IPv4 [2] (I was using netcat6 via IPv6 in the beginning) - and
in the final use-case the BATMAN packets are neither IPv6 nor
IPv4. For another thing, a fixed upload limit set via
wondershaper would probably not work when I'm simultaneously
uploading something from my laptop at, for instance, 200 kbit/s.
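
(What might work instead of wondershaper's IPv4-only filters is
a single default HTB class on the VPN interface, which shapes
everything regardless of protocol - a rough sketch, with the
rate picked to stay a bit below the 800 kbit/s uplink:)
---
# Send *all* traffic on tap0 (IPv4, IPv6, BATMAN, ...) through
# one rate-limited default class, no per-protocol filters needed.
tc qdisc add dev tap0 root handle 1: htb default 1
tc class add dev tap0 parent 1: classid 1:1 htb rate 750kbit
---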

So I ended up playing with the txqueuelen of the VPN interface
tap0 - which of course is not really ideal either... For 1280
byte packets, a txqueuelen of 2 works great (it limits the
throughput to about ~800 kbit/s), but for 700 byte packets (as
they come from the mesh network layer due to its internal
fragmentation) that only saturates the link to about 200 kbit/s;
for those a txqueuelen of 4 would be ideal. Anyway, for the
tests below I went with a txqueuelen of 2, and will use 4 for
the real setup later until I find a better solution.
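
(Setting the queue length is a one-liner; note that a txqueuelen
of 2 means at most 2 x 1280 bytes sit in the device queue, i.e.
only about 25 ms worth of buffering at 800 kbit/s:)
---
# Shrink tap0's transmit queue to two packets (~25 ms of
# buffering at 800 kbit/s with 1280 byte packets).
ip link set dev tap0 txqueuelen 2
---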


So great, within the VPN I can now successfully upload with
netcat6 at 800-900kbit/s and an ICMP ping over the VPN has a nice
latency of only 70ms. Time to get back to tahoe-lafs:
---
/usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
-> 10.39!
---
Cool, that finally works nicely and has an acceptable delay of
about 1 s per transfer on average.
(see [3] -> 03:21:29 - 03:21:38 for details)

So, just to verify that this also solved the issue for parallel
'tahoe put' and 'tahoe get', I stop netcat6 and start the
'tahoe put' of a 1 GB file. I again wait until the upload
starts and check that the ICMP ping is fine.
Then I start the 'tahoe get' loop again:
---
/usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
-> 220.42!?
---

Pfeh, so now it seems to have something to do with 'tahoe put'
and 'tahoe get' again after all, although I had thought I had
ruled that out in the beginning... the issue is back, at 22 s
on average (see [4] -> 03:34:03 - 03:37:25 for details).

Finally, two more tests with a txqueuelen of 500 on tap0 instead
of 2, just to show the impact of that again:

With 'tahoe put':
---
/usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
-> 178.23
---
(not quite sure why it is lower now; maybe some variance, or
maybe the txqueuelen of 2 was so low that it hurt the
performance of multiple parallel streams - 18 s average;
see [5] -> 03:50:57 - 03:53:38 for details)

With netcat6:
---
/usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
-> 51.28!?
---
5 s average - so 'tahoe put' definitely seems to have something
to do with the issue, as with netcat6 it's not as bad as with
'tahoe put'.
(see [6] -> 03:59:34 - 04:00:19 for details)


Can anyone make any sense from these test results?

Cheers, Linus


PS: Just for the record, I'm using tinc with the IFFOneQueue
option.
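
i.e. this line in tinc.conf (a Linux-only option):
---
# Set the IFF_ONE_QUEUE flag on the TUN/TAP device; this is what
# makes the txqueuelen tweak above the effective queue limit.
IFFOneQueue = yes
---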

[1]:
http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/
http://www.bufferbloat.net/
[2]:
http://lartc.org/howto/lartc.adv-filter.ipv6.html

[3]: http://x-realis.dyndns.org/tahoe/logs/round1
[4]: http://x-realis.dyndns.org/tahoe/logs/round2
[5]: http://x-realis.dyndns.org/tahoe/logs/round3
[6]: http://x-realis.dyndns.org/tahoe/logs/round4

