[tahoe-dev] Tahoe-LAFS + latency issues
Linus Lüssing
linus.luessing at web.de
Tue Jul 26 00:46:08 PDT 2011
Just to let you know, I found an easier and cleaner way to
mitigate the bufferbloat: I'm now using TCP Vegas instead of
TCP Cubic. TCP Vegas works astonishingly well - latencies stay
below 100ms at full upload capacity usage (I'll switch all my
devices to TCP Vegas now, I think :D):
modprobe tcp_vegas; echo vegas > /proc/sys/net/ipv4/tcp_congestion_control
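
To make that persistent across reboots (assuming a Debian-style
system here - adjust the file names for your distro):

echo tcp_vegas >> /etc/modules
echo "net.ipv4.tcp_congestion_control = vegas" >> /etc/sysctl.conf
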
So I'm now quite sure that the issue I'm still having with
tahoe-lafs is not caused by general network latency problems.
The results with TCP Vegas are still pretty much the same as
with TCP Cubic + txqueuelen hackery, though:
netcat6 upload + 'tahoe get': ~1s per file; 'tahoe put' +
'tahoe get': ~20s per file.
Cheers, Linus
On Mon, Jul 25, 2011 at 05:12:42AM +0200, Linus Lüssing wrote:
> Hi again,
>
> Thanks for all the feedback so far, it was really helpful for
> debugging and narrowing things down further. Currently I'm a
> little stuck with the testing again, so I think it's time to
> share some of the results I'm getting :). Rather than just
> presenting the point I'm currently stuck at, I'll be a bit
> more verbose about the things I got fixed and tried along the
> way - maybe some people will find those intermediate steps
> useful later, too.
>
>
> The first issue is (as already stated in the IRC channel) that
> scp is not working / is not supposed to work here. scp never
> gave me any error messages, and as I was always using the same
> destination for the downloaded file, I kept assuming that the
> download via scp had worked - but in fact it did nothing at
> all. The sftp command, however, worked fine - apart from
> showing the same latency issues.
>
> I then discovered that the "tahoe put", which saturates my
> whole upload capacity of about 800-900kbit/s, introduced quite
> a lot of latency: about 1s on average, and highly variable. So
> it looked very similar to the bufferbloat phenomenon [1].
>
> To rule out that the problem is the interaction of "tahoe get"
> and "tahoe put", I switched to creating the upload traffic via
> netcat6 instead of "tahoe put" (nc6 192.168.145.3 12345 < /dev/urandom &).
> And I still had the same awful latency issues of 20s and more
> for tahoe-lafs itself.
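>
> The listening counterpart on the other host just discards the
> data; something like this, with the port of course arbitrary:
> ---
> nc6 -l -p 12345 > /dev/null
> ---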
>
> I then added some tc/qdisc stuff via wondershaper on my Linux
> router, limiting the upload rate to 800kbit/s, which helped
> partially: the big upload from the VM on a dedicated PC running
> tahoe no longer had an impact on the latency of other machines
> in my network (e.g. the internet latency was normal again for
> my laptop). However, the latency within the tinc VPN (that's
> where tahoe is currently running; actually it was running
> within the tinc VPN layer within the BATMAN mesh layer, but I
> removed the mesh layer for testing for now) was still at the
> highly variable 1s.
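>
> For reference, that's just the stock wondershaper invocation on
> the router's ppp interface, with rates given in kbit/s (the
> downlink figure here is only an example, it doesn't matter for
> this test):
> ---
> wondershaper ppp0 8000 800
> ---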
>
> To further verify that the tahoe-lafs and network latency
> issues were somewhere on my local network, I set up
> wondershaper to shape to an upload rate of 400kbit/s (and
> verified via vnstat that really just about 400kbit/s leave my
> internet ppp connection). And yes, the 1s latency was still
> present within the VM and the tahoe-lafs latency was still
> 20s+. So I think the issue should be somewhere at my place, at
> home, as I'm no longer saturating my ISP's routers or anything
> at or on the way to the storage node on the other side.
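>
> (The vnstat check was a short traffic rate sample on the ppp
> interface, along the lines of: vnstat -i ppp0 -tr)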
>
> At first I also tried using wondershaper on the VPN interface,
> to limit the upload rate to something slightly lower than the
> rate set on the router's ppp interface, thus moving any
> buffering issues as close to the sender as possible. But I had
> to discover that, for one thing, wondershaper does not work for
> IPv6, seemingly because its tc setup does not deal with
> anything other than IPv4 [2] (I was using netcat6 via IPv6 in
> the beginning) - and in the final use-case the BATMAN packets
> are neither IPv6 nor IPv4 anyway. For another thing, a fixed
> upload capacity set via wondershaper would probably not work
> when I'm simultaneously uploading something from my laptop at,
> for instance, 200kbit/s.
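>
> A protocol-agnostic alternative might be a plain tbf root qdisc
> on tap0: tbf shapes the device as a whole and never looks at
> the payload protocol. Untested on my side so far, the rate
> below is just an example:
> ---
> tc qdisc add dev tap0 root tbf rate 750kbit burst 10kb latency 50ms
> ---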
>
> So I ended up playing with the txqueuelen of the VPN interface
> tap0, which of course is not really ideal either... For 1280
> Byte packets, a txqueuelen of 2 works great (it limits the
> throughput to about 800kbit/s), but for 700 Byte packets (as
> they come from the mesh network layer due to its internal
> fragmentation) a txqueuelen of 2 only saturates the link to
> about 200kbit/s; there, a txqueuelen of 4 would be ideal.
> Anyway, for the tests below I went with a txqueuelen of 2, and
> for the real setup I'll later go with 4 until I find a better
> solution.
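>
> For reference, setting it is a one-liner (the old-style
> ifconfig tap0 txqueuelen 2 works just as well):
> ---
> ip link set dev tap0 txqueuelen 2
> ---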
>
>
> So great, within the VPN I can now successfully upload with
> netcat6 at 800-900kbit/s and an ICMP ping over the VPN has a nice
> latency of only 70ms. Time to get back to tahoe-lafs:
> ---
> /usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
> -> 10.39!
> ---
> Cool, that finally works nicely and has an acceptable delay of
> 1s per transfer on average (see [3] -> 03:21:29 - 03:21:38 for
> details).
>
> So, to verify that this also solved the issue for the parallel
> 'tahoe put' and 'tahoe get', I stop netcat6 and start a 'tahoe
> put' of a 1GB file. I again wait until the uploading starts and
> check that the ICMP ping is fine.
> Then I start the 'tahoe get' loop again:
> ---
> /usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
> -> 220.42!?
> ---
>
> Pfeh, and now it seems to have something to do with 'tahoe put'
> and 'tahoe get' again, although I had thought that I had ruled
> that out in the beginning... the issue is there again, 22s on
> average (see [4] -> 03:34:03 - 03:37:25 for details).
>
> Finally, two more tests with a txqueuelen of 500 on tap0
> instead of 2, just to show the impact of that setting again:
>
> With 'tahoe put':
> ---
> /usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
> -> 178.23
> ---
> (Not quite sure why it is lower now - maybe just variance, or
> maybe a txqueuelen of 2 was so low that it hurt performance
> with multiple streams again. 18s average;
> see [5] -> 03:50:57 - 03:53:38 for details.)
>
> With netcat6:
> ---
> /usr/bin/time -f "%e" sh -c "for i in \`seq 1 10\`; do ~/allmydata-tahoe-1.8.2/bin/tahoe get root:music/8bitpeoples/8BP102.gif /tmp/8BP102.gif 2> /dev/null; done"
> -> 51.28!?
> ---
> 5s average. 'tahoe put' definitely seems to have something to
> do with the issue, as with netcat6 the latency is again not as
> bad as with 'tahoe put'.
> (see [6] -> 03:59:34 - 04:00:19 for details)
>
>
> Can anyone make any sense from these test results?
>
> Cheers, Linus
>
>
> PS: Just for the record, I'm using tinc with the IFFOneQueue
> option.
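>
> In tinc.conf that's the boolean:
> ---
> IffOneQueue = yes
> ---
> (Linux only; if I read the tinc docs correctly, it tells the
> kernel to use just the device's own tx queue and to drop
> packets once that queue is full, instead of buffering them in a
> second queue.)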
>
> [1]:
> http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/
> http://www.bufferbloat.net/
> [2]:
> http://lartc.org/howto/lartc.adv-filter.ipv6.html
>
> [3]: http://x-realis.dyndns.org/tahoe/logs/round1
> [4]: http://x-realis.dyndns.org/tahoe/logs/round2
> [5]: http://x-realis.dyndns.org/tahoe/logs/round3
> [6]: http://x-realis.dyndns.org/tahoe/logs/round4