Opened at 2010-01-30T01:43:33Z
Last modified at 2012-03-22T21:00:34Z
#932 assigned enhancement
benchmark Tahoe-LAFS compared to nosql dbs
| Reported by: | zooko | Owned by: | bibilthaysose |
|---|---|---|---|
| Priority: | major | Milestone: | undecided |
| Component: | dev-infrastructure | Version: | 1.5.0 |
| Keywords: | scalability performance large | Cc: | zooko |
| Launchpad Bug: | | | |
Description
I'm curious how Tahoe-LAFS performs compared to nosql databases on the nosqlish loads that those users care about. Aaron Cordova did some benchmarks of Tahoe-LAFS vs. HDFS as the storage backend for Hadoop and reported in his HadoopWorld presentation that they performed about the same for the map-reduce computation (which is a read-intensive workload): http://www.slideshare.net/cloudera/hw09-map-reduce-over-tahoe-a-least-authority-encrypted-distributed-filesystem
Recently a scientist from Yahoo posted about his benchmarks of various nosql systems:
He says that his benchmarking code will be open-sourced soon pending approval from Yahoo's legal department. Maybe we could contribute patches that make Tahoe-LAFS one of the systems that his benchmark system can measure.
N.B. not to get anyone's hopes up, I would expect Tahoe-LAFS to perform very badly on those workloads! They typically want to assign values to user-specified keys, which we don't have a native implementation of and which we would have to simulate somehow, such as by letting the user-chosen keys be the childnames in a mutable directory. So I would expect Tahoe-LAFS to be pretty much off the charts for bad performance on those workloads. But, I might be pleasantly surprised. And also: "What gets measured gets improved!" :-)
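To make the "user-chosen keys as childnames in a mutable directory" idea concrete, here is a minimal sketch of how a key-value put/get could be mapped onto the Tahoe-LAFS web API. It assumes a gateway listening at `http://127.0.0.1:3456` and a writable directory cap obtained beforehand; the names `GATEWAY`, `child_url`, `kv_put`, and `kv_get` are hypothetical and are not part of Tahoe-LAFS or of any benchmark code.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: a local Tahoe-LAFS gateway on its usual web port.
GATEWAY = "http://127.0.0.1:3456"


def child_url(gateway: str, dircap: str, key: str) -> str:
    """Build the web-API URL for child `key` of the directory `dircap`."""
    # Percent-encode the cap and the key so characters such as ':' and
    # spaces are URL-safe.
    return f"{gateway}/uri/{quote(dircap, safe='')}/{quote(key, safe='')}"


def kv_put(dircap: str, key: str, value: bytes, gateway: str = GATEWAY) -> None:
    """Store `value` under `key` by uploading it as a child file."""
    req = Request(child_url(gateway, dircap, key), data=value, method="PUT")
    with urlopen(req) as resp:
        resp.read()  # drain the response body


def kv_get(dircap: str, key: str, gateway: str = GATEWAY) -> bytes:
    """Fetch the value stored under `key`."""
    with urlopen(child_url(gateway, dircap, key)) as resp:
        return resp.read()
```

Every `kv_put` in this sketch uploads a file and then modifies the mutable directory, which is one reason to expect poor write throughput on key-value workloads.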
Change History (11)
comment:1 Changed at 2010-10-22T23:13:54Z by zooko
comment:2 Changed at 2011-08-30T17:54:01Z by bibilthaysose
- Owner changed from somebody to bibilthaysose
- Status changed from new to assigned
I'm going to attempt this benchmarking against MongoDB.
comment:3 Changed at 2011-09-02T00:26:55Z by bibilthaysose
YCSB Interface layer skeleton @ https://github.com/grubino/Tahoe-YCSB--Interface-Layer/blob/master/TahoeLAFSClient.java
Ping me if you want to help out, and I'll give out push privileges.
comment:4 Changed at 2011-09-14T15:56:14Z by bibilthaysose
reorganized and updated Tahoe java driver:
Currently blocked on figuring out why the InputStream returned by HttpResponse.getEntity().getContent() is empty. The request seems to be processed correctly, but there's no content, which can't be correct. Probably something I'm doing wrong with the Apache HTTP interface. I'll ask around.
comment:5 Changed at 2011-09-14T17:01:58Z by zooko
What does Apache have to do with it? Isn't the HTTP server the Tahoe-LAFS gateway?
comment:6 Changed at 2011-10-26T14:16:45Z by bibilthaysose
Hi zooko, org.apache.http.[...] is the client-side HTTP library that I'm using. If you followed the link that I provided, you should have seen some 'import org.apache.[...]' statements at the top of the source files. That's what I was referring to. It turns out that in the Java community, the Apache HTTP classes are preferred to the native Java ones. Go figure! Anywho, I believe I've ironed out most of the problems I was having there. I'm currently talking to one of the maintainers of the MongoDB YCSB layer to find out how to get this merged into the YCSB repo, or at least reviewed by someone who knows Java and YCSB. That reminds me: _PLEASE_REVIEW_THIS_CODE_ (when you get a chance):
https://github.com/grubino/Tahoe-YCSB--Interface-Layer
I'm sure that I've run afoul of Java best practices and general development best practices, and I invite anyone reading this to pleez point out my mistakes to me. I've looked over the code and have found a few things that I want to fix, but I'm sure I'm missing some stuff. Also, and not least of all, having reviewers makes me feel loved.
comment:7 Changed at 2011-10-26T14:23:13Z by bibilthaysose
I forgot to mention that I have been able to run some of the workloads (most notably workloada), and the performance for write operations is roughly four orders of magnitude worse for Tahoe-LAFS than for MongoDB. MongoDB writes about 11,000 entries/sec (on my thinkpad T50) and my Tahoe-LAFS test grid (1:1:1) writes about 0.5 entries/sec (that's one entry every two seconds) or so. I'm not sure if that number would go up or down if I increased N/H/K. I'll post the real numbers when I have them handy, but it hasn't been a priority because there are other workloads that don't seem to be running properly. I want to make sure that the code is relatively bug-free before I actually post the numbers.
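For a sense of scale, the gap implied by those two rough figures can be worked out directly. This is only arithmetic on the approximate numbers quoted in this comment, not a new measurement; the variable names are illustrative.

```python
import math

mongo_writes_per_sec = 11_000  # approximate figure quoted in this comment
tahoe_writes_per_sec = 0.5     # approximate figure quoted in this comment

ratio = mongo_writes_per_sec / tahoe_writes_per_sec
orders = math.log10(ratio)
print(f"ratio ~ {ratio:,.0f}x, about {orders:.1f} orders of magnitude")
# prints: ratio ~ 22,000x, about 4.3 orders of magnitude
```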
comment:8 Changed at 2011-10-26T14:27:50Z by zooko
Very cool! Real numbers! I look forward to having the time to investigate this. :-)
comment:9 Changed at 2011-10-28T22:31:12Z by bibilthaysose
Need a public place to put TahoeLAFSConnection.jar.
Currently, I just have the source directly in the YCSB tree (err my branch of it):
https://github.com/grubino/YCSB/tree/master/db/tahoe/src/org/lafs
But this isn't really appropriate since the TahoeLAFSConnection class is not really part of YCSB, and I don't think this is going to pass muster with the YCSB maintainers. So once I jar this up, I'll need to put it somewhere that I can link from in the Tahoe YCSB client docs. Preferably somewhere on tahoe-lafs.org. Also, someone from the project may want to review the code at some point and make sure I didn't do anything too horrendous. It might actually be appropriate to put the source for this in the darcs repo too at some point. That would have the nice side-effect of increasing the likelihood that someone from the project would look at it.
comment:10 Changed at 2012-03-22T21:00:14Z by zooko
Let's create a project below https://github.com/tahoe-lafs for this.
comment:11 Changed at 2012-03-22T21:00:34Z by zooko
- Cc zooko added
The benchmark that Brian Frank Cooper said would be open-sourced has since been open-sourced:
http://github.com/brianfrankcooper/YCSB/wiki