Version 1 (modified by zooko, at 2010-03-10T18:12:22Z)

Historical document -- from Google Summer of Code of 2009.

Google Summer of Code

UPDATE: The Python Software Foundation, a GSoC umbrella organization, will sponsor Tahoe!

Students: you don't have to use one of the following Ideas. You can come up with your own, either inspired by these or as your own Blue Sky idea. The important things to remember are: 1. E-mail the Mentor team (listed at the bottom of this page) immediately, saying that you are interested. 2. Submit an Application to Google by Friday at 19:00 UTC. That application doesn't have to be final and polished -- as long as you get it in before the deadline, you will be able to update it afterwards.

See also the PSF Ideas page, which includes a subset of these ideas, and a lot of other Python-but-not-Tahoe ideas.

Please read this page on the PSF wiki about what will be expected of you.

Ideas

What could a smart student do in one summer, if they didn't need to worry about getting a summer job to pay the bills?

Server Selection

Which servers are connected to your client, and which of them have which shares of your files?

  • Dynamically migrate shares to maintain file health.
  • Use Zeroconf or similar so nodes can find each other on a local network to enable quick local share migration.
  • Deal with unreliable nodes and connections in general, getting away from allmydata's assumption that the grid is a big collection of reliable machines in a colo under a single administrative jurisdiction.
  • Abstract out the server selection part of Tahoe so that the projects in this category of "grid membership and server selection" can be mostly independent of the rest of Tahoe. See also this note about standardization of LAFS.
  • Write a GUI to visualize and manipulate the set of servers connected and the set holding shares of files.
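
One way to picture the "abstract out the server selection part" idea is a small policy interface that the uploader calls without knowing how servers are chosen. The sketch below is only an illustration under stated assumptions: `Server`, `ServerSelector`, `MostFreeSpaceSelector`, `missing_shares`, and the `free_bytes` field are hypothetical names invented for this example, not part of Tahoe's code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical record for illustration; not Tahoe's own data structure.
    server_id: str
    free_bytes: int


class ServerSelector(ABC):
    """Policy object: given candidate servers, decide where each share goes."""

    @abstractmethod
    def place_shares(self, servers, num_shares):
        """Return a dict mapping share number -> server_id."""


class MostFreeSpaceSelector(ServerSelector):
    """Example policy: hand out shares round-robin, roomiest server first."""

    def place_shares(self, servers, num_shares):
        if not servers:
            raise ValueError("no servers available")
        # Sort by free space (descending), then by id so ties are deterministic.
        ranked = sorted(servers, key=lambda s: (-s.free_bytes, s.server_id))
        return {i: ranked[i % len(ranked)].server_id for i in range(num_shares)}


def missing_shares(placement, live_server_ids):
    """Share numbers whose server is unreachable (candidates for migration)."""
    return sorted(n for n, sid in placement.items() if sid not in live_server_ids)
```

With an interface like this, a Zeroconf-based discovery project or a share-migration project could each supply or consume a `ServerSelector` without touching the rest of the upload path.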

Networking Improvements

  • Dealing with NAT, ideally making it as easy to ignore as possible (taking advantage of UPnP IGD and Zeroconf NAT-PMP).
  • 'tahoe sync'. The proposed bidirectional sync option (#601) would make it possible to use Tahoe the way one uses Dropbox (http://www.getdropbox.com/). As with Dropbox, the user could run a daemon that keeps things in sync by polling every one or two seconds (perhaps using inotify to detect local changes that need uploading). In practical terms, a user could have many machines pointing to the same tahoe:dir, each mapping it to a local directory, and all of these machines would keep their local copies in sync via tahoe:dir. This would suit someone who alternates between several machines, for instance a notebook, a home desktop, and an office desktop.
  • Optimize upload/download transfer speed.
  • Implement storage server protocol over HTTP. #510
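
The core decision inside a 'tahoe sync' daemon, as described in the list above, is which direction each file should move on a given polling pass. The following is a minimal sketch of that decision only, assuming each side can be summarized as a mapping from path to a version marker (an mtime or a hash). The function name `plan_sync` and its three-way comparison against the last synced state are assumptions made for illustration; ticket #601 does not specify an algorithm, and no Tahoe API is used here.

```python
def plan_sync(local, remote, last_synced):
    """Decide what to push, pull, or flag as a conflict.

    Each argument maps a relative path to a version marker (for example an
    mtime or a content hash). A path missing from a mapping counts as the
    version None, so deletions show up as changes too.
    Returns (push, pull, conflicts), each a sorted list of paths.
    """
    push, pull, conflicts = [], [], []
    for path in sorted(set(local) | set(remote)):
        base = last_synced.get(path)
        mine, theirs = local.get(path), remote.get(path)
        if mine == theirs:
            continue  # both sides already agree
        local_changed = mine != base
        remote_changed = theirs != base
        if local_changed and remote_changed:
            conflicts.append(path)  # edited on both sides since last sync
        elif local_changed:
            push.append(path)       # only the local copy changed
        else:
            pull.append(path)       # only the remote copy changed
    return push, pull, conflicts
```

A daemon would call something like this on every polling pass (or whenever inotify reports a change), act on the three lists, and then record the new common state as `last_synced`.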

Free The Windows Client

Deep Security Issues

Want to implement strong security features which advance the state of the art? It isn't easy! To tackle these you'll need to think carefully and to integrate security and usability, which are two sides of the same coin. But you'll have excellent mentors and the support of a wide community of interested security hackers.

  • Fix Same-Origin-Policy design issue. Web content from different authors can interact in unintended ways in the victim's browser, such as JavaScript iterating over open windows, or peeking at a referrer header. Before this project is undertaken, the problem description and proposed solutions need careful design review and consideration! The solutions should be considered prototypes and should be backwards compatible with the Tahoe network. tickets: #615 (Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe?)
    • Domain Mangling approaches:
      • HTTP proxy approach
      • Special scheme handling in browser add-ons
    • Caja approach: Require all JavaScript to pass the Caja verifier in the Tahoe web frontend, then create an interface to the Tahoe webapi which matches the intended capability semantics.
  • Tahoe Cryptography:

Building Things On Top Of Tahoe

  • an interactive tree browser web frontend in JavaScript (Nathan has written most of one -- what can it grow into?)
  • a blog-like web app (perhaps addressing tiddly wishlist items)
  • Extend and improve the tiddly_on_tahoe implementation.
  • Retarget the TiddlyWeb to use Tahoe as its backend storage.
  • Port another light-weight open source web app to Tahoe+javascript (calendar, photo album, Bespin).

Connecting Tahoe To Other Things

  • Help with the C client library libtahoeclient_webapi.
  • Explore running a Tahoe grid over Tor or I2P to provide anonymity to servers and/or clients.
  • Integrate Tahoe with the operating system kernel through FUSE. See the source code, the mailing list thread, and tickets #36 (FUSE integration) and #621 (Make automated fuse tests run against blackmatch).
  • Integrate a distributed revision control tool such as darcs, git, bzr, mercurial or monotone with Tahoe so that there is a single distributed, secure revision control repository stored on a Tahoe grid. ticket #663

Mentors

Who is willing to spend about five hours a week (according to Google) helping a student figure out how to do it right?

Tasks Too Small To Be A Whole Project Unto Themselves

But they could perhaps be the starting point of a summer project -- i.e. get into the code by fixing this bug and then build a solid addition to this part of the system.

  • Get sshfs working properly on Linux boxes. On Fedora 9 with the trunk revision, it keeps showing the same first-level directories at every level :)
  • Shell-friendly errors. When the CLI (the shell command tool) fails, it would be good for shell users to get readable plain-text output rather than HTML/CSS. The latter could be kept for web GUI errors only. ticket: #646 (CLI should report webapi errors better)
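
For the shell-friendly errors item above, one possible starting point is to strip the markup from an HTML error body before printing it. This is a rough sketch using only Python's standard `html.parser` module. The helper name `format_cli_error` and the shape of the error page in the usage note are assumptions for illustration, not the actual output of the Tahoe webapi or the fix proposed in #646.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def format_cli_error(status, body):
    """Turn an HTTP status and an HTML (or plain-text) body into one line."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()  # flush any text still buffered by the parser
    message = " ".join(parser.chunks) or "(no details)"
    return "Error %d: %s" % (status, message)
```

For example, given a made-up 404 page whose body contains `<h1>Not Found</h1><p>No such child: foo</p>`, the helper returns a single line that a shell user can read or grep, while the web GUI can keep rendering the original HTML.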