Version 12 (modified by kevan, at 2010-03-28T19:21:59Z)
Here are notes that should be added to wiki:GSoCIdeas, in a format emulating this GSoC page from NetBSD. Leslie Hawthorn wrote: "Currently there's only a laundry list of suggested ideas but there is not any specificity on those ideas of how students could get them done, areas for them to get started, etc. Each suggestion needs to be categorized by difficulty, there needs to be pointers to where in the code base or documentation people can look for a better idea of how to proceed, etc." Later she said the wiki:GSoCIdeas page was a good improvement.
(See also last year's page: Ideas For Google Summer of Code of 2009.)
Deep Security Issues
Want to implement strong security features which advance the state of the art? It isn't easy! To tackle these you'll need to think carefully and to integrate security and usability, which are two sides of the same coin. But you'll have excellent mentors and the support of a wide community of interested security hackers.
- Fix Same-Origin-Policy design issue. Web content from different authors can interact in unintended ways in the victim's browser, such as JavaScript peeking at other frames or referrer headers. Before this project is undertaken, the problem description and proposed solutions need careful design review and consideration! The solutions should be considered prototypes and should be backwards compatible with the Tahoe network. Main ticket: #615 (Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe?) Tickets labelled 'capleak'
- Domain Mangling approaches:
- HTTP proxy approach
- Special scheme handling in browser add-ons
- Caja approach: Require all JavaScript to pass the Caja verifier in the Tahoe-LAFS web frontend, then create an interface to the Tahoe-LAFS webapi which matches the intended capability semantics.
- Tahoe-LAFS Cryptography:
- Help us author a paper proving the security of the crypto that will be used to implement new shorter caps (such as the Elk Point protocol or the "Semi-Private Key" construction from http://allmydata.org/~zooko/lafs.pdf ). Tickets labelled 'newcaps'
Server Selection
Which servers are connected to your client, and which of them have which shares of your files?
- Use Zeroconf or similar so nodes can find each other on a local network to enable quick local share migration.
- Deal with unreliable nodes and connections in general, getting away from allmydata.com's assumption that the grid is a big collection of reliable machines in a colo under a single administrative jurisdiction. Tickets labelled 'availability'
- Abstract out the server selection part of Tahoe-LAFS so that the projects in this category of "grid membership and server selection" can be mostly independent of the rest of Tahoe-LAFS. See also this note about standardization of LAFS.
- Write a GUI to visualize and manipulate the set of servers connected and the set holding shares of files.
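The question this section opens with (which connected servers hold which shares) can be modelled with a very small data structure. The following is only a sketch: the function names, the `share_map` layout, and the server identifiers are hypothetical and are not taken from the Tahoe-LAFS code base. It assumes a file is recoverable when at least `needed` distinct shares can be reached.

```python
def reachable_shares(share_map, connected):
    """Return the share numbers held by at least one connected server.

    share_map: dict mapping share number -> set of server ids (hypothetical layout)
    connected: set of server ids the client is currently connected to
    """
    return {sharenum for sharenum, servers in share_map.items()
            if servers & connected}


def is_recoverable(share_map, connected, needed):
    """Assumption: a file can be read when `needed` distinct shares are reachable."""
    return len(reachable_shares(share_map, connected)) >= needed


# Example: three shares spread over servers A, B and C.
share_map = {0: {"A"}, 1: {"A", "B"}, 2: {"C"}}
print(sorted(reachable_shares(share_map, {"A"})))
print(is_recoverable(share_map, {"A"}, needed=3))
```

A visualization GUI or an abstracted server selection layer would need at least this much information as input, whatever form the real interface takes.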
Networking Improvements
- Deal with NAT, ideally making it as easy to ignore as possible (taking advantage of UPnP IGD and Zeroconf NAT-PMP). Tickets labelled 'firewall'
- 'tahoe sync'. As with Dropbox (http://www.getdropbox.com/), the user could run a daemon which keeps the grid in sync with the local filesystem (maybe using inotify to trigger uploads).
- Optimize upload/download transfer speed. Tickets labelled 'performance'
- Implement storage server protocol over HTTP. #510
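To make the 'tahoe sync' idea more concrete, here is a minimal sketch of the change-detection half of such a daemon. It uses periodic polling with `os.walk` rather than inotify, purely to stay within the standard library, and it does not talk to a Tahoe-LAFS node at all. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to (mtime in ns, size in bytes)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return (paths to upload, paths to delete) between two snapshots."""
    # New or changed files need uploading; files that vanished need deleting.
    to_upload = sorted(p for p in new if old.get(p) != new[p])
    to_delete = sorted(p for p in old if p not in new)
    return to_upload, to_delete
```

A daemon built on this would take a snapshot on each pass, diff it against the previous one, and hand the two lists to whatever code performs the uploads and deletions on the grid. An inotify-based version would replace the polling loop with event notifications but could keep the same diff output.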
Free The Windows Client
- Make the Windows client use only free, open-source software. (Implementing WebDAV as described earlier is an alternative that would achieve a similar effect.)
Connecting Tahoe-LAFS To Other Things
- Filesystem access:
- Improve the FUSE frontend (source code). Tickets labelled 'fuse'
- Integrate Tahoe-LAFS with GVFS, the GNOME virtual filesystem
- Explore running a Tahoe-LAFS grid over Tor or I2P to provide anonymity to servers and/or clients.
- Rescue the neglected C client library libtahoeclient_webapi.
Medium-Sized Distributed Mutable Files (MDMF)
Mutable files in Tahoe-LAFS have some significant limitations and performance issues, as discussed in docs/performance.txt. Users who aren't aware of these limitations are surprised when they find out that mutable files can't scale to large sizes without using unacceptable levels of memory, and that reading one byte of the file costs as much as reading the entire file.
Fixing this issue essentially means fixing #393. That involves:
- Developing mutable files that are segmented on upload, as with immutable files. Part of this would involve making sure that the way we currently ensure the integrity of the parts of mutable files stored on servers is adequate for your new design, and altering it if it isn't.
- Implementing efficient reading and writing of arbitrary spans of those mutable files.
This would make Tahoe-LAFS less surprising to users, and allow mutable files to be used in more ways than they currently are.
To learn more about this issue, you should first read docs/performance.txt, so you're familiar with the performance problems with mutable files as currently implemented. You should also look at the file encoding specification, to understand how immutable files are segmented (since you'll be doing something similar with this project). The mutable file specification may be informative as well. The mutable file upload and download code is in mutable, and, for comparison, the immutable file upload and download code is in immutable.
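As a rough illustration of why segmentation makes reading an arbitrary span cheap, the sketch below works out which fixed-size segments cover a byte range and fetches only those. It is not the Tahoe-LAFS implementation: the function names are hypothetical, `fetch_segment` stands in for whatever retrieves and verifies one segment, and integrity checking and erasure coding are left out entirely.

```python
def segments_for_span(offset, length, segment_size, file_size):
    """Return the list of segment numbers that cover [offset, offset + length)."""
    if offset < 0 or length < 0 or segment_size <= 0:
        raise ValueError("offset and length must be >= 0, segment_size > 0")
    end = min(offset + length, file_size)  # clamp the span to the file
    if offset >= end:
        return []
    first = offset // segment_size
    last = (end - 1) // segment_size
    return list(range(first, last + 1))


def read_span(fetch_segment, offset, length, segment_size, file_size):
    """Read a span by fetching only the segments that overlap it.

    fetch_segment(segnum) -> bytes is a hypothetical callback.
    """
    end = min(offset + length, file_size)
    chunks = []
    for segnum in segments_for_span(offset, length, segment_size, file_size):
        data = fetch_segment(segnum)
        seg_start = segnum * segment_size
        # Trim the first and last segments down to the requested bytes.
        lo = max(offset, seg_start) - seg_start
        hi = min(end, seg_start + len(data)) - seg_start
        chunks.append(data[lo:hi])
    return b"".join(chunks)
```

With this layout, reading one byte touches one segment instead of the whole file. A write to an arbitrary span would need the same segment arithmetic, plus re-encoding and re-signing of the affected segments, which this sketch does not attempt.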