[tahoe-dev] Revised GSoC Proposal: Upload Strategy of Happiness

Mark Berger mjberger at stanford.edu
Mon Apr 29 07:45:14 UTC 2013


I've revised my GSoC proposal to address the project of implementing the
"upload strategy of happiness" after discussions from last week's dev chat.
Once again, if you see something wrong in my proposal, have any questions,
or have any suggestions, please let me know. All feedback is very much
appreciated.

Thanks!
Mark Berger


Organization: Tahoe-LAFS
=============

Student Info:
=============

Mark J. Berger

Time Zone: Pacific

Time Zone during GSoC: Eastern

IRC Handle: Mark_B at irc.freenode.net

GitHub: markberger

Email: mjberger [at] stanford.edu


University Info:
================

University: Stanford University

Major: Computer Science

Current Year: Freshman

Expected Graduation: June 2016

Degree: BS


About Me:
=========

I'm a freshman at Stanford University studying computer science. Right now
I am finishing up my core requirements and will be pursuing the artificial
intelligence track or the systems track within the major. My interests lie
in machine learning, large distributed systems, and web applications.

I began programming during an internship at Four Directions Productions in
2011, where I learned how to use Python in conjunction with Maya. The
majority of my college coursework has been in C or C++ on Linux, with a
little Java. This has made me familiar with tools such as GCC, GDB and
Valgrind.

While I have never contributed to an open source project before, I am
making an effort to learn about Tahoe-LAFS and become familiar with its
code base and community. Using a virtual machine, I've successfully
installed Tahoe on an Ubuntu server and connected to the Public Test Grid.
I've also subscribed to the mailing list, connected to the IRC channel, and
successfully pulled the code from GitHub. While I know my lack of
experience in open source is a shortcoming, I am completely dedicated to
using GSoC's Community Bonding Period to overcome any obstacles before the
official coding period begins.



Project Title: Upload Strategy of Happiness
===========================


Abstract:
=========

The "servers of happiness" algorithm has improved Tahoe's ability to
maximize redundancy by ensuring a given subset of all shares is placed on
distinct nodes. However, the share placement algorithm was not designed to
pass the servers of happiness test [1]. The current algorithm satisfies the
majority of cases, but it fails to satisfy multiple instances where
happiness can be achieved (see tickets #1124 and #1130). Furthermore, the
algorithm fails to take advantage of existing shares, replacing said shares
instead of renewing their respective leases. Implementing the upload
strategy of happiness detailed in Kevan Carstensen's master's thesis would
address these issues, as well as ease the development of share rebalancing
and repair [2].
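
To make the matching idea concrete, here is a minimal Python sketch. It
assumes that happiness is measured as the size of a maximum matching
between servers and the shares they hold. The function name `happiness`
and the dictionary layout are illustrative assumptions, not Tahoe-LAFS
APIs, and the simple augmenting-path search is only a stand-in for the
algorithm in the thesis.

```python
def happiness(server_shares):
    """Return the size of a maximum matching between servers and shares.

    server_shares maps a server id to the set of share numbers that the
    server holds. This layout is a hypothetical one chosen for the sketch.
    """
    match = {}  # share number -> server currently matched to that share

    def try_assign(server, seen):
        # Look for a share for this server, re-routing earlier servers
        # along an augmenting path when their share is needed here.
        for share in sorted(server_shares[server]):
            if share in seen:
                continue
            seen.add(share)
            if share not in match or try_assign(match[share], seen):
                match[share] = server
                return True
        return False

    return sum(1 for server in sorted(server_shares)
               if try_assign(server, set()))


# Three servers, but B and C both hold only share 0, so at most two
# servers can be paired with distinct shares.
print(happiness({"A": {0, 1, 2}, "B": {0}, "C": {0}}))  # 2

# Moving one copy so that C holds share 2 lets every server count.
print(happiness({"A": {0, 1}, "B": {0}, "C": {2}}))  # 3
```

The first layout shows why counting servers alone is not enough: three
servers hold shares, yet only two of them can be credited with a share no
other credited server is using.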


Deliverables:
=========

1. Static files are uploaded in accordance with the algorithm detailed in
Kevan's master's thesis, utilizing bipartite graphs to determine a maximum
matching.

2. Various scripts which are used to test the new share placement algorithm
on a network of virtual machines or a more suitable test environment.

3. A script to test whether the new placement algorithm meets Brian's
performance desiderata (200 shares, 1000 servers, 1 second).

4. Updated documentation that reflects the implementation of the new algorithm.
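
As a rough idea of what the performance script in item 3 could look like,
the sketch below times a placement function against the 200 shares, 1000
servers, 1 second target. Everything named here is an assumption made for
illustration: `time_placement`, its parameters, and `round_robin_place`
are hypothetical, and the round-robin function is only a placeholder that
would be swapped for the real placement code. It is written for Python 3.

```python
import random
import time


def time_placement(place, num_shares=200, num_servers=1000,
                   budget_seconds=1.0, seed=0):
    """Time one call to `place` and report whether it met the budget.

    `place` is any callable taking (shares, servers) and returning a
    dict that maps each share number to a server id.
    """
    rng = random.Random(seed)
    servers = ["server-%04d" % i for i in range(num_servers)]
    rng.shuffle(servers)  # avoid accidentally favouring a sorted order
    shares = list(range(num_shares))

    start = time.perf_counter()
    placement = place(shares, servers)
    elapsed = time.perf_counter() - start
    return placement, elapsed, elapsed <= budget_seconds


def round_robin_place(shares, servers):
    """Placeholder placement (NOT the thesis algorithm): share i goes to
    the i-th server, wrapping around if there are more shares than servers.
    """
    return {share: servers[i % len(servers)]
            for i, share in enumerate(shares)}


placement, elapsed, within_budget = time_placement(round_robin_place)
print("placed %d shares in %.6f s, within budget: %s"
      % (len(placement), elapsed, within_budget))
```

Keeping the harness independent of the placement function means the same
script can time the current code and the new code side by side.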



Time Line:
==========

Note: I would like to have a code review session with my mentor on a weekly
basis at minimum, especially at the beginning of the program. Those
sessions are left off the time line to avoid redundancy.


May 27th - June 17th (Community Bonding):
---------------------------------------------------------------

- Remain available via IRC and email

- Closely follow the development email list

- Isolate and understand the classes which pertain to the current
implementations of the servers of happiness algorithm to determine which
parts can be reused.

- Gain a greater understanding of the algorithm detailed in Kevan's thesis,
including the Edmonds-Karp algorithm used to find the maximum matching.

- Discuss with my mentor(s) and the community the best way to go about
testing the new share placement algorithm.
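
For reference while studying the thesis, the following is a minimal
Python sketch of Edmonds-Karp applied to bipartite matching. It assumes
the usual reduction: a source feeds every server, every share feeds a
sink, and all arcs have capacity 1. The function name
`max_matching_edmonds_karp` and the (server, share) edge-list input are
assumptions made for this sketch and are not taken from the thesis or
from the Tahoe-LAFS code base.

```python
from collections import defaultdict, deque


def max_matching_edmonds_karp(edges):
    """Return a maximum matching as a dict mapping share -> server.

    `edges` is an iterable of (server, share) pairs, one pair for each
    share a server holds or could accept.
    """
    edges = list(edges)
    source, sink = ("source",), ("sink",)
    capacity = defaultdict(int)      # residual capacity of each arc
    neighbours = defaultdict(set)    # adjacency, including reverse arcs

    def add_arc(u, v):
        capacity[(u, v)] = 1
        neighbours[u].add(v)
        neighbours[v].add(u)         # reverse arc starts with capacity 0

    for server, share in edges:
        server_node, share_node = ("server", server), ("share", share)
        add_arc(source, server_node)
        add_arc(server_node, share_node)
        add_arc(share_node, sink)

    while True:
        # Breadth-first search finds a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break                    # no augmenting path is left
        # Push one unit of flow back along the path that was found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u

    # A server -> share arc carries flow exactly when its reverse arc
    # has gained capacity.
    return {share: server for server, share in edges
            if capacity[(("share", share), ("server", server))] > 0}


matching = max_matching_edmonds_karp(
    [("A", 0), ("A", 1), ("B", 0), ("C", 0)])
print(len(matching))
```

In this example only two distinct shares exist, so the matching has size
two no matter which servers end up paired with them.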


Note: June 3rd through the 14th is my final exams period and I will be
packing so that I can go home to Upstate NY. Since I will be very busy
during this time, not all of the above may be accomplished in time to start
coding. My classes do not resume until September 23rd, so I can push my
time line back a week or two if need be.


Jun 17th - 28th
---------------------

- Implement a rough version of the upload strategy of happiness

- Upload strategy should be a separate class if possible in order to make
it easier to apply to mutable files


Jul 1st - 12th
------------------

- Revise upload strategy code

- Thoroughly document initial upload strategy code

- Begin work on test scripts used to test the new algorithm


Jul 15th - 19th
--------------------

- Clean up test scripts

- Thoroughly document test scripts

- Fix minor bugs


Jul 22nd - Aug 2
-----------------------

- Begin testing the placement algorithm using the test scripts

- Tackle bugs as they arise

- Discuss possible edge cases with Tahoe-LAFS community


Aug 5th - 16th
--------------------

- Change documentation to reflect new placement algorithm

- Create test cases for possible edge cases


Aug 19th - 30th
----------------------

- Address any failing test cases which arise from further testing

- Clean up documentation changes

- Continue testing to ensure the new algorithm can be merged into the next
major release


The weeks of September 1st and 8th are left blank for flexibility.



Possible projects if the above are accomplished ahead of schedule:
=================================================

 - Change mutable files to use the same upload algorithm

 - Detect if disk(s) on a server are in a near-failure state. If the disk(s)
are close to failing, notify the administrator, and slowly begin
redistributing shares to the other storage nodes (tickets #481 and #864).

 - Let the user specify a maximum storage capacity for a given storage node
based on folder size instead of free space left on the machine.

 - Tahoe backend for Google Drive (ticket #1831).


Link to Patch/Code Sample: https://github.com/tahoe-lafs/tahoe-lafs/pull/41



[1] https://zooko.com/uri/URI%3ADIR2-RO%3Aoljrwy5i2t3dhcx5mzrksegehe%3Axtac4ubcnr5eqo6d7h4wyj5sm522olj4mthizz2i3lfw2b5nla6q/Latest/compsci/Carstensen-2011-Robust_Resource_Allocation_In_Distributed_Filesystem.pdf
Pages 32-33.

[2] https://tahoe-lafs.org/pipermail/tahoe-dev/2013-April/008216.html