.. -*- coding: utf-8-with-signature -*-

=======================
The Tahoe Upload Helper
=======================

1. `Overview`_
2. `Setting Up A Helper`_
3. `Using a Helper`_
4. `Other Helper Modes`_

Overview
========

As described in the "Swarming Download, Trickling Upload" section of
:doc:`architecture`, Tahoe uploads require more bandwidth than downloads: you
must push the redundant shares during upload, but you do not need to retrieve
them during download. With the default 3-of-10 encoding parameters, this
means that an upload will require about 3.3x as much traffic as a download of
the same file.

Unfortunately, this "expansion penalty" applies in the upstream direction,
which is already the slow direction on most consumer DSL lines. Typical ADSL
lines get 8 times as much download capacity as upload capacity. When the ADSL
upstream penalty is combined with the expansion penalty, the result is
uploads that can take roughly 26 times longer (8 x 3.3) than downloads.

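As a rough worked example of this arithmetic, the sketch below computes the
expansion factor for k-of-N encoding and combines it with the asymmetry of
the link. The function names are illustrative only and are not part of
Tahoe; the 8:1 ratio is the typical ADSL figure quoted above.

.. code-block:: python

    from fractions import Fraction


    def expansion_factor(shares_needed, shares_total):
        """Upload traffic relative to file size for k-of-N encoding (N / k)."""
        return Fraction(shares_total, shares_needed)


    def upload_slowdown(shares_needed, shares_total, down_to_up_ratio):
        """Combine the expansion penalty with an asymmetric link's penalty."""
        return expansion_factor(shares_needed, shares_total) * down_to_up_ratio


    # Default 3-of-10 encoding on a link with 8x more downstream than upstream.
    print(float(expansion_factor(3, 10)))    # about 3.33
    print(float(upload_slowdown(3, 10, 8)))  # about 26.67
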
The "Helper" is a service that can mitigate the expansion penalty by
arranging for the client node to send data to a central Helper node instead
of sending it directly to the storage servers. The client sends only
ciphertext to the Helper, so the security properties remain the same as with
non-Helper uploads. The Helper is responsible for applying the erasure
encoding algorithm and placing the resulting shares on the storage servers.

Of course, the helper cannot mitigate the ADSL upstream penalty.

The second benefit of using an upload helper is that clients who lose their
network connections while uploading a file (because of a network flap, or
because they shut down their laptop while an upload was in progress) can
resume their upload rather than needing to start again from scratch. The
helper holds the partially-uploaded ciphertext on disk, and when the client
tries to upload the same file a second time, it discovers that the partial
ciphertext is already present. The client then only needs to upload the
remaining ciphertext. This reduces the "interrupted upload penalty" to a
minimum.

Using a helper also reduces the number of active connections between the
client and the outside world: most of the client's traffic flows over a
single TCP connection to the helper. This can improve TCP fairness, and
should allow other applications that are sharing the same uplink to compete
more evenly for the limited bandwidth.

Setting Up A Helper
===================

Who should consider running a helper?

* Benevolent entities that wish to provide better upload speed for clients
  that have slow uplinks
* Folks who have machines with upload bandwidth to spare
* Server grid operators who want clients to connect to a small number of
  helpers rather than a large number of storage servers (a "multi-tier"
  architecture)

What sorts of machines are good candidates for running a helper?

* The Helper needs to have good bandwidth to the storage servers. In
  particular, it needs to have at least 3.3x better upload bandwidth than
  the client does, or the client might as well upload directly to the
  storage servers. In a commercial grid, the helper should be in the same
  colo (and preferably in the same rack) as the storage servers.
* The Helper will take on most of the CPU load involved in uploading a file,
  so a dedicated machine will give better results.
* The Helper buffers ciphertext on disk, so the host will need at least as
  much free disk space as the combined size of the files being uploaded
  simultaneously. When an upload is interrupted, that space will be used for
  a longer period of time.

To turn a Tahoe-LAFS node into a helper (i.e. to run a helper service in
addition to whatever else that node is doing), edit the ``tahoe.cfg`` file in
your node's base directory and set ``enabled = true`` in the section named
``[helper]``.

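The edit can also be scripted. The following is a minimal sketch using
Python's standard ``configparser`` module; the function name
``enable_helper`` is hypothetical and not part of Tahoe. Note that
``configparser`` discards comments when it rewrites a file, so editing
``tahoe.cfg`` by hand is usually the safer choice.

.. code-block:: python

    import configparser


    def enable_helper(cfg_path):
        """Set ``enabled = true`` in the ``[helper]`` section of a tahoe.cfg."""
        # interpolation=None keeps any literal '%' characters in values intact.
        config = configparser.ConfigParser(interpolation=None)
        config.read(cfg_path, encoding="utf-8")
        if not config.has_section("helper"):
            config.add_section("helper")
        config.set("helper", "enabled", "true")
        with open(cfg_path, "w", encoding="utf-8") as f:
            config.write(f)
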
Then restart the node. This will signal the node to create a Helper service
and listen for incoming requests. Once the node has started, there will be a
file named ``private/helper.furl`` which contains the contact information for
the helper: you will need to give this FURL to any clients that wish to use
your helper.

::

  cat $BASEDIR/private/helper.furl | mail -s "helper furl" friend@example.com

You can tell if your node is running a helper by looking at its web status
page. Assuming that you've set up the 'webport' to use port 3456, point your
browser at ``http://localhost:3456/`` . The welcome page will say "Helper: 0
active uploads" or "Not running helper" as appropriate. The
``http://localhost:3456/helper_status`` page will also provide details on
what the helper is currently doing.

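The same check can be automated. A minimal sketch, assuming that the welcome
page contains the literal strings quoted above (the exact wording may differ
between Tahoe versions) and that the web port is 3456; both function names
are hypothetical:

.. code-block:: python

    import urllib.request


    def helper_state(welcome_html):
        """Classify welcome-page text using the phrases quoted above."""
        if "Not running helper" in welcome_html:
            return "not running"
        if "active uploads" in welcome_html:
            return "running"
        return "unknown"


    def fetch_welcome_page(url="http://localhost:3456/"):
        """Download the node's welcome page (requires a running node)."""
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")

With a node running locally, ``helper_state(fetch_welcome_page())`` would
report which of the two messages the page currently shows.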
The helper will store the ciphertext that it is fetching from clients in
``$BASEDIR/helper/CHK_incoming/``. Once all the ciphertext has been fetched,
it will be moved to ``$BASEDIR/helper/CHK_encoding/`` and erasure-coding will
commence. Once the file is fully encoded and the shares are pushed to the
storage servers, the ciphertext file will be deleted.

If a client disconnects while the ciphertext is being fetched, the partial
ciphertext will remain in ``CHK_incoming/`` until they reconnect and finish
sending it. If a client disconnects while the ciphertext is being encoded,
the data will remain in ``CHK_encoding/`` until they reconnect and encoding
is finished. For long-running and busy helpers, it may be a good idea to
delete files in these directories that have not been modified for a week or
two. Future versions of Tahoe will try to self-manage these files a bit
better.

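One way to find such stale files is sketched below. This is an illustration
rather than a Tahoe feature: the function name and the 14-day default are
assumptions, and the function only lists candidates so that an operator can
review them before deleting anything.

.. code-block:: python

    import os
    import time


    def stale_helper_files(basedir, max_age_days=14, now=None):
        """Return paths under helper/CHK_incoming and helper/CHK_encoding
        whose modification time is older than max_age_days."""
        now = time.time() if now is None else now
        cutoff = now - max_age_days * 24 * 60 * 60
        stale = []
        for subdir in ("CHK_incoming", "CHK_encoding"):
            directory = os.path.join(basedir, "helper", subdir)
            if not os.path.isdir(directory):
                continue
            for name in sorted(os.listdir(directory)):
                path = os.path.join(directory, name)
                if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                    stale.append(path)
        return stale

Each returned path could then be passed to ``os.remove`` once you are
satisfied that the corresponding upload has been abandoned.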

Using a Helper
==============

Who should consider using a Helper?

* clients with limited upstream bandwidth, such as a consumer ADSL line
* clients who believe that the helper will give them faster uploads than
  they could achieve with a direct upload
* clients who experience problems with TCP connection fairness, for example
  when other programs or machines in the same home are getting less than
  their fair share of upload bandwidth. If the connection is being shared
  fairly, then a Tahoe upload that is happening at the same time as a single
  SFTP upload should get half the bandwidth.
* clients who have been given the helper.furl by someone who is running a
  Helper and is willing to let them use it

To take advantage of somebody else's Helper, take the helper FURL that they
give you and enter it as the value of the ``helper.furl`` key in the
``[client]`` section of your ``tahoe.cfg`` file, as described in the "Client
Configuration" section of :doc:`configuration`.

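To double-check the setting, a sketch like the following reads the value
back with Python's standard ``configparser`` module. The function name is
hypothetical, and the FURL in the sample text is a made-up placeholder, not
a real helper address.

.. code-block:: python

    import configparser


    def configured_helper_furl(cfg_text):
        """Return the helper.furl value from tahoe.cfg text, or None if unset."""
        config = configparser.ConfigParser(interpolation=None)
        config.read_string(cfg_text)
        furl = config.get("client", "helper.furl", fallback="").strip()
        return furl or None


    # Placeholder FURL for illustration only.
    sample = """
    [client]
    helper.furl = pb://EXAMPLETUBID@helper.example.com:12345/EXAMPLESWISSNUM
    """
    print(configured_helper_furl(sample))
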
Then restart the node. This will signal the client to try to connect to the
helper. Subsequent uploads will use the helper rather than using direct
connections to the storage servers.

If the node has been configured to use a helper, that node's HTTP welcome
page (``http://localhost:3456/``) will say "Helper: $HELPERFURL" instead of
"Helper: None". If the helper is actually running and reachable, the bullet
to the left of "Helper" will be green.

The helper is optional. If a helper is connected when an upload begins, the
upload will use the helper. If there is no helper connection present when an
upload begins, that upload will connect directly to the storage servers. The
client will automatically attempt to reconnect to the helper if the
connection is lost, using the same exponential-backoff algorithm as all other
Tahoe/Foolscap connections.

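This behaviour can be sketched in a few lines. The code is an illustration
only: the function names are hypothetical, and the delay parameters are
placeholders rather than the values Foolscap actually uses.

.. code-block:: python

    import itertools


    def choose_upload_path(helper_connected):
        """The decision is made once, when the upload begins."""
        return "helper" if helper_connected else "direct"


    def backoff_delays(initial=1.0, factor=2.0, maximum=3600.0):
        """Yield reconnection delays that grow exponentially up to a cap."""
        delay = initial
        while True:
            yield min(delay, maximum)
            delay *= factor


    print(choose_upload_path(False))                    # direct
    print(list(itertools.islice(backoff_delays(), 5)))  # [1.0, 2.0, 4.0, 8.0, 16.0]
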
The upload/download status page (``http://localhost:3456/status``) shows
whether each upload is using a helper, in the "Helper?" column.

Other Helper Modes
==================

The Tahoe Helper currently helps with only one kind of operation: uploading
immutable files. There are three other things it might be able to help with
in the future:

* downloading immutable files
* uploading mutable files (such as directories)
* downloading mutable files (such as directories)

Since mutable files are currently limited in size, the ADSL upstream penalty
is not so severe for them. There is no ADSL penalty to downloads, but there
may still be benefit to extending the helper interface to assist with them:
fewer connections to the storage servers, and better TCP fairness.

A future version of the Tahoe helper might provide assistance with these
other modes. If it were to help with all four modes, then the clients would
not need direct connections to the storage servers at all: clients would
connect to helpers, and helpers would connect to servers. For a large grid
with tens of thousands of clients, this might make the grid more scalable.