1 | #LyX 1.6.2 created this file. For more info see http://www.lyx.org/ |
---|
2 | \lyxformat 345 |
---|
3 | \begin_document |
---|
4 | \begin_header |
---|
5 | \textclass amsart |
---|
6 | \use_default_options true |
---|
7 | \begin_modules |
---|
8 | theorems-ams |
---|
9 | theorems-ams-extended |
---|
10 | \end_modules |
---|
11 | \language english |
---|
12 | \inputencoding auto |
---|
13 | \font_roman default |
---|
14 | \font_sans default |
---|
15 | \font_typewriter default |
---|
16 | \font_default_family default |
---|
17 | \font_sc false |
---|
18 | \font_osf false |
---|
19 | \font_sf_scale 100 |
---|
20 | \font_tt_scale 100 |
---|
21 | |
---|
22 | \graphics default |
---|
23 | \float_placement h |
---|
24 | \paperfontsize default |
---|
25 | \spacing single |
---|
26 | \use_hyperref false |
---|
27 | \papersize default |
---|
28 | \use_geometry false |
---|
29 | \use_amsmath 1 |
---|
30 | \use_esint 1 |
---|
31 | \cite_engine basic |
---|
32 | \use_bibtopic false |
---|
33 | \paperorientation portrait |
---|
34 | \secnumdepth 3 |
---|
35 | \tocdepth 3 |
---|
36 | \paragraph_separation indent |
---|
37 | \defskip medskip |
---|
38 | \quotes_language english |
---|
39 | \papercolumns 1 |
---|
40 | \papersides 1 |
---|
41 | \paperpagestyle default |
---|
42 | \tracking_changes false |
---|
43 | \output_changes false |
---|
44 | \author "" |
---|
45 | \author "" |
---|
46 | \end_header |
---|
47 | |
---|
48 | \begin_body |
---|
49 | |
---|
50 | \begin_layout Title |
---|
51 | Tahoe Distributed Filesharing System Loss Model |
---|
52 | \end_layout |
---|
53 | |
---|
54 | \begin_layout Author |
---|
55 | Shawn Willden |
---|
56 | \end_layout |
---|
57 | |
---|
58 | \begin_layout Date |
---|
59 | 07/22/2009 |
---|
60 | \end_layout |
---|
61 | |
---|
62 | \begin_layout Address |
---|
63 | South Weber, Utah |
---|
64 | \end_layout |
---|
65 | |
---|
66 | \begin_layout Email |
---|
67 | shawn@willden.org |
---|
68 | \end_layout |
---|
69 | |
---|
70 | \begin_layout Abstract |
---|
71 | The abstract goes here |
---|
72 | \end_layout |
---|
73 | |
---|
74 | \begin_layout Section |
---|
75 | Problem Statement |
---|
76 | \end_layout |
---|
77 | |
---|
78 | \begin_layout Standard |
---|
79 | The allmydata Tahoe distributed file system uses Reed-Solomon erasure coding |
---|
80 | to split files into |
---|
81 | \begin_inset Formula $N$ |
---|
82 | \end_inset |
---|
83 | |
---|
84 | shares which are delivered to randomly-selected peers in a distributed |
---|
85 | network. |
---|
86 | The file can later be reassembled from any |
---|
87 | \begin_inset Formula $k\leq N$ |
---|
88 | \end_inset |
---|
89 | |
---|
90 | of the shares, if they are available. |
---|
91 | \end_layout |
---|
92 | |
---|
93 | \begin_layout Standard |
---|
94 | Over time shares are lost for a variety of reasons. |
---|
95 | Storage servers may crash, be destroyed or simply be removed from the network. |
---|
96 | To mitigate such losses, Tahoe network clients employ a repair agent which |
---|
97 | scans the peers once per time period |
---|
98 | \begin_inset Formula $A$ |
---|
99 | \end_inset |
---|
100 | |
---|
101 | and determines how many of the shares remain. |
---|
102 | If less than |
---|
103 | \begin_inset Formula $L$ |
---|
104 | \end_inset |
---|
105 | |
---|
106 | ( |
---|
107 | \begin_inset Formula $k\leq L\leq N$ |
---|
108 | \end_inset |
---|
109 | |
---|
110 | ) shares remain, then the repairer reconstructs the file shares and redistribute |
---|
111 | s the missing ones, bringing the availability back up to full. |
---|
112 | \end_layout |
---|
113 | |
---|
114 | \begin_layout Standard |
---|
115 | The question we're trying to answer is "What is the probability that we'll |
---|
116 | be able to reassemble the file at some later time |
---|
117 | \begin_inset Formula $T$ |
---|
118 | \end_inset |
---|
119 | |
---|
120 | ?". |
---|
121 | We'd also like to be able to determine what values we should choose for |
---|
122 | |
---|
123 | \begin_inset Formula $k$ |
---|
124 | \end_inset |
---|
125 | |
---|
126 | , |
---|
127 | \begin_inset Formula $N$ |
---|
128 | \end_inset |
---|
129 | |
---|
130 | , |
---|
131 | \begin_inset Formula $A$ |
---|
132 | \end_inset |
---|
133 | |
---|
134 | , and |
---|
135 | \begin_inset Formula $L$ |
---|
136 | \end_inset |
---|
137 | |
---|
138 | in order to ensure |
---|
139 | \begin_inset Formula $Pr[loss]\leq r$ |
---|
140 | \end_inset |
---|
141 | |
---|
142 | for some threshold probability |
---|
143 | \begin_inset Formula $r$ |
---|
144 | \end_inset |
---|
145 | |
---|
146 | . |
---|
147 | This is an optimization problem because although we could obtain very low |
---|
148 | |
---|
149 | \begin_inset Formula $Pr[loss]$ |
---|
150 | \end_inset |
---|
151 | |
---|
152 | by selecting conservative parameters, these choices have costs. |
---|
153 | The peer storage and bandwidth consumed by the share distribution process |
---|
154 | are approximately |
---|
155 | \begin_inset Formula $\nicefrac{N}{k}$ |
---|
156 | \end_inset |
---|
157 | |
---|
158 | times the size of the original file, so we would like to minimize |
---|
159 | \begin_inset Formula $\nicefrac{N}{k}$ |
---|
160 | \end_inset |
---|
161 | |
---|
162 | , consistent with |
---|
163 | \begin_inset Formula $Pr[loss]\leq r$ |
---|
164 | \end_inset |
---|
165 | |
---|
166 | . |
---|
167 | Likewise, a frequent and aggressive repair process keeps the number of |
---|
168 | shares available close to |
---|
169 | \begin_inset Formula $N,$ |
---|
170 | \end_inset |
---|
171 | |
---|
172 | but at a cost in bandwidth and processing time as the repair agent downloads |
---|
173 | |
---|
174 | \begin_inset Formula $k$ |
---|
175 | \end_inset |
---|
176 | |
---|
177 | shares, reconstructs the file and uploads new shares to replace those that |
---|
178 | are lost. |
---|
179 | \end_layout |
---|
180 | |
---|
181 | \begin_layout Section |
---|
182 | Reliability |
---|
183 | \end_layout |
---|
184 | |
---|
185 | \begin_layout Standard |
---|
186 | The probability that the file becomes unrecoverable is dependent upon the |
---|
187 | probability that the peers to whom we send shares are able to return those |
---|
188 | copies on demand. |
---|
189 | Shares that are corrupted are detected and discarded, so there is no need |
---|
190 | to distinguish between corruption and loss. |
---|
191 | \end_layout |
---|
192 | |
---|
193 | \begin_layout Standard |
---|
194 | Many factors affect share availability. |
---|
195 | Availability can be temporarily interrupted by peer unavailability due |
---|
196 | to network outages, power failures or administrative shutdown, among other |
---|
197 | reasons. |
---|
198 | Availability can be permanently lost due to failure or corruption of storage |
---|
199 | media, catastrophic damage to the peer system, administrative error, withdrawal |
---|
200 | from the network, malicious corruption, etc. |
---|
201 | \end_layout |
---|
202 | |
---|
203 | \begin_layout Standard |
---|
204 | The existence of intermittent failure modes motivates the introduction of |
---|
205 | a distinction between |
---|
206 | \noun on |
---|
207 | availability |
---|
208 | \noun default |
---|
209 | and |
---|
210 | \noun on |
---|
211 | reliability |
---|
212 | \noun default |
---|
213 | . |
---|
214 | Reliability is the probability that a share is retrievable assuming intermitten |
---|
215 | t failures can be waited out, so reliability considers only permanent failures. |
---|
216 | Availability considers all failures, and is focused on the probability |
---|
217 | of retrieval within some defined time frame. |
---|
218 | \end_layout |
---|
219 | |
---|
220 | \begin_layout Standard |
---|
221 | Another consideration is that some failures affect multiple shares. |
---|
222 | If multiple shares of a file are stored on a single hard drive, for example, |
---|
223 | failure of that drive may lose them all. |
---|
224 | Catastrophic damage to a data center may destroy all shares on all peers |
---|
225 | in that data center. |
---|
226 | \end_layout |
---|
227 | |
---|
228 | \begin_layout Standard |
---|
229 | While the types of failures that may occur are quite consistent across peers, |
---|
230 | their probabilities differ dramatically. |
---|
231 | A professionally-administered server with redundant storage, power and |
---|
232 | Internet located in a carefully-monitored data center with automatic fire |
---|
233 | suppression systems is much less likely to become either temporarily or |
---|
234 | permanently unavailable than the typical virus and malware-ridden home |
---|
235 | computer on a single cable modem connection. |
---|
236 | A variety of situations in between exist as well, such as the case of the |
---|
237 | author's home file server, which is administered by an IT professional |
---|
238 | and uses RAID level 6 redundant storage, but runs on old, cobbled-together |
---|
239 | equipment, and has a consumer-grade Internet connection. |
---|
240 | \end_layout |
---|
241 | |
---|
242 | \begin_layout Standard |
---|
243 | To begin with, let's use a simple definition of reliability: |
---|
244 | \end_layout |
---|
245 | |
---|
246 | \begin_layout Definition |
---|
247 | |
---|
248 | \noun on |
---|
249 | Reliability |
---|
250 | \noun default |
---|
251 | is the probability |
---|
252 | \begin_inset Formula $p_{i}$ |
---|
253 | \end_inset |
---|
254 | |
---|
255 | that a share |
---|
256 | \begin_inset Formula $s_{i}$ |
---|
257 | \end_inset |
---|
258 | |
---|
259 | will survive to (be retrievable at) time |
---|
260 | \begin_inset Formula $T=A$ |
---|
261 | \end_inset |
---|
262 | |
---|
263 | , ignoring intermittent failures. |
---|
264 | That is, the probability that the share will be retrievable at the end |
---|
265 | of the current repair cycle, and therefore usable by the repairer to regenerate |
---|
266 | any lost shares. |
---|
267 | \end_layout |
---|
268 | |
---|
269 | \begin_layout Standard |
---|
270 | Reliability |
---|
271 | \begin_inset Formula $p_{i}$ |
---|
272 | \end_inset |
---|
273 | |
---|
274 | is clearly dependent on |
---|
275 | \begin_inset Formula $A$ |
---|
276 | \end_inset |
---|
277 | |
---|
278 | . |
---|
279 | Short repair cycles offer less time for shares to |
---|
280 | \begin_inset Quotes eld |
---|
281 | \end_inset |
---|
282 | |
---|
283 | decay |
---|
284 | \begin_inset Quotes erd |
---|
285 | \end_inset |
---|
286 | |
---|
287 | into unavailability. |
---|
288 | \end_layout |
---|
289 | |
---|
290 | \begin_layout Subsection |
---|
291 | Peer Reliability |
---|
292 | \end_layout |
---|
293 | |
---|
294 | \begin_layout Standard |
---|
295 | Since peer reliability is the basis for any computations we may do on share |
---|
296 | and file reliability, we must have a way to estimate it. |
---|
297 | Reliability modeling of hardware, software and human performance are each |
---|
298 | complex topics, the subject of much ongoing research. |
---|
299 | In particular, the reliability of one of the key components of any peer |
---|
300 | from our perspective -- the hard drive where file shares are stored -- |
---|
301 | is the subject of much current debate. |
---|
302 | \end_layout |
---|
303 | |
---|
304 | \begin_layout Standard |
---|
305 | A common assumption about hardware failure is that it follows the |
---|
306 | \begin_inset Quotes eld |
---|
307 | \end_inset |
---|
308 | |
---|
309 | bathtub curve |
---|
310 | \begin_inset Quotes erd |
---|
311 | \end_inset |
---|
312 | |
---|
313 | , with frequent failures during the first few months, a constant failure |
---|
314 | rate for a few years and then a rising failure rate as the hardware wears |
---|
315 | out. |
---|
316 | This curve is often flattened by burn-in stress testing, and by periodic |
---|
317 | replacement that assures that in-service components never reach |
---|
318 | \begin_inset Quotes eld |
---|
319 | \end_inset |
---|
320 | |
---|
321 | old age |
---|
322 | \begin_inset Quotes erd |
---|
323 | \end_inset |
---|
324 | |
---|
325 | . |
---|
326 | \end_layout |
---|
327 | |
---|
328 | \begin_layout Standard |
---|
329 | In any case, we're generally going to ignore all of that complexity and |
---|
330 | focus on the bottom of the bathtub, assuming constant failure rates. |
---|
331 | This is a particularly reasonable assumption as long as we're focused on |
---|
332 | failures during a particular, relatively short interval |
---|
333 | \begin_inset Formula $A$ |
---|
334 | \end_inset |
---|
335 | |
---|
336 | . |
---|
337 | Towards the end of this paper, as we examine failures over many repair |
---|
338 | intervals, the assumption becomes more tenuous, and we note some of the |
---|
339 | issues. |
---|
340 | \end_layout |
---|
341 | |
---|
342 | \begin_layout Subsubsection |
---|
343 | Estimate Adaptation |
---|
344 | \end_layout |
---|
345 | |
---|
346 | \begin_layout Standard |
---|
347 | Even assuming constant failure rates, however, it will be rare that the |
---|
348 | duration of |
---|
349 | \begin_inset Formula $A$ |
---|
350 | \end_inset |
---|
351 | |
---|
352 | coincides with the available failure rate data, particularly since we want |
---|
353 | to view |
---|
354 | \begin_inset Formula $A$ |
---|
355 | \end_inset |
---|
356 | |
---|
357 | as a tunable parameter. |
---|
358 | It's necessary to be able adapt failure rates baselined against any given |
---|
359 | duration to the selected value of |
---|
360 | \begin_inset Formula $A$ |
---|
361 | \end_inset |
---|
362 | |
---|
363 | . |
---|
364 | \end_layout |
---|
365 | |
---|
366 | \begin_layout Standard |
---|
367 | Another issue is that failure rates of hardware, etc., are necessarily continuous |
---|
368 | in nature, while the per-interval failure/survival rates that are of interest |
---|
369 | for file reliability calculations are discrete -- a peer either survives |
---|
370 | or fails during the interval. |
---|
371 | The continuous nature of failure rates means that the common and obvious |
---|
372 | methods for estimating failure rates result in values that follow continuous, |
---|
373 | not discrete distributions. |
---|
374 | The difference is minor for small failure probabilities, and converges |
---|
375 | to zero as the number of intervals goes to infinity, but is important enough |
---|
376 | in some cases to be worth correcting for. |
---|
377 | \end_layout |
---|
378 | |
---|
379 | \begin_layout Standard |
---|
380 | Continuous failure rates are described in terms of mean time to failure, |
---|
381 | and under the assumption that failure rates are constant, are exponentially |
---|
382 | distributed. |
---|
383 | Under these assumptions, the probability that a machine fails at time |
---|
384 | \begin_inset Formula $t$ |
---|
385 | \end_inset |
---|
386 | |
---|
387 | , is |
---|
388 | \begin_inset Formula \[ |
---|
389 | f\left(t\right)=\lambda e^{-\lambda t}\] |
---|
390 | |
---|
391 | \end_inset |
---|
392 | |
---|
393 | where |
---|
394 | \begin_inset Formula $\lambda$ |
---|
395 | \end_inset |
---|
396 | |
---|
397 | represents the per unit-time failure rate. |
---|
398 | The probability that a machine fails at or before time |
---|
399 | \begin_inset Formula $A$ |
---|
400 | \end_inset |
---|
401 | |
---|
402 | is therefore |
---|
403 | \begin_inset Formula \begin{align} |
---|
404 | F\left(t\right) & =\int_{0}^{A}f\left(x\right)dx\nonumber \\ |
---|
405 | & =\int_{0}^{A}\lambda e^{-\lambda x}dx\nonumber \\ |
---|
406 | & =1-e^{-\lambda A}\label{eq:failure-time}\end{align} |
---|
407 | |
---|
408 | \end_inset |
---|
409 | |
---|
410 | |
---|
411 | \end_layout |
---|
412 | |
---|
413 | \begin_layout Standard |
---|
414 | Note that |
---|
415 | \begin_inset Formula $A$ |
---|
416 | \end_inset |
---|
417 | |
---|
418 | and |
---|
419 | \begin_inset Formula $\lambda$ |
---|
420 | \end_inset |
---|
421 | |
---|
422 | in |
---|
423 | \begin_inset CommandInset ref |
---|
424 | LatexCommand ref |
---|
425 | reference "eq:failure-time" |
---|
426 | |
---|
427 | \end_inset |
---|
428 | |
---|
429 | must be expressed in consistent time units. |
---|
430 | If they're different, unit conversions should be applied in the normal |
---|
431 | way. |
---|
432 | For example, if the estimate for |
---|
433 | \begin_inset Formula $\lambda$ |
---|
434 | \end_inset |
---|
435 | |
---|
436 | is 750 failures per million hours, and |
---|
437 | \begin_inset Formula $A$ |
---|
438 | \end_inset |
---|
439 | |
---|
440 | is one month, then either |
---|
441 | \begin_inset Formula $A$ |
---|
442 | \end_inset |
---|
443 | |
---|
444 | should be represented as |
---|
445 | \begin_inset Formula $30\cdot24/1000000=.00072$ |
---|
446 | \end_inset |
---|
447 | |
---|
448 | , or |
---|
449 | \begin_inset Formula $\lambda$ |
---|
450 | \end_inset |
---|
451 | |
---|
452 | should be converted to failures per month. |
---|
453 | Or both may be converted to hours. |
---|
454 | \end_layout |
---|
455 | |
---|
456 | \begin_layout Subsubsection |
---|
457 | Acquiring Peer Reliability Estimates |
---|
458 | \end_layout |
---|
459 | |
---|
460 | \begin_layout Standard |
---|
461 | Need to write this. |
---|
462 | \end_layout |
---|
463 | |
---|
464 | \begin_layout Subsection |
---|
465 | Uniform Reliability |
---|
466 | \begin_inset CommandInset label |
---|
467 | LatexCommand label |
---|
468 | name "sub:Fixed-Reliability" |
---|
469 | |
---|
470 | \end_inset |
---|
471 | |
---|
472 | |
---|
473 | \end_layout |
---|
474 | |
---|
475 | \begin_layout Standard |
---|
476 | In the simplest case, the peers holding the file shares all have the same |
---|
477 | reliability |
---|
478 | \begin_inset Formula $p$ |
---|
479 | \end_inset |
---|
480 | |
---|
481 | , and are all independent from one another. |
---|
482 | Let |
---|
483 | \begin_inset Formula $K$ |
---|
484 | \end_inset |
---|
485 | |
---|
486 | be a random variable that represents the number of shares that survive |
---|
487 | |
---|
488 | \begin_inset Formula $A$ |
---|
489 | \end_inset |
---|
490 | |
---|
491 | . |
---|
492 | Each share's survival can be viewed as an independent Bernoulli trial with |
---|
493 | a success probability of |
---|
494 | \begin_inset Formula $p$ |
---|
495 | \end_inset |
---|
496 | |
---|
497 | , which means that |
---|
498 | \begin_inset Formula $K$ |
---|
499 | \end_inset |
---|
500 | |
---|
501 | follows the binomial distribution with parameters |
---|
502 | \begin_inset Formula $N$ |
---|
503 | \end_inset |
---|
504 | |
---|
505 | and |
---|
506 | \begin_inset Formula $p$ |
---|
507 | \end_inset |
---|
508 | |
---|
509 | . |
---|
510 | That is, |
---|
511 | \begin_inset Formula $K\sim B(N,p)$ |
---|
512 | \end_inset |
---|
513 | |
---|
514 | . |
---|
515 | \end_layout |
---|
516 | |
---|
517 | \begin_layout Theorem |
---|
518 | Binomial Distribution Theorem |
---|
519 | \end_layout |
---|
520 | |
---|
521 | \begin_layout Theorem |
---|
522 | Consider |
---|
523 | \begin_inset Formula $n$ |
---|
524 | \end_inset |
---|
525 | |
---|
526 | independent Bernoulli trials |
---|
527 | \begin_inset Foot |
---|
528 | status collapsed |
---|
529 | |
---|
530 | \begin_layout Plain Layout |
---|
531 | A Bernoulli trial is simply a test of some sort that results in one of two |
---|
532 | outcomes, one of which is designated success and the other failure. |
---|
533 | The classic example of a Bernoulli trial is a coin toss. |
---|
534 | \end_layout |
---|
535 | |
---|
536 | \end_inset |
---|
537 | |
---|
538 | that succeed with probability |
---|
539 | \begin_inset Formula $p$ |
---|
540 | \end_inset |
---|
541 | |
---|
542 | , and let |
---|
543 | \begin_inset Formula $K$ |
---|
544 | \end_inset |
---|
545 | |
---|
546 | be a random variable that represents the number, |
---|
547 | \begin_inset Formula $m$ |
---|
548 | \end_inset |
---|
549 | |
---|
550 | , of successes, |
---|
551 | \begin_inset Formula $0\le m\le n$ |
---|
552 | \end_inset |
---|
553 | |
---|
554 | . |
---|
555 | We say that |
---|
556 | \begin_inset Formula $K$ |
---|
557 | \end_inset |
---|
558 | |
---|
559 | follows the Binomial Distribution with parameters n and p, denoted |
---|
560 | \begin_inset Formula $K\sim B(n,p)$ |
---|
561 | \end_inset |
---|
562 | |
---|
563 | . |
---|
564 | The probability mass function (PMF) of K is a function that gives the probabili |
---|
565 | ty that |
---|
566 | \begin_inset Formula $K$ |
---|
567 | \end_inset |
---|
568 | |
---|
569 | takes a particular value |
---|
570 | \begin_inset Formula $m$ |
---|
571 | \end_inset |
---|
572 | |
---|
573 | (the probability that there are exactly |
---|
574 | \begin_inset Formula $m$ |
---|
575 | \end_inset |
---|
576 | |
---|
577 | successful trials, and therefore |
---|
578 | \begin_inset Formula $n-m$ |
---|
579 | \end_inset |
---|
580 | |
---|
581 | failures). |
---|
582 | The PMF of K is |
---|
583 | \begin_inset Formula \begin{equation} |
---|
584 | Pr[K=m]=f(m;n,p)=\binom{n}{m}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation} |
---|
585 | |
---|
586 | \end_inset |
---|
587 | |
---|
588 | |
---|
589 | \end_layout |
---|
590 | |
---|
591 | \begin_layout Proof |
---|
592 | Consider the specific case of exactly |
---|
593 | \begin_inset Formula $m$ |
---|
594 | \end_inset |
---|
595 | |
---|
596 | successes followed by |
---|
597 | \begin_inset Formula $n-m$ |
---|
598 | \end_inset |
---|
599 | |
---|
600 | failures, because each success has probability |
---|
601 | \begin_inset Formula $p$ |
---|
602 | \end_inset |
---|
603 | |
---|
604 | , each failure has probability |
---|
605 | \begin_inset Formula $1-p$ |
---|
606 | \end_inset |
---|
607 | |
---|
608 | , and the trials are independent, the probability of this exact case occurring |
---|
609 | is |
---|
610 | \begin_inset Formula $p^{m}\left(1-p\right)^{\left(n-m\right)}$ |
---|
611 | \end_inset |
---|
612 | |
---|
613 | , the product of the probabilities of the outcome of each trial. |
---|
614 | \end_layout |
---|
615 | |
---|
616 | \begin_layout Proof |
---|
617 | Now consider any reordering of these |
---|
618 | \begin_inset Formula $m$ |
---|
619 | \end_inset |
---|
620 | |
---|
621 | successes and |
---|
622 | \begin_inset Formula $n$ |
---|
623 | \end_inset |
---|
624 | |
---|
625 | failures. |
---|
626 | Any such reordering occurs with the same probability |
---|
627 | \begin_inset Formula $p^{m}\left(1-p\right)^{\left(n-m\right)}$ |
---|
628 | \end_inset |
---|
629 | |
---|
630 | , but with the terms of the product reordered. |
---|
631 | Since multiplication is commutative, each such reordering has the same |
---|
632 | probability. |
---|
633 | There are n-choose-m such orderings, and each ordering is an independent |
---|
634 | event, meaning we can sum the probabilities of the individual orderings, |
---|
635 | so the probability that any ordering of |
---|
636 | \begin_inset Formula $m$ |
---|
637 | \end_inset |
---|
638 | |
---|
639 | successes and |
---|
640 | \begin_inset Formula $n-m$ |
---|
641 | \end_inset |
---|
642 | |
---|
643 | failures occurs is given by |
---|
644 | \begin_inset Formula \[ |
---|
645 | \binom{n}{m}p^{m}\left(1-p\right)^{\left(n-m\right)}\] |
---|
646 | |
---|
647 | \end_inset |
---|
648 | |
---|
649 | which is the right-hand-side of equation |
---|
650 | \begin_inset CommandInset ref |
---|
651 | LatexCommand ref |
---|
652 | reference "eq:binomial-pmf" |
---|
653 | |
---|
654 | \end_inset |
---|
655 | |
---|
656 | . |
---|
657 | \end_layout |
---|
658 | |
---|
659 | \begin_layout Standard |
---|
660 | A file survives if at least |
---|
661 | \begin_inset Formula $k$ |
---|
662 | \end_inset |
---|
663 | |
---|
664 | of the |
---|
665 | \begin_inset Formula $N$ |
---|
666 | \end_inset |
---|
667 | |
---|
668 | shares survive. |
---|
669 | Equation |
---|
670 | \begin_inset CommandInset ref |
---|
671 | LatexCommand ref |
---|
672 | reference "eq:binomial-pmf" |
---|
673 | |
---|
674 | \end_inset |
---|
675 | |
---|
676 | gives the probability that exactly |
---|
677 | \begin_inset Formula $i$ |
---|
678 | \end_inset |
---|
679 | |
---|
680 | shares survive, for any |
---|
681 | \begin_inset Formula $1\leq i\leq n$ |
---|
682 | \end_inset |
---|
683 | |
---|
684 | , so the probability that fewer than |
---|
685 | \begin_inset Formula $k$ |
---|
686 | \end_inset |
---|
687 | |
---|
688 | survive is the sum of the probabilities that |
---|
689 | \begin_inset Formula $0,1,2,\ldots,k-1$ |
---|
690 | \end_inset |
---|
691 | |
---|
692 | shares survive. |
---|
693 | That is: |
---|
694 | \end_layout |
---|
695 | |
---|
696 | \begin_layout Standard |
---|
697 | \begin_inset Formula \begin{equation} |
---|
698 | Pr[file\, lost]=\sum_{i=0}^{k-1}\binom{n}{i}p^{i}(1-p)^{n-i}\label{eq:simple-failure}\end{equation} |
---|
699 | |
---|
700 | \end_inset |
---|
701 | |
---|
702 | |
---|
703 | \end_layout |
---|
704 | |
---|
705 | \begin_layout Subsection |
---|
706 | Independent Reliability |
---|
707 | \begin_inset CommandInset label |
---|
708 | LatexCommand label |
---|
709 | name "sub:Independent-Reliability" |
---|
710 | |
---|
711 | \end_inset |
---|
712 | |
---|
713 | |
---|
714 | \end_layout |
---|
715 | |
---|
716 | \begin_layout Standard |
---|
717 | Equation |
---|
718 | \begin_inset CommandInset ref |
---|
719 | LatexCommand ref |
---|
720 | reference "eq:simple-failure" |
---|
721 | |
---|
722 | \end_inset |
---|
723 | |
---|
724 | assumes that all shares have the same probability of survival, but as explained |
---|
725 | above, this is not necessarily true. |
---|
726 | A more accurate model allows each share |
---|
727 | \begin_inset Formula $s_{i}$ |
---|
728 | \end_inset |
---|
729 | |
---|
730 | an independent probability of survival |
---|
731 | \begin_inset Formula $p_{i}$ |
---|
732 | \end_inset |
---|
733 | |
---|
734 | . |
---|
735 | Each share's survival can still be treated as an independent Bernoulli |
---|
736 | trial, but with success probability |
---|
737 | \begin_inset Formula $p_{i}$ |
---|
738 | \end_inset |
---|
739 | |
---|
740 | . |
---|
741 | Under this assumption, |
---|
742 | \begin_inset Formula $K$ |
---|
743 | \end_inset |
---|
744 | |
---|
745 | follows a generalized binomial distribution with parameters |
---|
746 | \begin_inset Formula $N$ |
---|
747 | \end_inset |
---|
748 | |
---|
749 | and |
---|
750 | \begin_inset Formula $p_{1},p_{2},\dots,p_{N}$ |
---|
751 | \end_inset |
---|
752 | |
---|
753 | . |
---|
754 | \end_layout |
---|
755 | |
---|
756 | \begin_layout Standard |
---|
757 | The PMF for this generalized |
---|
758 | \begin_inset Formula $K$ |
---|
759 | \end_inset |
---|
760 | |
---|
761 | does not have a simple closed-form representation. |
---|
762 | However, the PMFs for random variables representing individual share survival |
---|
763 | do. |
---|
764 | Let |
---|
765 | \begin_inset Formula $K_{i}$ |
---|
766 | \end_inset |
---|
767 | |
---|
768 | be a random variable such that: |
---|
769 | \end_layout |
---|
770 | |
---|
771 | \begin_layout Standard |
---|
772 | \begin_inset Formula \[ |
---|
773 | K_{i}=\begin{cases} |
---|
774 | 1 & \textnormal{if }s_{i}\textnormal{ survives}\\ |
---|
775 | 0 & \textnormal{if }s_{i}\textnormal{ fails}\end{cases}\] |
---|
776 | |
---|
777 | \end_inset |
---|
778 | |
---|
779 | |
---|
780 | \end_layout |
---|
781 | |
---|
782 | \begin_layout Standard |
---|
783 | The PMF for |
---|
784 | \begin_inset Formula $K_{i}$ |
---|
785 | \end_inset |
---|
786 | |
---|
787 | is very simple: |
---|
788 | \begin_inset Formula \[ |
---|
789 | Pr[K_{i}=j]=\begin{cases} |
---|
790 | p_{i} & j=1\\ |
---|
791 | 1-p_{i} & j=0\end{cases}\] |
---|
792 | |
---|
793 | \end_inset |
---|
794 | |
---|
795 | which can also be expressed as |
---|
796 | \begin_inset Formula \[ |
---|
797 | Pr[K_{i}=j]=f\left(j\right)=\left(1-p_{i}\right)\left(1-j\right)+p_{i}\left(j\right)\] |
---|
798 | |
---|
799 | \end_inset |
---|
800 | |
---|
801 | |
---|
802 | \end_layout |
---|
803 | |
---|
804 | \begin_layout Standard |
---|
805 | Note that since each |
---|
806 | \begin_inset Formula $K_{i}$ |
---|
807 | \end_inset |
---|
808 | |
---|
809 | represents the count of shares |
---|
810 | \begin_inset Formula $s_{i}$ |
---|
811 | \end_inset |
---|
812 | |
---|
813 | that survives (either 0 or 1), if we add up all of the individual survivor |
---|
814 | counts, we get the group survivor count. |
---|
815 | That is: |
---|
816 | \begin_inset Formula \[ |
---|
817 | \sum_{i=1}^{N}K_{i}=K\] |
---|
818 | |
---|
819 | \end_inset |
---|
820 | |
---|
821 | Effectively, we have separated |
---|
822 | \begin_inset Formula $K$ |
---|
823 | \end_inset |
---|
824 | |
---|
825 | into the series of Bernoulli trials that make it up. |
---|
826 | \end_layout |
---|
827 | |
---|
828 | \begin_layout Theorem |
---|
829 | Discrete Convolution Theorem |
---|
830 | \end_layout |
---|
831 | |
---|
832 | \begin_layout Theorem |
---|
833 | Let |
---|
834 | \begin_inset Formula $X$ |
---|
835 | \end_inset |
---|
836 | |
---|
837 | and |
---|
838 | \begin_inset Formula $Y$ |
---|
839 | \end_inset |
---|
840 | |
---|
841 | be discrete random variables with probability mass functions given by |
---|
842 | \begin_inset Formula $Pr\left[X=x\right]=f(x)$ |
---|
843 | \end_inset |
---|
844 | |
---|
845 | and |
---|
846 | \begin_inset Formula $Pr\left[Y=y\right]=g(y).$ |
---|
847 | \end_inset |
---|
848 | |
---|
849 | Let |
---|
850 | \begin_inset Formula $Z$ |
---|
851 | \end_inset |
---|
852 | |
---|
853 | be the discrete random random variable obtained by summing |
---|
854 | \begin_inset Formula $X$ |
---|
855 | \end_inset |
---|
856 | |
---|
857 | and |
---|
858 | \begin_inset Formula $Y$ |
---|
859 | \end_inset |
---|
860 | |
---|
861 | . |
---|
862 | \end_layout |
---|
863 | |
---|
864 | \begin_layout Theorem |
---|
865 | The probability mass function of |
---|
866 | \begin_inset Formula $Z$ |
---|
867 | \end_inset |
---|
868 | |
---|
869 | is given by |
---|
870 | \begin_inset Formula \[ |
---|
871 | Pr[Z=z]=h(z)=\left(f\star g\right)(z)\] |
---|
872 | |
---|
873 | \end_inset |
---|
874 | |
---|
875 | where |
---|
876 | \begin_inset Formula $\star$ |
---|
877 | \end_inset |
---|
878 | |
---|
879 | denotes the discrete convolution operation: |
---|
880 | \begin_inset Formula \[ |
---|
881 | \left(f\star g\right)\left(n\right)=\sum_{m=-\infty}^{\infty}f\left(m\right)g\left(m-n\right)\] |
---|
882 | |
---|
883 | \end_inset |
---|
884 | |
---|
885 | |
---|
886 | \end_layout |
---|
887 | |
---|
888 | \begin_layout Proof |
---|
889 | The proof is beyond the scope of this paper. |
---|
890 | \end_layout |
---|
891 | |
---|
892 | \begin_layout Standard |
---|
893 | If we denote the PMF of |
---|
894 | \begin_inset Formula $K$ |
---|
895 | \end_inset |
---|
896 | |
---|
897 | with |
---|
898 | \begin_inset Formula $f$ |
---|
899 | \end_inset |
---|
900 | |
---|
901 | and the PMF of |
---|
902 | \begin_inset Formula $K_{i}$ |
---|
903 | \end_inset |
---|
904 | |
---|
905 | with |
---|
906 | \begin_inset Formula $g_{i}$ |
---|
907 | \end_inset |
---|
908 | |
---|
909 | (more formally, |
---|
910 | \begin_inset Formula $Pr[K=x]=f(x)$ |
---|
911 | \end_inset |
---|
912 | |
---|
913 | and |
---|
914 | \begin_inset Formula $Pr[K_{i}=x]=g_{i}(x)$ |
---|
915 | \end_inset |
---|
916 | |
---|
917 | ) then since |
---|
918 | \begin_inset Formula $K=\sum_{i=1}^{N}K_{i}$ |
---|
919 | \end_inset |
---|
920 | |
---|
921 | , according to the discrete convolution theorem |
---|
922 | \begin_inset Formula $f=g_{1}\star g_{2}\star g_{3}\star\ldots\star g_{N}$ |
---|
923 | \end_inset |
---|
924 | |
---|
925 | . |
---|
926 | Since convolution is associative, this can also be written as |
---|
927 | \begin_inset Formula $ $ |
---|
928 | \end_inset |
---|
929 | |
---|
930 | |
---|
931 | \begin_inset Formula \begin{equation} |
---|
932 | f=(\ldots((g_{1}\star g_{2})\star g_{3})\star\ldots)\star g_{N})\label{eq:convolution}\end{equation} |
---|
933 | |
---|
934 | \end_inset |
---|
935 | |
---|
936 | Therefore, |
---|
937 | \begin_inset Formula $f$ |
---|
938 | \end_inset |
---|
939 | |
---|
940 | can be computed as a sequence of convolution operations on the simple PMFs |
---|
941 | of the random variables |
---|
942 | \begin_inset Formula $K_{i}$ |
---|
943 | \end_inset |
---|
944 | |
---|
945 | . |
---|
946 | In fact, for large |
---|
947 | \begin_inset Formula $N$ |
---|
948 | \end_inset |
---|
949 | |
---|
950 | , equation |
---|
951 | \begin_inset CommandInset ref |
---|
952 | LatexCommand ref |
---|
953 | reference "eq:convolution" |
---|
954 | |
---|
955 | \end_inset |
---|
956 | |
---|
957 | turns out to be a more effective means of computing the PMF of |
---|
958 | \begin_inset Formula $K$ |
---|
959 | \end_inset |
---|
960 | |
---|
961 | than the binomial theorem. |
---|
962 | even in the case of shares with identical survival probability. |
---|
963 | The reason it's better is because the calculation of |
---|
964 | \begin_inset Formula $\binom{n}{m}$ |
---|
965 | \end_inset |
---|
966 | |
---|
967 | in equation |
---|
968 | \begin_inset CommandInset ref |
---|
969 | LatexCommand ref |
---|
970 | reference "eq:binomial-pmf" |
---|
971 | |
---|
972 | \end_inset |
---|
973 | |
---|
974 | produces very large values that overflow unless arbitrary precision numeric |
---|
975 | representations are used. |
---|
976 | \end_layout |
---|
977 | |
---|
978 | \begin_layout Standard |
---|
979 | Note also that it is not necessary to have very simple PMFs like those of |
---|
980 | the |
---|
981 | \begin_inset Formula $K_{i}$ |
---|
982 | \end_inset |
---|
983 | |
---|
984 | . |
---|
985 | Any share or set of shares that has a known PMF can be combined with any |
---|
986 | other set with a known PMF by convolution, as long as the two share sets |
---|
987 | are independent. |
---|
988 | The reverse holds as well; given a group with an empirically-derived PMF, |
---|
989 | in it's theoretically possible to solve for an individual PMF, and thereby |
---|
990 | determine |
---|
991 | \begin_inset Formula $p_{i}$ |
---|
992 | \end_inset |
---|
993 | |
---|
994 | even when per-share data is unavailable. |
---|
995 | \end_layout |
---|
996 | |
---|
997 | \begin_layout Subsection |
---|
998 | Multiple Failure Modes |
---|
999 | \begin_inset CommandInset label |
---|
1000 | LatexCommand label |
---|
1001 | name "sub:Multiple-Failure-Modes" |
---|
1002 | |
---|
1003 | \end_inset |
---|
1004 | |
---|
1005 | |
---|
1006 | \end_layout |
---|
1007 | |
---|
1008 | \begin_layout Standard |
---|
1009 | In modeling share survival probabilities, it's useful to be able to analyze |
---|
1010 | separately each of the various failure modes. |
---|
1011 | For example, if reliable statistics for disk failure can be obtained, then |
---|
1012 | a probability mass function for that form of failure can be generated. |
---|
1013 | Similarly, statistics on other hardware failures, administrative errors, |
---|
1014 | network losses, etc., can all be estimated independently. |
---|
1015 | If those estimates can then be combined into a single PMF for a share, |
---|
1016 | then we can use it to predict failures for that share. |
---|
1017 | \end_layout |
---|
1018 | |
---|
1019 | \begin_layout Standard |
---|
1020 | Combining independent failure modes for a single share is straightforward. |
---|
1021 | If |
---|
1022 | \begin_inset Formula $p_{i,j}$ |
---|
1023 | \end_inset |
---|
1024 | |
---|
1025 | is the probability of survival of the |
---|
1026 | \begin_inset Formula $j$ |
---|
1027 | \end_inset |
---|
1028 | |
---|
1029 | th failure mode of share |
---|
1030 | \begin_inset Formula $i$ |
---|
1031 | \end_inset |
---|
1032 | |
---|
1033 | , |
---|
1034 | \begin_inset Formula $1\leq j\leq m$ |
---|
1035 | \end_inset |
---|
1036 | |
---|
1037 | , then |
---|
1038 | \begin_inset Formula \[ |
---|
1039 | Pr[K_{i}=k]=f_{i}(k)=\begin{cases} |
---|
1040 | \prod_{j=1}^{m}p_{i,j} & k=1\\ |
---|
1041 | 1-\prod_{j=1}^{m}p_{i,j} & k=0\end{cases}\] |
---|
1042 | |
---|
1043 | \end_inset |
---|
1044 | |
---|
1045 | is the survival PMF. |
---|
1046 | \end_layout |
---|
1047 | |
---|
1048 | \begin_layout Subsection |
---|
1049 | Multi-share failures |
---|
1050 | \begin_inset CommandInset label |
---|
1051 | LatexCommand label |
---|
1052 | name "sub:Multi-share-failures" |
---|
1053 | |
---|
1054 | \end_inset |
---|
1055 | |
---|
1056 | |
---|
1057 | \end_layout |
---|
1058 | |
---|
1059 | \begin_layout Standard |
---|
1060 | If there are failure modes that affect multiple computers, we can also construct |
---|
1061 | the PMF that predicts their survival. |
---|
1062 | The key observation is that the PMF has non-zero probabilities only for |
---|
1063 | |
---|
1064 | \begin_inset Formula $0$ |
---|
1065 | \end_inset |
---|
1066 | |
---|
1067 | survivors and |
---|
1068 | \begin_inset Formula $n$ |
---|
1069 | \end_inset |
---|
1070 | |
---|
1071 | survivors, where |
---|
1072 | \begin_inset Formula $n$ |
---|
1073 | \end_inset |
---|
1074 | |
---|
1075 | is the number of shares in the set. |
---|
1076 | If |
---|
1077 | \begin_inset Formula $p$ |
---|
1078 | \end_inset |
---|
1079 | |
---|
1080 | is the probability of survival, the PMF of |
---|
1081 | \begin_inset Formula $K$ |
---|
1082 | \end_inset |
---|
1083 | |
---|
1084 | , a random variable representing the number of survivors is |
---|
1085 | \begin_inset Formula \[ |
---|
1086 | Pr[K=k]=f(k)=\begin{cases} |
---|
1087 | p & k=n\\ |
---|
1088 | 0 & 0<i<n\\ |
---|
1089 | 1-p & k=0\end{cases}\] |
---|
1090 | |
---|
1091 | \end_inset |
---|
1092 | |
---|
1093 | |
---|
1094 | \end_layout |
---|
1095 | |
---|
1096 | \begin_layout Standard |
---|
1097 | Group failures due to multiple independent causes can be combined as in |
---|
1098 | section |
---|
1099 | \begin_inset CommandInset ref |
---|
1100 | LatexCommand ref |
---|
1101 | reference "sub:Multiple-Failure-Modes" |
---|
1102 | |
---|
1103 | \end_inset |
---|
1104 | |
---|
1105 | , as long as they apply to the whole group. |
---|
1106 | \end_layout |
---|
1107 | |
---|
1108 | \begin_layout Example |
---|
1109 | Putting the Pieces Together |
---|
1110 | \end_layout |
---|
1111 | |
---|
1112 | \begin_layout Standard |
---|
1113 | Sections |
---|
1114 | \begin_inset CommandInset ref |
---|
1115 | LatexCommand ref |
---|
1116 | reference "sub:Fixed-Reliability" |
---|
1117 | |
---|
1118 | \end_inset |
---|
1119 | |
---|
1120 | through |
---|
1121 | \begin_inset CommandInset ref |
---|
1122 | LatexCommand ref |
---|
1123 | reference "sub:Multi-share-failures" |
---|
1124 | |
---|
1125 | \end_inset |
---|
1126 | |
---|
1127 | provide ways of calculating the survival probability mass functions for |
---|
1128 | a variety of share failure structures and modes. |
---|
1129 | As an example of how these pieces can be used, consider a network with |
---|
1130 | the following peers: |
---|
1131 | \end_layout |
---|
1132 | |
---|
1133 | \begin_layout Itemize |
---|
1134 | Four servers located in a data center in Nebraska. |
---|
1135 | The machines have multiply-redundant Internet connections, with a failure |
---|
1136 | probability of 0.0001. |
---|
1137 | They store their shares on RAID arrays with failure probability of 0.0002. |
---|
1138 | The administrative staff makes data-destroying errors with probability |
---|
1139 | 0.003. |
---|
1140 | \end_layout |
---|
1141 | |
---|
1142 | \begin_layout Itemize |
---|
1143 | Four servers located in a data center on the island of Hawaii. |
---|
1144 | These servers have identical failure probabilities as the servers in Nebraska, |
---|
1145 | except that the data center is near the edge of the crater on Mount Kilauea |
---|
1146 | (nobody said examples had to be realistic). |
---|
1147 | There is a 0.04 chance that the volcano will erupt and bury the data center |
---|
1148 | in molten lava, destroying it entirely. |
---|
1149 | \end_layout |
---|
1150 | |
---|
1151 | \begin_layout Itemize |
---|
1152 | Four PCs located in random homes, connected to the Internet via assorted |
---|
1153 | cable modems and DSL. |
---|
1154 | Their network connections fail with probability 0.009. |
---|
1155 | Their disks fail with probability 0.001. |
---|
1156 | Their users destroy data with probability 0.05. |
---|
1157 | \end_layout |
---|
1158 | |
---|
1159 | \begin_layout Standard |
---|
1160 | If one share is placed on each of these 12 computers, what's the probability |
---|
1161 | mass function of share survival? To more compactly describe PMFs, we'll |
---|
1162 | denote them as probability vectors of the form |
---|
1163 | \begin_inset Formula $\left[\alpha_{o},\alpha_{1},\alpha_{2},\ldots\alpha_{n}\right]$ |
---|
1164 | \end_inset |
---|
1165 | |
---|
1166 | where |
---|
1167 | \begin_inset Formula $\alpha_{i}$ |
---|
1168 | \end_inset |
---|
1169 | |
---|
1170 | is the probability that exactly |
---|
1171 | \begin_inset Formula $i$ |
---|
1172 | \end_inset |
---|
1173 | |
---|
1174 | shares survive. |
---|
1175 | \end_layout |
---|
1176 | |
---|
1177 | \begin_layout Standard |
---|
1178 | The servers in the two data centers have individual failure probabilities |
---|
1179 | of RAID failure (.0002) and administrative error (.003) giving an individual |
---|
1180 | survival probability of |
---|
1181 | \begin_inset Formula \[ |
---|
1182 | (1-.0002)\cdot(1-.003)=.9998\cdot.997=.9968\] |
---|
1183 | |
---|
1184 | \end_inset |
---|
1185 | |
---|
1186 | |
---|
1187 | \end_layout |
---|
1188 | |
---|
1189 | \begin_layout Standard |
---|
1190 | Using |
---|
1191 | \begin_inset Formula $p=.9968,n=4$ |
---|
1192 | \end_inset |
---|
1193 | |
---|
1194 | in equation |
---|
1195 | \begin_inset CommandInset ref |
---|
1196 | LatexCommand ref |
---|
1197 | reference "eq:binomial-pmf" |
---|
1198 | |
---|
1199 | \end_inset |
---|
1200 | |
---|
1201 | gives the survival PMF |
---|
1202 | \begin_inset Formula \[ |
---|
1203 | \left[1.049\times10^{-10},1.307\times10^{-7},6.105\times10^{-5},0.01271,0.9872\right]\] |
---|
1204 | |
---|
1205 | \end_inset |
---|
1206 | |
---|
1207 | which applies to each group of four servers. |
---|
1208 | However, each data center also has a .0001 chance of data connection loss, |
---|
1209 | which affects all four servers at once, and Hawaii has the additional .04 |
---|
1210 | probability of severe lava burn. |
---|
1211 | If the network fails at a location, all the machines go offline together. |
---|
1212 | The probability that 0 machines survive is the probability that they all |
---|
1213 | fail for individual reasons ( |
---|
1214 | \begin_inset Formula $1.049\cdot10^{-10}$ |
---|
1215 | \end_inset |
---|
1216 | |
---|
1217 | ) plus the probability they all fail because of a network outage ( |
---|
1218 | \begin_inset Formula $.0001$ |
---|
1219 | \end_inset |
---|
1220 | |
---|
1221 | ) less the probability they fail for both reasons: |
---|
1222 | \begin_inset Formula \[ |
---|
1223 | \left(1.049\times10^{-10}\right)+\left(0.0001\right)-\left[\left(1.049\times10^{-10}\right)\cdot\left(0.0001\right)\right]\approxeq0.0001\] |
---|
1224 | |
---|
1225 | \end_inset |
---|
1226 | |
---|
1227 | |
---|
1228 | \end_layout |
---|
1229 | |
---|
1230 | \begin_layout Standard |
---|
1231 | That's the |
---|
1232 | \begin_inset Formula $i=0$ |
---|
1233 | \end_inset |
---|
1234 | |
---|
1235 | element of the combined PMF. |
---|
1236 | The combined probability of survival of |
---|
1237 | \begin_inset Formula $0<i\leq4$ |
---|
1238 | \end_inset |
---|
1239 | |
---|
1240 | servers is simpler: it's the probability they survive individual failure, |
---|
1241 | from the individual failure PMF above, times the probability they survive |
---|
1242 | network failure (.9999). |
---|
1243 | So the combined survival PMF, which we'll denote as |
---|
1244 | \begin_inset Formula $n(i)$ |
---|
1245 | \end_inset |
---|
1246 | |
---|
1247 | of the Nebraska servers is |
---|
1248 | \begin_inset Formula \[ |
---|
1249 | n(i)=\left[0.0001,1.306\times10^{-7},6.104\times10^{-5},0.01268,0.9872\right]\] |
---|
1250 | |
---|
1251 | \end_inset |
---|
1252 | |
---|
1253 | which has the interesting property that complete failure is 1000 times more |
---|
1254 | likely than survival of one server. |
---|
1255 | This is because the probability of a network outage is so much greater |
---|
1256 | than simultaneous |
---|
1257 | \begin_inset Foot |
---|
1258 | status collapsed |
---|
1259 | |
---|
1260 | \begin_layout Plain Layout |
---|
1261 | Of course, the failures need not be truly simultaneous, they just have happen |
---|
1262 | in the same interval between repair runs. |
---|
1263 | \end_layout |
---|
1264 | |
---|
1265 | \end_inset |
---|
1266 | |
---|
1267 | independent failure of three servers. |
---|
1268 | \end_layout |
---|
1269 | |
---|
1270 | \begin_layout Standard |
---|
1271 | We apply the same process for the Hawaii servers, but with group survival |
---|
1272 | probability of |
---|
1273 | \begin_inset Formula $(1-.0001)(1-.04)=.9799$ |
---|
1274 | \end_inset |
---|
1275 | |
---|
1276 | gives the survival PMF |
---|
1277 | \begin_inset Formula \[ |
---|
1278 | h(i)=\left[0.0201,1.280\times10^{-7},5.982\times10^{-5},0.01242,0.9674\right]\] |
---|
1279 | |
---|
1280 | \end_inset |
---|
1281 | |
---|
1282 | |
---|
1283 | \end_layout |
---|
1284 | |
---|
1285 | \begin_layout Standard |
---|
1286 | Applying the convolution operator to |
---|
1287 | \begin_inset Formula $n(i)$ |
---|
1288 | \end_inset |
---|
1289 | |
---|
1290 | and |
---|
1291 | \begin_inset Formula $h(i)$ |
---|
1292 | \end_inset |
---|
1293 | |
---|
1294 | , the survival PMF of all eight servers is: |
---|
1295 | \end_layout |
---|
1296 | |
---|
1297 | \begin_layout Standard |
---|
1298 | \begin_inset Formula \[ |
---|
1299 | \left(n\star h\right)\left(i\right)=\begin{cases} |
---|
1300 | 2.010\times10^{-6} & i=0\\ |
---|
1301 | 2.639\times10^{-9} & i=1\\ |
---|
1302 | 1.233\times10^{-6} & i=2\\ |
---|
1303 | 2.560\times10^{-4} & i=3\\ |
---|
1304 | 0.01994 & i=4\\ |
---|
1305 | 1.769\times10^{-6} & i=5\\ |
---|
1306 | 2.756\times10^{-4} & i=6\\ |
---|
1307 | 0.02452 & i=7\\ |
---|
1308 | 0.9559 & i=8\end{cases}\] |
---|
1309 | |
---|
1310 | \end_inset |
---|
1311 | |
---|
1312 | |
---|
1313 | \end_layout |
---|
1314 | |
---|
1315 | \begin_layout Standard |
---|
1316 | \begin_inset VSpace defskip |
---|
1317 | \end_inset |
---|
1318 | |
---|
1319 | |
---|
1320 | \end_layout |
---|
1321 | |
---|
1322 | \begin_layout Standard |
---|
1323 | Note that losing four shares ( |
---|
1324 | \begin_inset Formula $i=4$ |
---|
1325 | \end_inset |
---|
1326 | |
---|
1327 | ) is 10,000 times more likely than losing three ( |
---|
1328 | \begin_inset Formula $i=5$ |
---|
1329 | \end_inset |
---|
1330 | |
---|
1331 | ). |
---|
1332 | This is because both data centers have a whole-center failure mode, and |
---|
1333 | the Hawaii center's lava burn probability is so high. |
---|
1334 | Similarly, the probability of losing all of them is 1000 times higher than |
---|
1335 | the probability of losing all but one. |
---|
1336 | \end_layout |
---|
1337 | |
---|
1338 | \begin_layout Standard |
---|
1339 | For the home PCs, their individual probability of survival is |
---|
1340 | \begin_inset Formula \[ |
---|
1341 | (1-.009)\cdot(1-.001)\cdot(1-.05)=.991\cdot.999\cdot.95=.9405\] |
---|
1342 | |
---|
1343 | \end_inset |
---|
1344 | |
---|
1345 | |
---|
1346 | \end_layout |
---|
1347 | |
---|
1348 | \begin_layout Standard |
---|
1349 | We can then apply equation |
---|
1350 | \begin_inset CommandInset ref |
---|
1351 | LatexCommand ref |
---|
1352 | reference "eq:binomial-pmf" |
---|
1353 | |
---|
1354 | \end_inset |
---|
1355 | |
---|
1356 | with |
---|
1357 | \begin_inset Formula $N=4$ |
---|
1358 | \end_inset |
---|
1359 | |
---|
1360 | and |
---|
1361 | \begin_inset Formula $p=.9405$ |
---|
1362 | \end_inset |
---|
1363 | |
---|
1364 | to compute the PMF |
---|
1365 | \begin_inset Formula $g(i),0\leq i\leq4$ |
---|
1366 | \end_inset |
---|
1367 | |
---|
1368 | for the PCs and finally compute |
---|
1369 | \begin_inset Formula $f(i)=\left(g\star\left(n\star h\right)\right)\left(i\right)$ |
---|
1370 | \end_inset |
---|
1371 | |
---|
1372 | , the PMF of the whole share set. |
---|
1373 | Summing the values of |
---|
1374 | \begin_inset Formula $f(i)$ |
---|
1375 | \end_inset |
---|
1376 | |
---|
1377 | for |
---|
1378 | \begin_inset Formula $0\leq i\leq k-1$ |
---|
1379 | \end_inset |
---|
1380 | |
---|
1381 | gives the probability that less than |
---|
1382 | \begin_inset Formula $k$ |
---|
1383 | \end_inset |
---|
1384 | |
---|
1385 | shares survive and the file is unrecoverable. |
---|
1386 | For this example, those sums are shown in table |
---|
1387 | \begin_inset CommandInset ref |
---|
1388 | LatexCommand vref |
---|
1389 | reference "tab:Example-PMF" |
---|
1390 | |
---|
1391 | \end_inset |
---|
1392 | |
---|
1393 | . |
---|
1394 | \begin_inset Float table |
---|
1395 | wide false |
---|
1396 | sideways false |
---|
1397 | status collapsed |
---|
1398 | |
---|
1399 | \begin_layout Plain Layout |
---|
1400 | \align center |
---|
1401 | \begin_inset Tabular |
---|
1402 | <lyxtabular version="3" rows="13" columns="4"> |
---|
1403 | <features> |
---|
1404 | <column alignment="center" valignment="top" width="0"> |
---|
1405 | <column alignment="center" valignment="top" width="0"> |
---|
1406 | <column alignment="center" valignment="top" width="0"> |
---|
1407 | <column alignment="center" valignment="top" width="0"> |
---|
1408 | <row> |
---|
1409 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none"> |
---|
1410 | \begin_inset Text |
---|
1411 | |
---|
1412 | \begin_layout Plain Layout |
---|
1413 | \begin_inset Formula $k$ |
---|
1414 | \end_inset |
---|
1415 | |
---|
1416 | |
---|
1417 | \end_layout |
---|
1418 | |
---|
1419 | \end_inset |
---|
1420 | </cell> |
---|
1421 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none"> |
---|
1422 | \begin_inset Text |
---|
1423 | |
---|
1424 | \begin_layout Plain Layout |
---|
1425 | \begin_inset Formula $Pr[K=k]$ |
---|
1426 | \end_inset |
---|
1427 | |
---|
1428 | |
---|
1429 | \end_layout |
---|
1430 | |
---|
1431 | \end_inset |
---|
1432 | </cell> |
---|
1433 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none"> |
---|
1434 | \begin_inset Text |
---|
1435 | |
---|
1436 | \begin_layout Plain Layout |
---|
1437 | \begin_inset Formula $Pr[file\, loss]=Pr[K<k]$ |
---|
1438 | \end_inset |
---|
1439 | |
---|
1440 | |
---|
1441 | \end_layout |
---|
1442 | |
---|
1443 | \end_inset |
---|
1444 | </cell> |
---|
1445 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none"> |
---|
1446 | \begin_inset Text |
---|
1447 | |
---|
1448 | \begin_layout Plain Layout |
---|
1449 | \begin_inset Formula $N/k$ |
---|
1450 | \end_inset |
---|
1451 | |
---|
1452 | |
---|
1453 | \end_layout |
---|
1454 | |
---|
1455 | \end_inset |
---|
1456 | </cell> |
---|
1457 | </row> |
---|
1458 | <row> |
---|
1459 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1460 | \begin_inset Text |
---|
1461 | |
---|
1462 | \begin_layout Plain Layout |
---|
1463 | 1 |
---|
1464 | \end_layout |
---|
1465 | |
---|
1466 | \end_inset |
---|
1467 | </cell> |
---|
1468 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1469 | \begin_inset Text |
---|
1470 | |
---|
1471 | \begin_layout Plain Layout |
---|
1472 | \begin_inset Formula $1.60\times10^{-9}$ |
---|
1473 | \end_inset |
---|
1474 | |
---|
1475 | |
---|
1476 | \end_layout |
---|
1477 | |
---|
1478 | \end_inset |
---|
1479 | </cell> |
---|
1480 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1481 | \begin_inset Text |
---|
1482 | |
---|
1483 | \begin_layout Plain Layout |
---|
1484 | \begin_inset Formula $2.53\times10^{-11}$ |
---|
1485 | \end_inset |
---|
1486 | |
---|
1487 | |
---|
1488 | \end_layout |
---|
1489 | |
---|
1490 | \end_inset |
---|
1491 | </cell> |
---|
1492 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1493 | \begin_inset Text |
---|
1494 | |
---|
1495 | \begin_layout Plain Layout |
---|
1496 | 12 |
---|
1497 | \end_layout |
---|
1498 | |
---|
1499 | \end_inset |
---|
1500 | </cell> |
---|
1501 | </row> |
---|
1502 | <row> |
---|
1503 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1504 | \begin_inset Text |
---|
1505 | |
---|
1506 | \begin_layout Plain Layout |
---|
1507 | 2 |
---|
1508 | \end_layout |
---|
1509 | |
---|
1510 | \end_inset |
---|
1511 | </cell> |
---|
1512 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1513 | \begin_inset Text |
---|
1514 | |
---|
1515 | \begin_layout Plain Layout |
---|
1516 | \begin_inset Formula $3.80\times10^{-8}$ |
---|
1517 | \end_inset |
---|
1518 | |
---|
1519 | |
---|
1520 | \end_layout |
---|
1521 | |
---|
1522 | \end_inset |
---|
1523 | </cell> |
---|
1524 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1525 | \begin_inset Text |
---|
1526 | |
---|
1527 | \begin_layout Plain Layout |
---|
1528 | \begin_inset Formula $1.63\times10^{-9}$ |
---|
1529 | \end_inset |
---|
1530 | |
---|
1531 | |
---|
1532 | \end_layout |
---|
1533 | |
---|
1534 | \end_inset |
---|
1535 | </cell> |
---|
1536 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1537 | \begin_inset Text |
---|
1538 | |
---|
1539 | \begin_layout Plain Layout |
---|
1540 | 6 |
---|
1541 | \end_layout |
---|
1542 | |
---|
1543 | \end_inset |
---|
1544 | </cell> |
---|
1545 | </row> |
---|
1546 | <row> |
---|
1547 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1548 | \begin_inset Text |
---|
1549 | |
---|
1550 | \begin_layout Plain Layout |
---|
1551 | 3 |
---|
1552 | \end_layout |
---|
1553 | |
---|
1554 | \end_inset |
---|
1555 | </cell> |
---|
1556 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1557 | \begin_inset Text |
---|
1558 | |
---|
1559 | \begin_layout Plain Layout |
---|
1560 | \begin_inset Formula $4.04\times10^{-7}$ |
---|
1561 | \end_inset |
---|
1562 | |
---|
1563 | |
---|
1564 | \end_layout |
---|
1565 | |
---|
1566 | \end_inset |
---|
1567 | </cell> |
---|
1568 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1569 | \begin_inset Text |
---|
1570 | |
---|
1571 | \begin_layout Plain Layout |
---|
1572 | \begin_inset Formula $3.70\times10^{-8}$ |
---|
1573 | \end_inset |
---|
1574 | |
---|
1575 | |
---|
1576 | \end_layout |
---|
1577 | |
---|
1578 | \end_inset |
---|
1579 | </cell> |
---|
1580 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1581 | \begin_inset Text |
---|
1582 | |
---|
1583 | \begin_layout Plain Layout |
---|
1584 | 4 |
---|
1585 | \end_layout |
---|
1586 | |
---|
1587 | \end_inset |
---|
1588 | </cell> |
---|
1589 | </row> |
---|
1590 | <row> |
---|
1591 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1592 | \begin_inset Text |
---|
1593 | |
---|
1594 | \begin_layout Plain Layout |
---|
1595 | 4 |
---|
1596 | \end_layout |
---|
1597 | |
---|
1598 | \end_inset |
---|
1599 | </cell> |
---|
1600 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1601 | \begin_inset Text |
---|
1602 | |
---|
1603 | \begin_layout Plain Layout |
---|
1604 | \begin_inset Formula $2.06\times10^{-6}$ |
---|
1605 | \end_inset |
---|
1606 | |
---|
1607 | |
---|
1608 | \end_layout |
---|
1609 | |
---|
1610 | \end_inset |
---|
1611 | </cell> |
---|
1612 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1613 | \begin_inset Text |
---|
1614 | |
---|
1615 | \begin_layout Plain Layout |
---|
1616 | \begin_inset Formula $4.44\times10^{-7}$ |
---|
1617 | \end_inset |
---|
1618 | |
---|
1619 | |
---|
1620 | \end_layout |
---|
1621 | |
---|
1622 | \end_inset |
---|
1623 | </cell> |
---|
1624 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1625 | \begin_inset Text |
---|
1626 | |
---|
1627 | \begin_layout Plain Layout |
---|
1628 | 3 |
---|
1629 | \end_layout |
---|
1630 | |
---|
1631 | \end_inset |
---|
1632 | </cell> |
---|
1633 | </row> |
---|
1634 | <row> |
---|
1635 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1636 | \begin_inset Text |
---|
1637 | |
---|
1638 | \begin_layout Plain Layout |
---|
1639 | 5 |
---|
1640 | \end_layout |
---|
1641 | |
---|
1642 | \end_inset |
---|
1643 | </cell> |
---|
1644 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1645 | \begin_inset Text |
---|
1646 | |
---|
1647 | \begin_layout Plain Layout |
---|
1648 | \begin_inset Formula $2.10\times10^{-5}$ |
---|
1649 | \end_inset |
---|
1650 | |
---|
1651 | |
---|
1652 | \end_layout |
---|
1653 | |
---|
1654 | \end_inset |
---|
1655 | </cell> |
---|
1656 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1657 | \begin_inset Text |
---|
1658 | |
---|
1659 | \begin_layout Plain Layout |
---|
1660 | \begin_inset Formula $2.50\times10^{-6}$ |
---|
1661 | \end_inset |
---|
1662 | |
---|
1663 | |
---|
1664 | \end_layout |
---|
1665 | |
---|
1666 | \end_inset |
---|
1667 | </cell> |
---|
1668 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1669 | \begin_inset Text |
---|
1670 | |
---|
1671 | \begin_layout Plain Layout |
---|
1672 | 2.4 |
---|
1673 | \end_layout |
---|
1674 | |
---|
1675 | \end_inset |
---|
1676 | </cell> |
---|
1677 | </row> |
---|
1678 | <row> |
---|
1679 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1680 | \begin_inset Text |
---|
1681 | |
---|
1682 | \begin_layout Plain Layout |
---|
1683 | 6 |
---|
1684 | \end_layout |
---|
1685 | |
---|
1686 | \end_inset |
---|
1687 | </cell> |
---|
1688 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1689 | \begin_inset Text |
---|
1690 | |
---|
1691 | \begin_layout Plain Layout |
---|
1692 | \begin_inset Formula $0.000428$ |
---|
1693 | \end_inset |
---|
1694 | |
---|
1695 | |
---|
1696 | \end_layout |
---|
1697 | |
---|
1698 | \end_inset |
---|
1699 | </cell> |
---|
1700 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1701 | \begin_inset Text |
---|
1702 | |
---|
1703 | \begin_layout Plain Layout |
---|
1704 | \begin_inset Formula $2.35\times10^{-5}$ |
---|
1705 | \end_inset |
---|
1706 | |
---|
1707 | |
---|
1708 | \end_layout |
---|
1709 | |
---|
1710 | \end_inset |
---|
1711 | </cell> |
---|
1712 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1713 | \begin_inset Text |
---|
1714 | |
---|
1715 | \begin_layout Plain Layout |
---|
1716 | 2 |
---|
1717 | \end_layout |
---|
1718 | |
---|
1719 | \end_inset |
---|
1720 | </cell> |
---|
1721 | </row> |
---|
1722 | <row> |
---|
1723 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1724 | \begin_inset Text |
---|
1725 | |
---|
1726 | \begin_layout Plain Layout |
---|
1727 | 7 |
---|
1728 | \end_layout |
---|
1729 | |
---|
1730 | \end_inset |
---|
1731 | </cell> |
---|
1732 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1733 | \begin_inset Text |
---|
1734 | |
---|
1735 | \begin_layout Plain Layout |
---|
1736 | \begin_inset Formula $0.00417$ |
---|
1737 | \end_inset |
---|
1738 | |
---|
1739 | |
---|
1740 | \end_layout |
---|
1741 | |
---|
1742 | \end_inset |
---|
1743 | </cell> |
---|
1744 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1745 | \begin_inset Text |
---|
1746 | |
---|
1747 | \begin_layout Plain Layout |
---|
1748 | \begin_inset Formula $0.000452$ |
---|
1749 | \end_inset |
---|
1750 | |
---|
1751 | |
---|
1752 | \end_layout |
---|
1753 | |
---|
1754 | \end_inset |
---|
1755 | </cell> |
---|
1756 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1757 | \begin_inset Text |
---|
1758 | |
---|
1759 | \begin_layout Plain Layout |
---|
1760 | 1.7 |
---|
1761 | \end_layout |
---|
1762 | |
---|
1763 | \end_inset |
---|
1764 | </cell> |
---|
1765 | </row> |
---|
1766 | <row> |
---|
1767 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1768 | \begin_inset Text |
---|
1769 | |
---|
1770 | \begin_layout Plain Layout |
---|
1771 | 8 |
---|
1772 | \end_layout |
---|
1773 | |
---|
1774 | \end_inset |
---|
1775 | </cell> |
---|
1776 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1777 | \begin_inset Text |
---|
1778 | |
---|
1779 | \begin_layout Plain Layout |
---|
1780 | \begin_inset Formula $0.0157$ |
---|
1781 | \end_inset |
---|
1782 | |
---|
1783 | |
---|
1784 | \end_layout |
---|
1785 | |
---|
1786 | \end_inset |
---|
1787 | </cell> |
---|
1788 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1789 | \begin_inset Text |
---|
1790 | |
---|
1791 | \begin_layout Plain Layout |
---|
1792 | \begin_inset Formula $0.00462$ |
---|
1793 | \end_inset |
---|
1794 | |
---|
1795 | |
---|
1796 | \end_layout |
---|
1797 | |
---|
1798 | \end_inset |
---|
1799 | </cell> |
---|
1800 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1801 | \begin_inset Text |
---|
1802 | |
---|
1803 | \begin_layout Plain Layout |
---|
1804 | 1.5 |
---|
1805 | \end_layout |
---|
1806 | |
---|
1807 | \end_inset |
---|
1808 | </cell> |
---|
1809 | </row> |
---|
1810 | <row> |
---|
1811 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1812 | \begin_inset Text |
---|
1813 | |
---|
1814 | \begin_layout Plain Layout |
---|
1815 | 9 |
---|
1816 | \end_layout |
---|
1817 | |
---|
1818 | \end_inset |
---|
1819 | </cell> |
---|
1820 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1821 | \begin_inset Text |
---|
1822 | |
---|
1823 | \begin_layout Plain Layout |
---|
1824 | \begin_inset Formula $0.00127$ |
---|
1825 | \end_inset |
---|
1826 | |
---|
1827 | |
---|
1828 | \end_layout |
---|
1829 | |
---|
1830 | \end_inset |
---|
1831 | </cell> |
---|
1832 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1833 | \begin_inset Text |
---|
1834 | |
---|
1835 | \begin_layout Plain Layout |
---|
1836 | \begin_inset Formula $0.0203$ |
---|
1837 | \end_inset |
---|
1838 | |
---|
1839 | |
---|
1840 | \end_layout |
---|
1841 | |
---|
1842 | \end_inset |
---|
1843 | </cell> |
---|
1844 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1845 | \begin_inset Text |
---|
1846 | |
---|
1847 | \begin_layout Plain Layout |
---|
1848 | 1.3 |
---|
1849 | \end_layout |
---|
1850 | |
---|
1851 | \end_inset |
---|
1852 | </cell> |
---|
1853 | </row> |
---|
1854 | <row> |
---|
1855 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1856 | \begin_inset Text |
---|
1857 | |
---|
1858 | \begin_layout Plain Layout |
---|
1859 | 10 |
---|
1860 | \end_layout |
---|
1861 | |
---|
1862 | \end_inset |
---|
1863 | </cell> |
---|
1864 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1865 | \begin_inset Text |
---|
1866 | |
---|
1867 | \begin_layout Plain Layout |
---|
1868 | \begin_inset Formula $0.0230$ |
---|
1869 | \end_inset |
---|
1870 | |
---|
1871 | |
---|
1872 | \end_layout |
---|
1873 | |
---|
1874 | \end_inset |
---|
1875 | </cell> |
---|
1876 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1877 | \begin_inset Text |
---|
1878 | |
---|
1879 | \begin_layout Plain Layout |
---|
1880 | \begin_inset Formula $0.0216$ |
---|
1881 | \end_inset |
---|
1882 | |
---|
1883 | |
---|
1884 | \end_layout |
---|
1885 | |
---|
1886 | \end_inset |
---|
1887 | </cell> |
---|
1888 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1889 | \begin_inset Text |
---|
1890 | |
---|
1891 | \begin_layout Plain Layout |
---|
1892 | 1.2 |
---|
1893 | \end_layout |
---|
1894 | |
---|
1895 | \end_inset |
---|
1896 | </cell> |
---|
1897 | </row> |
---|
1898 | <row> |
---|
1899 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1900 | \begin_inset Text |
---|
1901 | |
---|
1902 | \begin_layout Plain Layout |
---|
1903 | 11 |
---|
1904 | \end_layout |
---|
1905 | |
---|
1906 | \end_inset |
---|
1907 | </cell> |
---|
1908 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1909 | \begin_inset Text |
---|
1910 | |
---|
1911 | \begin_layout Plain Layout |
---|
1912 | \begin_inset Formula $0.208$ |
---|
1913 | \end_inset |
---|
1914 | |
---|
1915 | |
---|
1916 | \end_layout |
---|
1917 | |
---|
1918 | \end_inset |
---|
1919 | </cell> |
---|
1920 | <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none"> |
---|
1921 | \begin_inset Text |
---|
1922 | |
---|
1923 | \begin_layout Plain Layout |
---|
1924 | \begin_inset Formula $0.0446$ |
---|
1925 | \end_inset |
---|
1926 | |
---|
1927 | |
---|
1928 | \end_layout |
---|
1929 | |
---|
1930 | \end_inset |
---|
1931 | </cell> |
---|
1932 | <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none"> |
---|
1933 | \begin_inset Text |
---|
1934 | |
---|
1935 | \begin_layout Plain Layout |
---|
1936 | 1.1 |
---|
1937 | \end_layout |
---|
1938 | |
---|
1939 | \end_inset |
---|
1940 | </cell> |
---|
1941 | </row> |
---|
1942 | <row> |
---|
1943 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none"> |
---|
1944 | \begin_inset Text |
---|
1945 | |
---|
1946 | \begin_layout Plain Layout |
---|
1947 | 12 |
---|
1948 | \end_layout |
---|
1949 | |
---|
1950 | \end_inset |
---|
1951 | </cell> |
---|
1952 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none"> |
---|
1953 | \begin_inset Text |
---|
1954 | |
---|
1955 | \begin_layout Plain Layout |
---|
1956 | \begin_inset Formula $0.747$ |
---|
1957 | \end_inset |
---|
1958 | |
---|
1959 | |
---|
1960 | \end_layout |
---|
1961 | |
---|
1962 | \end_inset |
---|
1963 | </cell> |
---|
1964 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none"> |
---|
1965 | \begin_inset Text |
---|
1966 | |
---|
1967 | \begin_layout Plain Layout |
---|
1968 | \begin_inset Formula $0.253$ |
---|
1969 | \end_inset |
---|
1970 | |
---|
1971 | |
---|
1972 | \end_layout |
---|
1973 | |
---|
1974 | \end_inset |
---|
1975 | </cell> |
---|
1976 | <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none"> |
---|
1977 | \begin_inset Text |
---|
1978 | |
---|
1979 | \begin_layout Plain Layout |
---|
1980 | 1 |
---|
1981 | \end_layout |
---|
1982 | |
---|
1983 | \end_inset |
---|
1984 | </cell> |
---|
1985 | </row> |
---|
1986 | </lyxtabular> |
---|
1987 | |
---|
1988 | \end_inset |
---|
1989 | |
---|
1990 | |
---|
1991 | \end_layout |
---|
1992 | |
---|
1993 | \begin_layout Plain Layout |
---|
1994 | \begin_inset Caption |
---|
1995 | |
---|
1996 | \begin_layout Plain Layout |
---|
1997 | \align left |
---|
1998 | \begin_inset CommandInset label |
---|
1999 | LatexCommand label |
---|
2000 | name "tab:Example-PMF" |
---|
2001 | |
---|
2002 | \end_inset |
---|
2003 | |
---|
2004 | Example PMF |
---|
2005 | \end_layout |
---|
2006 | |
---|
2007 | \end_inset |
---|
2008 | |
---|
2009 | |
---|
2010 | \end_layout |
---|
2011 | |
---|
2012 | \begin_layout Plain Layout |
---|
2013 | |
---|
2014 | \end_layout |
---|
2015 | |
---|
2016 | \end_inset |
---|
2017 | |
---|
2018 | |
---|
2019 | \end_layout |
---|
2020 | |
---|
2021 | \begin_layout Standard |
---|
2022 | The table demonstrates the importance of the selection of |
---|
2023 | \begin_inset Formula $k$ |
---|
2024 | \end_inset |
---|
2025 | |
---|
2026 | , and the tradeoff against file size expansion. |
---|
2027 | Note that the survival of exactly 9 servers is significantly less likely |
---|
2028 | than the survival of 8 or 10 servers. |
---|
2029 | This is, again, an artifact of the group failure modes. |
---|
2030 | Because of this, there is no reason to choose |
---|
2031 | \begin_inset Formula $k=9$ |
---|
2032 | \end_inset |
---|
2033 | |
---|
2034 | over |
---|
2035 | \begin_inset Formula $k=10$ |
---|
2036 | \end_inset |
---|
2037 | |
---|
2038 | . |
---|
2039 | Normally, reducing the number of shares needed for reassembly improve the |
---|
2040 | file's chances of survival, but in this case it provides a minuscule gain |
---|
2041 | in reliability at the cost of a 10% increase in bandwidth and storage consumed. |
---|
2042 | \end_layout |
---|
2043 | |
---|
2044 | \begin_layout Subsection |
---|
2045 | Share Duplication |
---|
2046 | \end_layout |
---|
2047 | |
---|
2048 | \begin_layout Standard |
---|
2049 | Before moving on to consider issues other than single-interval file loss, |
---|
2050 | let's analyze one more possibility, that of |
---|
2051 | \begin_inset Quotes eld |
---|
2052 | \end_inset |
---|
2053 | |
---|
2054 | cheap |
---|
2055 | \begin_inset Quotes erd |
---|
2056 | \end_inset |
---|
2057 | |
---|
2058 | file repair via share duplication. |
---|
2059 | \end_layout |
---|
2060 | |
---|
2061 | \begin_layout Standard |
---|
2062 | Initially, files are split using erasure coding, which creates |
---|
2063 | \begin_inset Formula $N$ |
---|
2064 | \end_inset |
---|
2065 | |
---|
2066 | unique shares, any |
---|
2067 | \begin_inset Formula $k$ |
---|
2068 | \end_inset |
---|
2069 | |
---|
2070 | of which can be used to to reconstruct the file. |
---|
2071 | When shares are lost, proper repair downloads some |
---|
2072 | \begin_inset Formula $k$ |
---|
2073 | \end_inset |
---|
2074 | |
---|
2075 | shares, reconstructs the original file and then uses the erasure coding |
---|
2076 | algorithm to reconstruct the lost shares, then redeploys them to peers |
---|
2077 | in the network. |
---|
2078 | This is a somewhat expensive process. |
---|
2079 | \end_layout |
---|
2080 | |
---|
2081 | \begin_layout Standard |
---|
2082 | A cheaper repair option is simply to direct some peer that has share |
---|
2083 | \begin_inset Formula $s_{i}$ |
---|
2084 | \end_inset |
---|
2085 | |
---|
2086 | to send a copy to another peer, thus increasing by one the number of shares |
---|
2087 | in the network. |
---|
2088 | This is not as good as actually replacing the lost share, though. |
---|
2089 | Suppose that more shares were lost, leaving only |
---|
2090 | \begin_inset Formula $k$ |
---|
2091 | \end_inset |
---|
2092 | |
---|
2093 | shares remaining. |
---|
2094 | If two of those shares are identical, because one was duplicated in this |
---|
2095 | fashion, then only |
---|
2096 | \begin_inset Formula $k-1$ |
---|
2097 | \end_inset |
---|
2098 | |
---|
2099 | shares truly remain, and the file can no longer be reconstructed. |
---|
2100 | \end_layout |
---|
2101 | |
---|
2102 | \begin_layout Standard |
---|
2103 | However, such cheap repair is not completely pointless; it does increase |
---|
2104 | file survivability. |
---|
2105 | But by how much? |
---|
2106 | \end_layout |
---|
2107 | |
---|
2108 | \begin_layout Standard |
---|
2109 | Effectively, share duplication simply increases the probability that |
---|
2110 | \begin_inset Formula $s_{i}$ |
---|
2111 | \end_inset |
---|
2112 | |
---|
2113 | will survive, by providing two locations from which to retrieve it. |
---|
2114 | We can view the two copies of the single share as one, but with a higher |
---|
2115 | probability of survival than would be provided by either of the two peers. |
---|
2116 | In particular, if |
---|
2117 | \begin_inset Formula $p_{1}$ |
---|
2118 | \end_inset |
---|
2119 | |
---|
2120 | and |
---|
2121 | \begin_inset Formula $p_{2}$ |
---|
2122 | \end_inset |
---|
2123 | |
---|
2124 | are the probabilities that the two peers will survive, respectively, then |
---|
2125 | \begin_inset Formula \[ |
---|
2126 | Pr[s_{i}\, survives]=p_{1}+p_{2}-p_{1}p_{2}\] |
---|
2127 | |
---|
2128 | \end_inset |
---|
2129 | |
---|
2130 | |
---|
2131 | \end_layout |
---|
2132 | |
---|
2133 | \begin_layout Standard |
---|
2134 | More generally, if a single share is deployed on |
---|
2135 | \begin_inset Formula $n$ |
---|
2136 | \end_inset |
---|
2137 | |
---|
2138 | peers, each with a PMF |
---|
2139 | \begin_inset Formula $f_{i}(j),0\leq j\leq1,1\leq i\leq n$ |
---|
2140 | \end_inset |
---|
2141 | |
---|
2142 | , the share survival count is a random variable |
---|
2143 | \begin_inset Formula $K$ |
---|
2144 | \end_inset |
---|
2145 | |
---|
2146 | and the probability of share loss is |
---|
2147 | \begin_inset Formula \[ |
---|
2148 | Pr[K=0]=(f_{1}\star f_{2}\star\ldots\star f_{n})(0)\] |
---|
2149 | |
---|
2150 | \end_inset |
---|
2151 | |
---|
2152 | |
---|
2153 | \end_layout |
---|
2154 | |
---|
2155 | \begin_layout Standard |
---|
2156 | From that, we can construct a share PMF in the obvious way, which can then |
---|
2157 | be convolved with the other share PMFs to produce the share set PMF. |
---|
2158 | \end_layout |
---|
2159 | |
---|
2160 | \begin_layout Example |
---|
2161 | Suppose a file has |
---|
2162 | \begin_inset Formula $N=10,k=3$ |
---|
2163 | \end_inset |
---|
2164 | |
---|
2165 | and that all servers have survival probability |
---|
2166 | \begin_inset Formula $p=.9$ |
---|
2167 | \end_inset |
---|
2168 | |
---|
2169 | . |
---|
2170 | Given a full complement of shares, |
---|
2171 | \begin_inset Formula $Pr[\textrm{file\, loss}]=3.74\times10^{-7}$ |
---|
2172 | \end_inset |
---|
2173 | |
---|
2174 | . |
---|
2175 | Suppose that four shares are lost, which increases |
---|
2176 | \begin_inset Formula $Pr[\textrm{file\, loss}]$ |
---|
2177 | \end_inset |
---|
2178 | |
---|
2179 | to |
---|
2180 | \begin_inset Formula $.00127$ |
---|
2181 | \end_inset |
---|
2182 | |
---|
2183 | , a value |
---|
2184 | \begin_inset Formula $3400$ |
---|
2185 | \end_inset |
---|
2186 | |
---|
2187 | times greater. |
---|
2188 | Rather than doing a proper reconstruction, we could direct four peers still |
---|
2189 | holding shares to send a copy of their share to new peer, which changes |
---|
2190 | the composition of the shares from one of six, unique |
---|
2191 | \begin_inset Quotes eld |
---|
2192 | \end_inset |
---|
2193 | |
---|
2194 | standard |
---|
2195 | \begin_inset Quotes erd |
---|
2196 | \end_inset |
---|
2197 | |
---|
2198 | shares, to one of two standard shares, each with survival probability |
---|
2199 | \begin_inset Formula $.9$ |
---|
2200 | \end_inset |
---|
2201 | |
---|
2202 | and four |
---|
2203 | \begin_inset Quotes eld |
---|
2204 | \end_inset |
---|
2205 | |
---|
2206 | doubled |
---|
2207 | \begin_inset Quotes erd |
---|
2208 | \end_inset |
---|
2209 | |
---|
2210 | shares, each with survival probability |
---|
2211 | \begin_inset Formula $2p-p^{2}\approxeq.99$ |
---|
2212 | \end_inset |
---|
2213 | |
---|
2214 | . |
---|
2215 | \end_layout |
---|
2216 | |
---|
2217 | \begin_layout Example |
---|
2218 | Combining the two single-peer share PMFs with the four double-share PMFs |
---|
2219 | gives a new file survival probability of |
---|
2220 | \begin_inset Formula $6.64\times10^{-6}$ |
---|
2221 | \end_inset |
---|
2222 | |
---|
2223 | . |
---|
2224 | Not as good as a full repair, but still quite respectable. |
---|
2225 | Also, if storage were not a concern, all six shares could be duplicated, |
---|
2226 | for a |
---|
2227 | \begin_inset Formula $Pr[file\, loss]=1.48\times10^{-7}$ |
---|
2228 | \end_inset |
---|
2229 | |
---|
2230 | , which is actually three time better than the nominal case. |
---|
2231 | \end_layout |
---|
2232 | |
---|
2233 | \begin_layout Example |
---|
2234 | The reason such cheap repairs may be attractive in many cases is that distribute |
---|
2235 | d bandwidth is cheaper than bandwidth through a single peer. |
---|
2236 | This is particularly true if that single peer has a very slow connection, |
---|
2237 | which is common for home computers -- especially in the outbound direction. |
---|
2238 | \end_layout |
---|
2239 | |
---|
2240 | \begin_layout Section |
---|
2241 | Long-Term Reliability |
---|
2242 | \end_layout |
---|
2243 | |
---|
2244 | \begin_layout Standard |
---|
2245 | Thus far, we've focused entirely on the probability that a file survives |
---|
2246 | the interval |
---|
2247 | \begin_inset Formula $A$ |
---|
2248 | \end_inset |
---|
2249 | |
---|
2250 | between repair times. |
---|
2251 | The probability that a file survives long-term, though, is also important. |
---|
2252 | As long as the probability of failure during a repair period is non-zero, |
---|
2253 | a given file will eventually be lost. |
---|
2254 | We want to know the probability of surviving for time |
---|
2255 | \begin_inset Formula $T$ |
---|
2256 | \end_inset |
---|
2257 | |
---|
2258 | , and how the parameters |
---|
2259 | \begin_inset Formula $A$ |
---|
2260 | \end_inset |
---|
2261 | |
---|
2262 | (time between repairs) and |
---|
2263 | \begin_inset Formula $L$ |
---|
2264 | \end_inset |
---|
2265 | |
---|
2266 | (allowed share low watermark) affect survival time. |
---|
2267 | \end_layout |
---|
2268 | |
---|
2269 | \begin_layout Standard |
---|
2270 | To model file survival time, let |
---|
2271 | \begin_inset Formula $T$ |
---|
2272 | \end_inset |
---|
2273 | |
---|
2274 | be a random variable denoting the time at which a given file becomes unrecovera |
---|
2275 | ble, and |
---|
2276 | \begin_inset Formula $R(t)=Pr[T>t]$ |
---|
2277 | \end_inset |
---|
2278 | |
---|
2279 | be a function that gives the probability that the file survives to time |
---|
2280 | |
---|
2281 | \begin_inset Formula $t$ |
---|
2282 | \end_inset |
---|
2283 | |
---|
2284 | . |
---|
2285 | |
---|
2286 | \begin_inset Formula $R(t)$ |
---|
2287 | \end_inset |
---|
2288 | |
---|
2289 | is the cumulative distribution function of |
---|
2290 | \begin_inset Formula $T$ |
---|
2291 | \end_inset |
---|
2292 | |
---|
2293 | . |
---|
2294 | \end_layout |
---|
2295 | |
---|
2296 | \begin_layout Standard |
---|
2297 | Most survival functions are continuous, but |
---|
2298 | \begin_inset Formula $R(t)$ |
---|
2299 | \end_inset |
---|
2300 | |
---|
2301 | is inherently discrete and stochastic. |
---|
2302 | The time steps are the repair intervals, each of length |
---|
2303 | \begin_inset Formula $A$ |
---|
2304 | \end_inset |
---|
2305 | |
---|
2306 | , so |
---|
2307 | \begin_inset Formula $T$ |
---|
2308 | \end_inset |
---|
2309 | |
---|
2310 | -values are multiples of |
---|
2311 | \begin_inset Formula $A$ |
---|
2312 | \end_inset |
---|
2313 | |
---|
2314 | . |
---|
2315 | During each interval, the file's shares degrade according to the probability |
---|
2316 | mass function of |
---|
2317 | \begin_inset Formula $K$ |
---|
2318 | \end_inset |
---|
2319 | |
---|
2320 | . |
---|
2321 | \end_layout |
---|
2322 | |
---|
2323 | \begin_layout Subsection |
---|
2324 | Aggressive Repair |
---|
2325 | \end_layout |
---|
2326 | |
---|
2327 | \begin_layout Standard |
---|
2328 | Let's first consider the case of an aggressive repairer. |
---|
2329 | Every interval, this repairer checks the file for share losses and restores |
---|
2330 | them. |
---|
2331 | Thus, at the beginning of each interval, the file always has |
---|
2332 | \begin_inset Formula $N$ |
---|
2333 | \end_inset |
---|
2334 | |
---|
2335 | shares, distributed on servers with various individual and group failure |
---|
2336 | probabilities, which will survive or fail per the output of random variable |
---|
2337 | |
---|
2338 | \begin_inset Formula $K$ |
---|
2339 | \end_inset |
---|
2340 | |
---|
2341 | . |
---|
2342 | \end_layout |
---|
2343 | |
---|
2344 | \begin_layout Standard |
---|
2345 | For any interval, then, the probability that the file will survive is |
---|
2346 | \begin_inset Formula $f\left(k\right)=Pr[K\geq k]$ |
---|
2347 | \end_inset |
---|
2348 | |
---|
2349 | . |
---|
2350 | Since each interval success or failure is independent, and assuming the |
---|
2351 | share reliabilities remain constant over time, |
---|
2352 | \begin_inset Formula \begin{equation} |
---|
2353 | R\left(t\right)=f(k)^{t}\end{equation} |
---|
2354 | |
---|
2355 | \end_inset |
---|
2356 | |
---|
2357 | |
---|
2358 | \end_layout |
---|
2359 | |
---|
2360 | \begin_layout Standard |
---|
2361 | This simple survival function makes it simple to select parameters |
---|
2362 | \begin_inset Formula $N$ |
---|
2363 | \end_inset |
---|
2364 | |
---|
2365 | and |
---|
2366 | \begin_inset Formula $K$ |
---|
2367 | \end_inset |
---|
2368 | |
---|
2369 | such that |
---|
2370 | \begin_inset Formula $R(t)\geq r$ |
---|
2371 | \end_inset |
---|
2372 | |
---|
2373 | , where |
---|
2374 | \begin_inset Formula $r$ |
---|
2375 | \end_inset |
---|
2376 | |
---|
2377 | is a user-specified parameter indicating the desired probability of survival |
---|
2378 | to time |
---|
2379 | \begin_inset Formula $t$ |
---|
2380 | \end_inset |
---|
2381 | |
---|
2382 | . |
---|
2383 | Specifically, we can solve for |
---|
2384 | \begin_inset Formula $f\left(k\right)$ |
---|
2385 | \end_inset |
---|
2386 | |
---|
2387 | in |
---|
2388 | \begin_inset Formula $r\leq f\left(k\right)^{t}$ |
---|
2389 | \end_inset |
---|
2390 | |
---|
2391 | , giving: |
---|
2392 | \begin_inset Formula \begin{equation} |
---|
2393 | f\left(k\right)\geq\sqrt[t]{r}\end{equation} |
---|
2394 | |
---|
2395 | \end_inset |
---|
2396 | |
---|
2397 | |
---|
2398 | \end_layout |
---|
2399 | |
---|
2400 | \begin_layout Standard |
---|
2401 | So, given a PMF |
---|
2402 | \begin_inset Formula $f\left(k\right)$ |
---|
2403 | \end_inset |
---|
2404 | |
---|
2405 | , to assure the survival of a file to time |
---|
2406 | \begin_inset Formula $t$ |
---|
2407 | \end_inset |
---|
2408 | |
---|
2409 | with probability at least |
---|
2410 | \begin_inset Formula $r$ |
---|
2411 | \end_inset |
---|
2412 | |
---|
2413 | , choose |
---|
2414 | \begin_inset Formula $k$ |
---|
2415 | \end_inset |
---|
2416 | |
---|
2417 | such that |
---|
2418 | \begin_inset Formula $f\left(k\right)\geq\sqrt[t]{r}$ |
---|
2419 | \end_inset |
---|
2420 | |
---|
2421 | . |
---|
2422 | For example, if |
---|
2423 | \begin_inset Formula $A$ |
---|
2424 | \end_inset |
---|
2425 | |
---|
2426 | is one month, and |
---|
2427 | \begin_inset Formula $r=1-\nicefrac{1}{10^{6}}$ |
---|
2428 | \end_inset |
---|
2429 | |
---|
2430 | and |
---|
2431 | \begin_inset Formula $t=120$ |
---|
2432 | \end_inset |
---|
2433 | |
---|
2434 | , or 10 years, we calculate |
---|
2435 | \begin_inset Formula $f\left(k\right)\geq\sqrt[120]{.999999}\approx0.999999992$ |
---|
2436 | \end_inset |
---|
2437 | |
---|
2438 | . |
---|
2439 | Per the PMF of table |
---|
2440 | \begin_inset CommandInset ref |
---|
2441 | LatexCommand ref |
---|
2442 | reference "tab:Example-PMF" |
---|
2443 | |
---|
2444 | \end_inset |
---|
2445 | |
---|
2446 | , this means |
---|
2447 | \begin_inset Formula $k=2$ |
---|
2448 | \end_inset |
---|
2449 | |
---|
2450 | , achieves the goal, at the cost of a six-fold expansion in stored file |
---|
2451 | size. |
---|
2452 | If the lesser goal of no more than |
---|
2453 | \begin_inset Formula $\nicefrac{1}{1000}$ |
---|
2454 | \end_inset |
---|
2455 | |
---|
2456 | probability of loss is taken, then since |
---|
2457 | \begin_inset Formula $\sqrt[120]{.9999}=.999992$ |
---|
2458 | \end_inset |
---|
2459 | |
---|
2460 | , |
---|
2461 | \begin_inset Formula $k=5$ |
---|
2462 | \end_inset |
---|
2463 | |
---|
2464 | achieves the goal with an expansion factor of |
---|
2465 | \begin_inset Formula $2.4$ |
---|
2466 | \end_inset |
---|
2467 | |
---|
2468 | . |
---|
2469 | \end_layout |
---|
2470 | |
---|
2471 | \begin_layout Subsection |
---|
2472 | Repair Cost |
---|
2473 | \end_layout |
---|
2474 | |
---|
2475 | \begin_layout Standard |
---|
2476 | The simplicity and predictability of aggressive repair is attractive, but |
---|
2477 | there is a downside: Repairs cost processing power and bandwidth. |
---|
2478 | The processing power is proportional to the size of the file, since the |
---|
2479 | whole file must be reconstructed and then re-processed using the Reed-Solomon |
---|
2480 | algorithm, while the bandwidth cost is proportional to the number of missing |
---|
2481 | shares that must be replaced, |
---|
2482 | \begin_inset Formula $N-K$ |
---|
2483 | \end_inset |
---|
2484 | |
---|
2485 | . |
---|
2486 | \end_layout |
---|
2487 | |
---|
2488 | \begin_layout Standard |
---|
2489 | Let |
---|
2490 | \begin_inset Formula $c\left(s,d,k\right)$ |
---|
2491 | \end_inset |
---|
2492 | |
---|
2493 | be a cost function that combines the processing cost of regenerating a |
---|
2494 | file of size |
---|
2495 | \begin_inset Formula $s$ |
---|
2496 | \end_inset |
---|
2497 | |
---|
2498 | and the bandwidth cost of downloading a file of size |
---|
2499 | \begin_inset Formula $s$ |
---|
2500 | \end_inset |
---|
2501 | |
---|
2502 | and uploading |
---|
2503 | \begin_inset Formula $d$ |
---|
2504 | \end_inset |
---|
2505 | |
---|
2506 | shares each of size |
---|
2507 | \begin_inset Formula $\nicefrac{s}{k}$ |
---|
2508 | \end_inset |
---|
2509 | |
---|
2510 | . |
---|
2511 | Also, let |
---|
2512 | \begin_inset Formula $D$ |
---|
2513 | \end_inset |
---|
2514 | |
---|
2515 | denote the random variable |
---|
2516 | \begin_inset Formula $N-K$ |
---|
2517 | \end_inset |
---|
2518 | |
---|
2519 | , which is the number of shares that must be redistributed to bring the |
---|
2520 | file share set back up to |
---|
2521 | \begin_inset Formula $N$ |
---|
2522 | \end_inset |
---|
2523 | |
---|
2524 | after degrading during an interval. |
---|
2525 | The probability mass function of |
---|
2526 | \begin_inset Formula $D$ |
---|
2527 | \end_inset |
---|
2528 | |
---|
2529 | is |
---|
2530 | \begin_inset Formula \[ |
---|
2531 | Pr[D=d]=f(d)=\begin{cases} |
---|
2532 | Pr\left[K=N\right]+Pr[K<k] & d=0\\ |
---|
2533 | Pr\left[K=N-d\right] & 0<d\leq N-k\\ |
---|
2534 | 0 & N-k<d\leq N\end{cases}\] |
---|
2535 | |
---|
2536 | \end_inset |
---|
2537 | |
---|
2538 | |
---|
2539 | \end_layout |
---|
2540 | |
---|
2541 | \begin_layout Standard |
---|
2542 | The expected cost of repairs in a given interval, then, is simply |
---|
2543 | \begin_inset Formula $c\left(s,E\left[D\right],k\right)$ |
---|
2544 | \end_inset |
---|
2545 | |
---|
2546 | where E is the expected value function -- in this case: |
---|
2547 | \begin_inset Formula \begin{align*} |
---|
2548 | E[D] & =\sum_{d=0}^{N}d\cdot Pr\left[D=d\right]\\ |
---|
2549 | & =0\cdot Pr\left[D=0\right]+\sum_{d=1}^{N-k}\left\{ d\cdot Pr\left[K=N-d\right]\right\} +\sum_{d=N-k+1}^{N}\left\{ d\cdot0\right\} \\ |
---|
2550 | & =\sum_{d=1}^{N-k}d\cdot Pr\left[K=N-d\right]\end{align*} |
---|
2551 | |
---|
2552 | \end_inset |
---|
2553 | |
---|
2554 | |
---|
2555 | \end_layout |
---|
2556 | |
---|
2557 | \begin_layout Standard |
---|
2558 | Since each interval starts with a full complement of shares, the expected |
---|
2559 | repair cost for each interval is the same, and the cost for file that survives |
---|
2560 | for |
---|
2561 | \begin_inset Formula $t$ |
---|
2562 | \end_inset |
---|
2563 | |
---|
2564 | intervals is |
---|
2565 | \begin_inset Formula $t\cdot c\left(s,E\left[D\right]\right)$ |
---|
2566 | \end_inset |
---|
2567 | |
---|
2568 | . |
---|
2569 | To calculate the lifetime repair cost, we just take the limit over all |
---|
2570 | intervals as |
---|
2571 | \begin_inset Formula $t\rightarrow\infty$ |
---|
2572 | \end_inset |
---|
2573 | |
---|
2574 | , discounting each cost by the probability that the file has already failed. |
---|
2575 | So, the lifetime expected repair cost is |
---|
2576 | \begin_inset Formula \begin{align*} |
---|
2577 | \sum_{t=1}^{\infty}R\left(t-1\right)c\left(s,E\left[D\right],k\right) & =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}R\left(t-1\right)\\ |
---|
2578 | & =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}f\left(k\right)^{t-1}\\ |
---|
2579 | & =c\left(s,E\left[D\right],k\right)\cdot\frac{1}{1-f\left(k\right)}\\ |
---|
2580 | & =\frac{c\left(s,E\left[D\right],k\right)}{1-f\left(k\right)}\end{align*} |
---|
2581 | |
---|
2582 | \end_inset |
---|
2583 | |
---|
2584 | |
---|
2585 | \end_layout |
---|
2586 | |
---|
2587 | \begin_layout Standard |
---|
2588 | It is also necessary to discount future cost, since CPU and bandwidth are |
---|
2589 | both going to get cheaper over time. |
---|
2590 | To accommodate this, we throw in an addition per-period discount rate |
---|
2591 | \begin_inset Formula $r$ |
---|
2592 | \end_inset |
---|
2593 | |
---|
2594 | . |
---|
2595 | In accordance with common discount rate usage, the discount multiplier |
---|
2596 | at time |
---|
2597 | \begin_inset Formula $t$ |
---|
2598 | \end_inset |
---|
2599 | |
---|
2600 | is |
---|
2601 | \begin_inset Formula $\left(1-r\right)^{t}$ |
---|
2602 | \end_inset |
---|
2603 | |
---|
2604 | . |
---|
2605 | This gives: |
---|
2606 | \begin_inset Formula \begin{align*} |
---|
2607 | \sum_{t=1}^{\infty}\left(1-r\right){}^{t}R\left(t-1\right)c\left(s,E\left[D\right],k\right) & =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}\left(1-r\right)^{t}f\left(k\right)^{t-1}\\ |
---|
2608 | & =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}\left(1-r\right)^{t}f\left(k\right)^{t-1}\\ |
---|
2609 | & =c\left(s,E\left[D\right],k\right)\left(1-r\right)\sum_{t=1}^{\infty}\left(1-r\right)^{t-1}f\left(k\right)^{t-1}\\ |
---|
2610 | & =\frac{c\left(s,E\left[D\right],k\right)\left(1-r\right)}{1-\left(1-r\right)f\left(k\right)}\end{align*} |
---|
2611 | |
---|
2612 | \end_inset |
---|
2613 | |
---|
2614 | If |
---|
2615 | \begin_inset Formula $r=0$ |
---|
2616 | \end_inset |
---|
2617 | |
---|
2618 | this collapses to the previous result, as one would expect. |
---|
2619 | \end_layout |
---|
2620 | |
---|
2621 | \begin_layout Subsection |
---|
2622 | Non-aggressive Repair |
---|
2623 | \end_layout |
---|
2624 | |
---|
2625 | \begin_layout Standard |
---|
2626 | Need to write this. |
---|
2627 | \end_layout |
---|
2628 | |
---|
2629 | \begin_layout Section |
---|
2630 | Time-Sensitive Retrieval |
---|
2631 | \end_layout |
---|
2632 | |
---|
2633 | \begin_layout Standard |
---|
2634 | The above work has almost entirely ignored the distinction between availability |
---|
2635 | and reliability. |
---|
2636 | \end_layout |
---|
2637 | |
---|
2638 | \begin_layout Standard |
---|
2639 | Need to write this. |
---|
2640 | \end_layout |
---|
2641 | |
---|
2642 | \end_body |
---|
2643 | \end_document |
---|