[tahoe-dev] Rates of file duplication
Jeremy Fitzhardinge
jeremy at goop.org
Tue Sep 2 08:31:24 PDT 2008
I ended up writing a couple of perl scripts to generate file content
profiles, and compared a few of my machines. The amount of sharing is
much lower than I expected, and confirms your 1% number pretty well.
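As a rough illustration of what a "file content profile" could look like, here is a minimal Python sketch. It is not the Perl scripts mentioned above. It assumes a profile is a map from a content hash to a file size, with one entry per distinct content; the function name `profile_tree` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os


def profile_tree(root):
    """Return {content_hash: size_in_bytes} for regular files under root.

    Hypothetical helper: files with identical content collapse into a
    single entry, so len(result) is the number of unique files.
    """
    profile = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            size = 0
            try:
                with open(path, "rb") as f:
                    # Read in 1 MiB chunks so large files fit in memory.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                        size += len(chunk)
            except OSError:
                continue  # unreadable file: leave it out of the profile
            profile[digest.hexdigest()] = size
    return profile
```

Keying on the hash alone means a machine's own internal duplicates are already folded together, which matches the "unique files" counts quoted per machine.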
I tried it on three machines:
* lurch: 32-bit Fedora 9 server, 1588837 unique files
* ezr: 32-bit Fedora 9 laptop, 687124 unique files
* minilith: 64-bit Fedora 9 desktop, 1014310 unique files
All three are up to date, and all three have moderately large chunks of
my user data copied onto them.
Comparing the two 32-bit F9 machines, which I would have thought would
be the most similar, I get around 42 Gbytes (16%) of savings:
42019430400/263635070976 duplicate bytes, 15.9384827839636%
179728/2275961 duplicate files, 7.89679612260491%
Comparing all three, there's about 60 Gbytes of savings, but the
percentage of bytes saved drops to about 14%:
59280711680/420225855488 duplicate bytes, 14.1068691766142%
504620/3290271 duplicate files, 15.3367306218849%
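The figures above can be reproduced in spirit with a short Python sketch. It assumes one plausible reading of the numbers: the denominator is the sum of each machine's unique files (or bytes), and the "duplicate" count is whatever disappears when the profiles are merged. That reading is consistent with the totals (1588837 + 687124 = 2275961), but the name `duplicate_stats` and the `{hash: size}` profile shape are assumptions for the example, not a description of the actual tools.

```python
def duplicate_stats(profiles):
    """Compare per-machine profiles, each a {content_hash: size} dict.

    Returns (dup_files, total_files, dup_bytes, total_bytes), where the
    totals are summed across machines and the duplicates are what a
    shared, deduplicated store would avoid storing again.
    """
    total_files = sum(len(p) for p in profiles)
    total_bytes = sum(sum(p.values()) for p in profiles)

    # Merge all profiles; each distinct hash is kept exactly once.
    merged = {}
    for p in profiles:
        merged.update(p)

    dup_files = total_files - len(merged)
    dup_bytes = total_bytes - sum(merged.values())
    return dup_files, total_files, dup_bytes, total_bytes


def report(profiles):
    """Format the result in the same style as the figures quoted above."""
    df, tf, db, tb = duplicate_stats(profiles)
    return (f"{db}/{tb} duplicate bytes, {100.0 * db / tb}%\n"
            f"{df}/{tf} duplicate files, {100.0 * df / tf}%")
```

Under this counting, a file present on all three machines counts as two duplicates, which is one way the file percentage can rise with a third machine while the byte percentage falls.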
I've put my tools and profile files up at http://www.goop.org/~jeremy/dups/
J