Changes between Version 20 and Version 21 of Performance


Timestamp: 2007-12-19T00:46:10Z
Author: warner
Comment: add notes on load testing

== Storage Servers ==

=== storage index count ===

ext3 (on tahoebs1) refuses to create more than 32000 subdirectories in a
single parent directory. In 0.5.1, this appears as a limit on the number of
…
I was unable to measure a consistent slowdown resulting from having 30000
buckets in a single storage server.
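
As an aside, the 32000-entry limit is easy to reproduce outside of Tahoe. The
following is a minimal sketch (not Tahoe code) that creates subdirectories
under a single parent until the filesystem refuses; on ext3 the mkdir() call
fails with EMLINK at roughly 32000 entries:

{{{
#!python
import errno, os, tempfile

# Probe how many subdirectories a filesystem allows under one parent.
# On ext3, os.mkdir() raises OSError(EMLINK) at roughly 32000 entries.
parent = tempfile.mkdtemp(prefix="subdir-limit-")
count = 0
try:
    while True:
        os.mkdir(os.path.join(parent, "bucket%d" % count))
        count += 1
except OSError as e:
    if e.errno != errno.EMLINK:
        raise
print("stopped after %d subdirectories" % count)
}}}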

== System Load ==

The source:src/allmydata/test/check_load.py tool can be used to generate
random upload/download traffic, to see how much load a Tahoe grid imposes on
its hosts.

Preliminary results on the Allmydata test grid (14 storage servers spread
across four machines, each a roughly 3GHz P4, plus two web servers): we ran
three check_load.py clients with a 100ms delay between requests, an
80%-download/20%-upload traffic mix, and file sizes distributed exponentially
with a mean of 10kB. These three clients saw about 8-15kBps of download and
2.5kBps of upload, performing about one download per second and 0.25 uploads
per second. These rates were higher at the beginning of the test (when the
directories were smaller and thus faster to traverse).
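
For reference, the traffic mix described above (80/20 download/upload split,
exponentially distributed file sizes with a 10kB mean, 100ms between
requests) can be sketched as follows. This is illustrative only, not the
actual check_load.py logic:

{{{
#!python
import random, time

DELAY = 0.1                  # 100ms pause between requests
DOWNLOAD_FRACTION = 0.8      # 80% downloads, 20% uploads
MEAN_FILE_SIZE = 10 * 1000   # exponential distribution, mean 10kB

def next_operation():
    # Pick the next operation and draw a file size from the distribution.
    op = "download" if random.random() < DOWNLOAD_FRACTION else "upload"
    size = int(random.expovariate(1.0 / MEAN_FILE_SIZE))
    return op, size

if __name__ == "__main__":
    for _ in range(10):      # short demonstration run
        op, size = next_operation()
        print("%s %d bytes" % (op, size))
        time.sleep(DELAY)
}}}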

The storage servers were minimally loaded. Each storage node was consuming
about 9% of its CPU at the start of the test and 5% at the end. These nodes
were receiving about 50kbps throughout, and sending 50kbps initially
(increasing to 150kbps as the dirnodes got larger). Memory usage was trivial:
about 35MB VmSize and 25MB RSS per node. The load average on a 4-node box was
about 0.3.

The two machines serving as web servers (performing all encryption, hashing,
and erasure-coding) were the most heavily loaded. The clients distributed
their requests randomly between the two web servers. Each server averaged
60%-80% CPU usage. Memory consumption was minor: 37MB VmSize and 29MB RSS on
one server, 45MB/33MB on the other. Load average grew from about 0.6 at the
start of the test to about 0.8 at the end. Outbound network traffic
(including both client-side plaintext and server-side shares) was about
600kbps for the whole test, while inbound traffic started at 200kbps and rose
to about 1Mbps at the end.

=== initial conclusions ===

So far, Tahoe is scaling as designed: the client nodes are the ones doing
most of the work, since these are the easiest to scale. In a deployment where
central machines are doing the encoding work, CPU on those machines will be
the first bottleneck. Profiling can be used to determine how the upload
process might be optimized: we don't yet know whether encryption, hashing, or
erasure coding is the primary CPU consumer. We can change the upload/download
ratio to examine upload and download separately.
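
One way to answer that question is to run the client-side upload path under
the standard library profiler. The sketch below is illustrative; upload_file
is a hypothetical placeholder for whatever function performs the upload, not
a real Tahoe API:

{{{
#!python
import cProfile, pstats

def profile_upload(upload_file, path):
    # Profile a single upload and report where the CPU time goes, e.g.
    # whether encryption, hashing, or erasure coding dominates.
    profiler = cProfile.Profile()
    profiler.enable()
    upload_file(path)
    profiler.disable()
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(20)  # top 20 by cumulative time
}}}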

Deploying large networks in which clients are not doing their own encoding
will require sufficient CPU capacity on the central encoding machines.
Storage servers use minimal CPU, so having every storage server also act as a
web/encoding server is a natural approach.