#313 closed enhancement (fixed)

periodic automated test of allmydata.com prodnet grid

Reported by: zandr
Owned by: zandr
Priority: major
Milestone: undecided
Component: operational
Version: 0.7.0
Keywords: test reliability statistics
Cc:
Launchpad Bug:

Description

It would be valuable for operational monitoring if there were a make target that ran an upload/download test using the webapi.

This could then be run from a periodic builder and notify ops on failure, providing a real "does it work" system monitor.
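A minimal sketch of such an upload/download check, assuming a Tahoe node whose webapi listens on 127.0.0.1:3456 (the usual default) and supports PUT /uri (upload an unlinked immutable file; the response body is its filecap) and GET /uri/$FILECAP (download it). The helper names webapi_put, webapi_get and check_roundtrip are hypothetical and are not the code behind "make check-grid".

```python
import os
import urllib.request
from typing import Callable

# Assumption: the local Tahoe node's webapi listens here.
DEFAULT_WEBAPI = "http://127.0.0.1:3456"


def webapi_put(base: str, data: bytes) -> str:
    """Upload an unlinked immutable file via PUT /uri and return its filecap."""
    req = urllib.request.Request(base + "/uri", data=data, method="PUT")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("ascii").strip()


def webapi_get(base: str, cap: str) -> bytes:
    """Download a file's contents via GET /uri/$FILECAP."""
    with urllib.request.urlopen(base + "/uri/" + cap, timeout=60) as resp:
        return resp.read()


def check_roundtrip(base: str = DEFAULT_WEBAPI,
                    size: int = 10_000,
                    put: Callable[[str, bytes], str] = webapi_put,
                    get: Callable[[str, str], bytes] = webapi_get) -> bool:
    """Upload random bytes, download them by cap, and compare the results."""
    # A random payload means each run uploads data the grid has not seen before.
    payload = os.urandom(size)
    cap = put(base, payload)
    return get(base, cap) == payload
```

A periodic job could call check_roundtrip() and exit nonzero when it returns False or raises (for example on a connection error), which is what a builder needs in order to mark the run as failed and notify ops. The HTTP helpers are passed in as parameters so the comparison logic can be exercised without a live grid.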

Change History (15)

comment:1 Changed at 2008-03-22T23:28:38Z by zooko

We already have such a buildbot-launched system test, but it tests the Tahoe test grid, not the allmydata.com production grid.

comment:2 Changed at 2008-03-25T18:41:54Z by zooko

  • Milestone changed from 1.0.0 to 1.0.1

comment:3 Changed at 2008-03-25T18:44:39Z by zooko

  • Summary changed from Buildbot should have a system test to automated test of a grid

I was wrong in the previous comment: our automated test doesn't use the Tahoe test grid but instead constructs a transient local grid on a single machine, tests it, and then destroys it.

We want an automated, regular test of a real, live external grid. We can have one instance of such a test pointing at The Tahoe Test Grid, and another pointing at the http://allmydata.com production grid, which holds backed-up files on behalf of allmydata.com customers. Ideally this test would be packaged well enough that other people could use it to test their own grids.

comment:4 Changed at 2008-05-05T21:08:36Z by zooko

  • Milestone changed from 1.0.1 to 1.1.0

Milestone 1.0.1 deleted

comment:5 Changed at 2008-05-29T23:16:41Z by warner

  • Summary changed from automated test of a grid to automated test of allmydata.com prodnet grid

comment:6 Changed at 2008-05-29T23:33:26Z by warner

  • Owner changed from somebody to zandr

Zandr is going to take responsibility for getting a few 1U machines running in the office closet. Once we have these, we'll have a host from which to run the automated prodnet test.

Zandr: please reassign to zooko once the hardware is ready.

comment:7 Changed at 2008-05-30T00:10:03Z by zandr

  • Status changed from new to assigned

Heatsinks to reactivate the old blockserver2 and blockserver4 machines have been ordered. I expect to see them Tues, so I should have hosts available on Weds.

2 machines, P4 3.0GHz, 2GB RAM, 4x250GB drives

comment:8 Changed at 2008-06-04T01:04:35Z by zooko

  • Milestone changed from 1.1.0 to 1.1.1

comment:9 Changed at 2008-06-19T22:49:02Z by warner

  • Summary changed from automated test of allmydata.com prodnet grid to periodic automated test of allmydata.com prodnet grid

The "make check-grid" target now performs this sort of test, and we have buildslaves to run it against testgrid and prodnet on each trunk checkin. Another instance runs it against prodnet on each prod-branch checkin.

The only remaining bit is to rig up something to run it periodically as well (independent of changes to the source tree). I don't want to put this in the Tahoe buildbot, since it tests the prodnet grid, not the source code. But rigging up another buildmaster might be appropriate; at the very least, that would give us historical results and email on failure.

comment:10 Changed at 2008-06-20T15:20:55Z by zandr

This doesn't seem inappropriate for the 'release' buildmaster, but I'm open to the idea of a 'monitoring' buildmaster as well (or some other tool, maybe not buildbot?).

comment:11 Changed at 2008-07-02T00:45:11Z by warner

For reference: the 'make check-grid' step (described as "check-grid prodnet" and "check-grid testgrid") is done by the 'gutsy' builder in all cases, and takes place after the unit tests and before the make-tarballs step.

comment:12 Changed at 2009-03-28T19:37:35Z by zooko

  • Milestone changed from 1.3.1 to undecided

Moving this out of the 1.3.1 Milestone.

comment:13 Changed at 2009-10-28T07:37:28Z by davidsarah

  • Keywords test reliability stats added

comment:14 Changed at 2009-12-13T05:44:36Z by davidsarah

  • Keywords statistics added; stats removed

comment:15 Changed at 2010-10-22T14:39:47Z by zooko

  • Resolution set to fixed
  • Status changed from assigned to closed

I believe that we did have automated periodic monitoring of the allmydata.com prod grid, once upon a time, but that grid is now gone, so I'm closing this ticket. (Others may be interested in using the code, e.g. make check-speed and make check-grid.)
