#485 closed task (fixed)
server incident reporting
Reported by: | warner | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | 1.3.0 |
Component: | operational | Version: | 1.1.0 |
Keywords: | | Cc: | |
Launchpad Bug: | | | |
Description (last modified by warner)
The current version of Foolscap has code to report "incidents", which are logs of the events that led up to some high-severity event. There is also an API to subscribe to these incidents.
We need to build a gathering mechanism for these incidents. The storage servers on a commercial grid should report incidents to this gatherer, and the gatherer can then summarize and deliver them via email or an RSS feed.
See also #484, which addresses a similar issue on the client side.
Things to be wary of: overloading the gatherer, unbounded growth of the sender's queue, and thundering herds if many servers experience problems at the same time.
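A minimal sketch of how a sender could address two of these concerns, assuming nothing about Foolscap's actual implementation: a bounded queue that drops the oldest incident on overflow, and a randomized ("full jitter") retry delay so that many servers failing at once do not all hit the gatherer at the same instant. The names `BoundedIncidentQueue` and `jittered_delay`, and all default values, are hypothetical.

```python
import collections
import random


class BoundedIncidentQueue:
    """Hypothetical sender-side queue: holds at most max_pending incidents."""

    def __init__(self, max_pending=100):
        # A deque with maxlen silently evicts the oldest entry on overflow.
        self._pending = collections.deque(maxlen=max_pending)
        self.dropped = 0  # how many incidents were discarded to stay bounded

    def add(self, incident):
        if len(self._pending) == self._pending.maxlen:
            self.dropped += 1  # the append below will evict the oldest entry
        self._pending.append(incident)

    def pop_batch(self, batch_size=10):
        """Remove and return up to batch_size incidents, oldest first."""
        batch = []
        while self._pending and len(batch) < batch_size:
            batch.append(self._pending.popleft())
        return batch

    def __len__(self):
        return len(self._pending)


def jittered_delay(attempt, base=5.0, cap=600.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Uniformly random in [0, min(cap, base * 2**attempt)], so senders that
    fail together spread their retries out instead of arriving as a herd.
    """
    return rng.uniform(0, min(cap, base * 2 ** attempt))
```

Sending in small batches via `pop_batch` also limits how much any one server can push at the gatherer in a single round; whether to drop the oldest or the newest incident on overflow is a policy choice this sketch does not try to settle.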
The gatherer's interface should have a way to manage display of incidents: human operators should be able to say "yes, I know about that one", and not be distracted by well-known problems for which a fix is in progress. This implies a table of incident dispositions (new, still-troublesome, ignored), and maybe eventually a mechanism to automatically classify new incidents as being in a known category ("another 42 instances of Bug#123 were seen today").
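The disposition table and classification idea could be sketched roughly as follows. This is an illustration only, not the gatherer's real interface: the `IncidentTable` class, the regex-based classifiers, and the "unknown" fallback category are all assumptions made for the example.

```python
import collections
import re

# The three dispositions named in the description.
NEW, STILL_TROUBLESOME, IGNORED = "new", "still-troublesome", "ignored"


class IncidentTable:
    """Hypothetical table mapping incident categories to dispositions."""

    def __init__(self, classifiers):
        # classifiers: iterable of (category, regex) pairs, tried in order.
        self._classifiers = list(classifiers)
        self._disposition = {}
        self.counts = collections.Counter()

    def classify(self, message):
        """Return the first matching category, or "unknown"."""
        for category, pattern in self._classifiers:
            if re.search(pattern, message):
                return category
        return "unknown"

    def record(self, message):
        """Count one incident; a category seen for the first time is NEW."""
        category = self.classify(message)
        self.counts[category] += 1
        self._disposition.setdefault(category, NEW)
        return category

    def set_disposition(self, category, disposition):
        """Operator action, e.g. "yes, I know about that one"."""
        if disposition not in (NEW, STILL_TROUBLESOME, IGNORED):
            raise ValueError("unknown disposition: %r" % (disposition,))
        self._disposition[category] = disposition

    def summary(self):
        """(category, count) pairs for everything not ignored, sorted."""
        return sorted(
            (category, count)
            for category, count in self.counts.items()
            if self._disposition[category] != IGNORED
        )
```

An operator who marks a category as `IGNORED` stops seeing it in `summary()`, while the per-category counts are still kept, so a line like "another 42 instances of Bug#123 were seen today" could be produced from `counts` on demand.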
Change History (3)
comment:1 Changed at 2008-07-02T03:46:20Z by warner
- Description modified (diff)
comment:2 Changed at 2008-08-05T19:11:08Z by warner
- Milestone changed from undecided to 1.2.1
- Resolution set to fixed
- Status changed from new to closed
The Incident Gatherer was added to foolscap-0.3.0, released yesterday, and has support for classifier functions. There is no web interface yet, nor email; those are for a later release.
comment:3 Changed at 2008-09-03T01:17:14Z by warner
- Milestone changed from 1.3.1 to 1.3.0
Note that foolscap-0.2.8 has a bug in its Incident-handling code (it throws an exception during setLogDir if the incident-holding directory already exists), which makes it unsuitable for use. I've fixed the bug, but this ticket is blocked on the next Foolscap release, which will include the fix.
http://foolscap.lothar.com/trac/milestone/0.2.9 is the release in question.