Ticket #1392: brianfollowingstats.txt

File brianfollowingstats.txt, 8.9 KB (added by arch_o_median, at 2011-05-17T18:00:43Z)

A warner-compiled list of consumers in irc copy-paste format

Line 
1<warner> use the 30 hits from find src contrib misc -type f |grep -v pyc
2         |xargs grep '\bget_stats\b'| grep -v 'def get_stats'
3<warner> let's walk through them
4
5<warner> 6 in stats.py  [16:29]
6<warner> (I use \b so we'll match things like callRemote("get_stats") and
7         x=blah.get_stats; x() , not that anyone does the latter)
8<arc`> It looks like all the references without dot prefixes are either
9       comments or definitions or string literals containing get_stats  [16:30]
10<warner> yeah. the string literal form is what invokes
11         RIStatsProvider.get_stats  [16:31]
12<warner> it's in a callRemote() argument
13<arc`> I'm not seeing those I'm seeing log.msg("foofooget_stats() -> foo")
14
15<warner> now, 5 hits are in StatsProvider, which is the internal
16         aggregate-all-the-producers object that provides RIStatsProvider
17                                                                        [16:32]
18<warner> oh, my \b probably skipped those
19<warner> the \b matches word boundaries
20<warner> so blahXYZ won't match \bXYZ
21
22<warner> another hit is in stats.StatsGatherer, which is the service that does
23         callRemote('get_stats') and retrieves stats through the
24         RIStatsProvider.get_stats interface
25<warner> that's the receiving end of the remote protocol
26<warner> er, the initiating end, but it's where the stats wind up
27<warner> from line 211, you can see that the stats get sent to self.got_stats
28                                                                        [16:34]
29<warner> which, from the NotImplementedError on line 227, you can tell is
30         meant to be overridden by subclasses, so we need to look for 'def
31         got_stats' in subclasses. Fortunately, all the subclasses are in this
32         same file, right after StatsGatherer, so we don't have to look far
33                                                                        [16:35]
34<warner> StdOutStatsGatherer.got_stats just uses pprint() on them
35<warner> and PickleStatsGatherer.got_stats pickles them  [16:36]
36<warner> it turns out that misc/operations_helpers/munin/tahoe_stats reads
37         that pickle
38<warner> line 460 is where it extracts an individual statistic  [16:38]
39<warner> and on line 461 it does 'if value is not None', so it can probably
40         tolerate None
41<warner> also, the PLUGINS table that makes up most of the file doesn't look
42         at latencies at all  [16:39]
43<warner> ok, so that covers stats.py
44<warner> next up: web/check_results.py
45ERC>
46<warner> another hit is in stats.StatsGatherer, which is the service that does
47         callRemote('get_stats') and retrieves stats through the
48         RIStatsProvider.get_stats interface
49<warner> that's the receiving end of the remote protocol
50<warner> er, the initiating end, but it's where the stats wind up
51<warner> from line 211, you can see that the stats get sent to self.got_stats
52                                                                        [16:34]
53<warner> which, from the NotImplementedError on line 227, you can tell is
54         meant to be overridden by subclasses, so we need to look for 'def
55         got_stats' in subclasses. Fortunately, all the subclasses are in this
56         same file, right after StatsGatherer, so we don't have to look far
57                                                                        [16:35]
58<warner> StdOutStatsGatherer.got_stats just uses pprint() on them
59<warner> and PickleStatsGatherer.got_stats pickles them  [16:36]
60<warner> it turns out that misc/operations_helpers/munin/tahoe_stats reads
61         that pickle
62<warner> line 460 is where it extracts an individual statistic  [16:38]
63<warner> and on line 461 it does 'if value is not None', so it can probably
64         tolerate None
65<warner> also, the PLUGINS table that makes up most of the file doesn't look
66         at latencies at all  [16:39]
67<warner> ok, so that covers stats.py
68<warner> next up: web/check_results.py
69<warner> that calls the get_stats() that's really ICheckResults.get_stats
70                                                                        [16:40]
71* arc` yanking contents of channel for more leisurely review  [16:41]
72<warner> root.py: lines 199 and 212. Both look at a few specific keys and
73         ignore the rest, and don't look at latencies
74<warner> next up: web/status.py
75<warner> four hits  [16:42]
76<warner> there's a self.helper.get_stats() in
77         HelperStatus.data_helper_stats.. the helper doesn't have a storage
78         server, so latencies shouldn't be there
79<warner> same for the second hit in HelperStatus.render_JSON
80<warner> third hit is Statistics.renderHTTP (the t=="json" clause): that looks
81         like an external HTTP interface for fetching the data. JSON can
82         handle None, but let's add "find external HTTP callers" to the list
83         of things to investigate  [16:43]
84<warner> fourth hit is Statistics.data_get_stats, which is used as input to
85         the series of 'render_*' functions in the rest of that
86         class. Fortunately, none of them look at latency, except render_raw()
87         which just pprints() the whole thing, for which None is fine
88                                                                        [16:44]
89<warner> ok, that's all of web/status.py
90ERC>
91                                                                        [16:40]
92* arc` yanking contents of channel for more leisurely review  [16:41]
93<warner> root.py: lines 199 and 212. Both look at a few specific keys and
94         ignore the rest, and don't look at latencies
95<warner> next up: web/status.py
96<warner> four hits  [16:42]
97<warner> there's a self.helper.get_stats() in
98         HelperStatus.data_helper_stats.. the helper doesn't have a storage
99         server, so latencies shouldn't be there
100<warner> same for the second hit in HelperStatus.render_JSON
101<warner> third hit is Statistics.renderHTTP (the t=="json" clause): that looks
102         like an external HTTP interface for fetching the data. JSON can
103         handle None, but let's add "find external HTTP callers" to the list
104         of things to investigate  [16:43]
105<warner> fourth hit is Statistics.data_get_stats, which is used as input to
106         the series of 'render_*' functions in the rest of that
107         class. Fortunately, none of them look at latency, except render_raw()
108         which just pprints() the whole thing, for which None is fine
109                                                                        [16:44]
110<warner> ok, that's all of web/status.py
111<warner> last one is web/storage.py  [16:45]
112<warner> three hits
113<warner> first is StorageStatus.render_JSON: the output of
114         self.storage.get_stats() is included in the JSON response
115<warner> so that's another "find external HTTP callers" item
116<warner> second hit is a command  [16:46]
117<warner> last hit is in StorageStatus.data_stats, which (looking at the
118         associated storage_status.xhtml template for n:data="stats"
119         attributes) is used by a couple of local render_* functions
120                                                                        [16:47]
121<warner> but, none of those touch latency
122*** kevan_ (~kevan@host-134-71-248-56.allocated.csupomona.edu) has quit: Quit:
123    leaving
124ERC> OK, that was a bit faster than I could keep up with, .
125* arc` yanking contents of channel for more leisurely review  [16:41]
126<warner> root.py: lines 199 and 212. Both look at a few specific keys and
127         ignore the rest, and don't look at latencies
128<warner> next up: web/status.py
129<warner> four hits  [16:42]
130<warner> there's a self.helper.get_stats() in
131         HelperStatus.data_helper_stats.. the helper doesn't have a storage
132         server, so latencies shouldn't be there
133<warner> same for the second hit in HelperStatus.render_JSON
134<warner> third hit is Statistics.renderHTTP (the t=="json" clause): that looks
135         like an external HTTP interface for fetching the data. JSON can
136         handle None, but let's add "find external HTTP callers" to the list
137         of things to investigate  [16:43]
138<warner> fourth hit is Statistics.data_get_stats, which is used as input to
139         the series of 'render_*' functions in the rest of that
140         class. Fortunately, none of them look at latency, except render_raw()
141         which just pprints() the whole thing, for which None is fine
142                                                                        [16:44]
143<warner> ok, that's all of web/status.py
144<warner> last one is web/storage.py  [16:45]
145<warner> three hits
146<warner> first is StorageStatus.render_JSON: the output of
147         self.storage.get_stats() is included in the JSON response
148<warner> so that's another "find external HTTP callers" item
149<warner> second hit is a command  [16:46]
150<warner> last hit is in StorageStatus.data_stats, which (looking at the
151         associated storage_status.xhtml template for n:data="stats"
152         attributes) is used by a couple of local render_* functions
153                                                                        [16:47]
154<warner> but, none of those touch latency
155*** kevan_ (~kevan@host-134-71-248-56.allocated.csupomona.edu) has quit: Quit:
156    leaving
157<warner> so, we're down to HTTP callers of /storage?t=json
158         (StorageStatus.render_JSON) and of /statistics?t=json  [16:49]
159<warner> where we might be concerned that those callers will mishandle a None
160         (really a null, since it's JSON)  [16:50]
161<warner> at that point, I think we can stop, because I'm pretty sure there's
162         no code in tahoe that calls HTTP and parses JSON for stats.. all
163         tahoe's stats-consuming code is either internal, or uses foolscap
164                                                                        [16:51]
165<zooko> I'm on Skype with Josh.
166<zooko> He had a emacs UI error and didn't realize you were still typing. :-)
167<warner> so, in summary: if the tests pass, then yes, changing the storage
168         server latency-reporting code to return None for percentile buckets
169         that are not yet full should be ok  [16:52]
170<arc`> :-)  I was in the buffer I was copying into...
171* arc` blushind
172* arc` blushing
173<warner> want me to pastebin anything?
174<arc`> I think I have it all, going to review now.  Thanks!
175ERC>
176