Opened at 2012-10-30T23:12:32Z
Closed at 2020-10-30T12:35:44Z
#1836 closed defect (wontfix)
use leasedb (not crawler) to figure out how many shares you have and how many bytes
Reported by: | zooko | Owned by: | markberger |
---|---|---|---|
Priority: | normal | Milestone: | 1.15.0 |
Component: | code-storage | Version: | 1.9.2 |
Keywords: | leases garbage-collection test-needed accounting | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
In current trunk, there is a "BucketCountingCrawler" whose job it is to count up how many shares are stored.
I propose replacing it with a query against the leasedb to count files (a simple SQL COUNT query!), and at the same time extending the storage server so that it can report the aggregate size of the stored shares as well as their number.
This is part of an "overarching ticket" to eliminate most uses of crawler — ticket #1834.
Change History (35)
comment:1 Changed at 2012-10-30T23:12:45Z by zooko
- Description modified (diff)
comment:2 Changed at 2012-10-30T23:14:24Z by zooko
comment:3 Changed at 2012-10-31T00:09:16Z by davidsarah
- Owner set to davidsarah
- Status changed from new to assigned
+1.
comment:4 Changed at 2012-10-31T10:08:04Z by zooko
- Summary changed from stop crawling share files in order to figure out how many shares you have to use leasedb (not crawler) to figure out how many shares you have and how many bytes
comment:5 Changed at 2012-11-09T06:51:58Z by zooko
Using leasedb this way would facilitate solving #671 — bring back sizelimit (i.e. max consumed, not min free).
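As a rough illustration of how a leasedb total could feed a "max consumed" check, here is a minimal sketch. It assumes a simplified `shares` table with a `used_space` column, which is an assumption about the leasedb schema. The helper name `would_exceed_sizelimit` and its parameters are hypothetical and not part of Tahoe-LAFS.

```python
import sqlite3

def would_exceed_sizelimit(conn, sizelimit, incoming_bytes):
    """Return True if storing incoming_bytes would push total use past sizelimit.

    Hypothetical helper; the `shares.used_space` column is an assumed schema.
    """
    # COALESCE turns the NULL that SUM returns for an empty table into 0.
    (used,) = conn.execute(
        "SELECT COALESCE(SUM(`used_space`), 0) FROM `shares`"
    ).fetchone()
    return used + incoming_bytes > sizelimit

# Tiny in-memory database with made-up shares totalling 1500 bytes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER)"
)
conn.executemany(
    "INSERT INTO shares VALUES (?, ?, ?)",
    [("si1", 0, 1000), ("si2", 0, 500)],
)

fits = would_exceed_sizelimit(conn, sizelimit=2000, incoming_bytes=400)
too_big = would_exceed_sizelimit(conn, sizelimit=2000, incoming_bytes=600)
print(fits, too_big)
```

The point of the sketch is only that the check becomes a single query rather than a crawl of the share files; whether #671 would be implemented this way is not settled by this ticket.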
comment:6 Changed at 2012-11-21T00:49:35Z by zooko
- Description modified (diff)
comment:7 Changed at 2012-12-14T20:24:43Z by zooko
Using leasedb this way would facilitate solving #940.
comment:8 Changed at 2012-12-15T00:59:41Z by davidsarah
The most basic form of the 'total used space' query is
SELECT SUM(`used_space`) FROM `shares`
How much account-specific information should we add? At the moment, there are only two accounts -- anonymous and starter -- but that is already enough to introduce the complication that more than one account can hold a lease on the same share, so the query above is not equivalent to
SELECT SUM(`used_space`) FROM `shares` s JOIN `leases` l ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
since that can count space for a share more than once.
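The double counting can be reproduced with a small in-memory SQLite database. This is only a sketch: the schema below contains just the columns named in the two queries, and the storage indexes, sizes, and account ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: only the columns the queries above mention.
conn.executescript("""
CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER,
                     PRIMARY KEY (storage_index, shnum));
CREATE TABLE leases (storage_index TEXT, shnum INTEGER, account_id INTEGER,
                     PRIMARY KEY (storage_index, shnum, account_id));
""")
conn.executemany(
    "INSERT INTO shares VALUES (?, ?, ?)",
    [("si1", 0, 1000), ("si2", 0, 500)],
)
# Accounts 0 and 1 (ids invented) both lease si1/0; only account 0 leases si2/0.
conn.executemany(
    "INSERT INTO leases VALUES (?, ?, ?)",
    [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)],
)

plain_total = conn.execute(
    "SELECT SUM(`used_space`) FROM `shares`"
).fetchone()[0]
joined_total = conn.execute(
    "SELECT SUM(`used_space`) FROM `shares` s JOIN `leases` l "
    "ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)"
).fetchone()[0]

# The join yields one row per lease, so si1/0 contributes its 1000 bytes twice.
print(plain_total, joined_total)
```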
comment:9 Changed at 2012-12-15T01:09:38Z by davidsarah
This query solves the above problem, giving the total number of leased shares and the total space used by leased shares:
SELECT COUNT(*), SUM(`used_space`) FROM (SELECT `used_space` FROM `shares` s JOIN `leases` l ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`) GROUP BY s.`storage_index`, s.`shnum`)
(Any WHERE clause can be added to the inner SELECT to pick leases that satisfy certain criteria.)
And this gives the number of shares and total used space leased by each account, sorted beginning with the one that is using most space:
SELECT `account_id`, COUNT(*), SUM(`used_space`) FROM `leases` l LEFT JOIN `shares` s ON (l.`storage_index` = s.`storage_index` AND l.`shnum` = s.`shnum`) GROUP BY `account_id` ORDER BY SUM(`used_space`) DESC
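Both queries can be tried against a toy database. The sketch below assumes a simplified schema with only the columns the queries reference; the data, including an unleased share `si3`, is invented for illustration.

```python
import sqlite3

def make_example_db():
    """Build an in-memory database with an assumed, simplified leasedb schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER,
                         PRIMARY KEY (storage_index, shnum));
    CREATE TABLE leases (storage_index TEXT, shnum INTEGER, account_id INTEGER,
                         PRIMARY KEY (storage_index, shnum, account_id));
    """)
    # si3/0 has no lease at all, so it should not appear in the leased totals.
    conn.executemany(
        "INSERT INTO shares VALUES (?, ?, ?)",
        [("si1", 0, 1000), ("si2", 0, 500), ("si3", 0, 2000)],
    )
    conn.executemany(
        "INSERT INTO leases VALUES (?, ?, ?)",
        [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)],
    )
    return conn

conn = make_example_db()

# Number of leased shares and the space they use, each share counted once.
leased_totals = conn.execute(
    "SELECT COUNT(*), SUM(`used_space`) FROM ("
    "SELECT `used_space` FROM `shares` s JOIN `leases` l "
    "ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`) "
    "GROUP BY s.`storage_index`, s.`shnum`)"
).fetchone()

# Per-account share count and space, largest consumer first.
per_account = conn.execute(
    "SELECT `account_id`, COUNT(*), SUM(`used_space`) FROM `leases` l "
    "LEFT JOIN `shares` s "
    "ON (l.`storage_index` = s.`storage_index` AND l.`shnum` = s.`shnum`) "
    "GROUP BY `account_id` ORDER BY SUM(`used_space`) DESC"
).fetchall()

print(leased_totals)
print(per_account)
```

Note that the per-account sums add up to more than the leased total whenever a share is leased by more than one account, which is expected: each account is charged for every share it holds a lease on.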
comment:10 Changed at 2013-07-04T16:23:57Z by zooko
- Description modified (diff)
comment:11 Changed at 2013-07-26T15:04:30Z by markberger
- Keywords review-needed added
Here is a patch for this ticket: https://github.com/markberger/tahoe-lafs/tree/1836-use-leasedb-for-share-count
comment:12 Changed at 2013-07-29T13:52:24Z by daira
Reviewed, but I think this doesn't remove the BucketCountingCrawler yet.
comment:13 Changed at 2013-07-29T13:54:21Z by daira
- Keywords test-needed added
comment:14 Changed at 2013-07-31T15:17:05Z by daira
- Keywords review-needed removed
- Owner changed from davidsarah to markberger
- Status changed from assigned to new
Removed review-needed until BucketCountingCrawlectomy is complete.
comment:15 Changed at 2013-08-02T14:48:18Z by markberger
- Keywords review-needed added; test-needed removed
All of the BucketCountingCrawler code has been removed and tests have been added to the branch.
comment:16 Changed at 2013-08-03T00:26:02Z by daira
- Milestone changed from undecided to 1.11.0
- Owner changed from markberger to daira
- Status changed from new to assigned
Reviewing.
comment:17 Changed at 2013-08-28T15:58:19Z by zooko
- Milestone changed from soon to 1.12.0
comment:18 Changed at 2014-03-26T02:09:13Z by remyroy
Daira, did you review this one past comment 16? Is this still in need of a review?
comment:19 Changed at 2014-03-27T20:33:49Z by remyroy
- Owner changed from daira to remyroy
- Status changed from assigned to new
I'll do another pass at the code review for this one.
comment:20 Changed at 2014-03-27T20:34:03Z by remyroy
- Status changed from new to assigned
comment:21 Changed at 2014-03-27T21:26:40Z by daira
I appear to have dropped the ball on this one after comment:16. Yes, it's still in need of review.
comment:22 follow-up: ↓ 24 Changed at 2014-05-05T15:40:43Z by remyroy
- Keywords test-needed added; review-needed removed
- Owner changed from remyroy to markberger
- Status changed from assigned to new
Review of https://github.com/markberger/tahoe-lafs/tree/1836-use-leasedb-for-share-count :
Good job with this change. There are a few small things that I found.
I could not run the full test suite. It might be because this branch was made on a somewhat old version of tahoe-lafs. There are a bunch of "exceptions.ImportError: cannot import name HTTPConnectionPool" errors in the tests. If you could merge your branch with the latest trunk version, it might solve this.
In src/allmydata/web/storage.py, it seems there are still a few BucketCountingCrawler remnants left. For instance, StorageStatus.render_JSON still returns bucket-counter even though its value is None. Is this because the UI expects it? If so, the UI might need to be changed as well as the backend. Another one is StorageStatus.render_count_crawler_status. Is this still needed for something now that the crawler has been removed?
Reassigning to markberger to fix those issues.
comment:23 Changed at 2014-05-05T16:50:56Z by daira
remyroy: what's the output of bin/tahoe --version-and-path for you (on that branch)?
comment:24 in reply to: ↑ 22 Changed at 2014-05-05T17:02:10Z by daira
Replying to remyroy:
I could not run the full test suite. It might be because this branch was made on a somewhat old version of tahoe-lafs. There are a bunch of "exceptions.ImportError: cannot import name HTTPConnectionPool" in the tests.
I see the problem; that branch has a requirement of Twisted >= 11.0.0, but HTTPConnectionPool was only made public in Twisted 12.1.0. The 1819-cloud-merge branch has a requirement of Twisted >= 12.1.0 for that reason.
comment:25 Changed at 2014-05-05T17:07:00Z by remyroy
I'm not sure if you still need the version-and-path but here it is:
allmydata-tahoe: 1.10.0.post171 [HEAD: 93b727857cc521963d1609a72ae4772c8f0bb1a0] (/home/remyroy/Projects/tahoe-lafs/src)
foolscap: 0.6.4 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg)
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pycryptopp-0.6.0.1206569328141510525648634803928199668821045408958-py2.7-linux-x86_64.egg)
zfec: 1.4.7 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/zfec-1.4.7-py2.7-linux-x86_64.egg)
Twisted: 11.1.0 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Twisted-11.1.0-py2.7-linux-x86_64.egg)
Nevow: 0.10.0 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg)
zope.interface: unknown (/usr/lib/python2.7/dist-packages/zope)
python: 2.7.6 (/usr/bin/python)
platform: Linux-Ubuntu_14.04-x86_64-64bit_ELF (None)
pyOpenSSL: 0.13 (/usr/lib/python2.7/dist-packages)
simplejson: 3.3.1 (/usr/lib/python2.7/dist-packages)
pycrypto: 2.6.1 (/usr/lib/python2.7/dist-packages)
pyasn1: 0.1.7 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pyasn1-0.1.7-py2.7.egg)
mock: 1.0.1 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages)
txAWS: None [(<type 'exceptions.ImportError'>, 'No module named txaws', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
oauth2client: None [(<type 'exceptions.ImportError'>, 'No module named oauth2client', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
python-dateutil: None [(<type 'exceptions.ImportError'>, 'No module named dateutil', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
httplib2: 0.8 (/usr/lib/python2.7/dist-packages)
python-gflags: None [(<type 'exceptions.ImportError'>, 'No module named gflags', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
setuptools: 0.6c16dev4 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/setuptools-0.6c16dev4.egg)
Warning: dependency 'txaws' (version None imported from None) was not found by pkg_resources.
Warning: dependency 'oauth2client' (version None imported from None) was not found by pkg_resources.
Warning: dependency 'python-dateutil' (version None imported from None) was not found by pkg_resources.
Warning: dependency 'httplib2' (version '0.8' imported from '/usr/lib/python2.7/dist-packages') was not found by pkg_resources.
Warning: dependency 'python-gflags' (version None imported from None) was not found by pkg_resources.
For debugging purposes, the PYTHONPATH was '/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages'
install_requires was ['setuptools >= 0.6c6', 'zfec >= 1.1.0', 'simplejson >= 1.4', 'zope.interface == 3.6.0, == 3.6.1, == 3.6.2, >= 3.6.5', 'Twisted >= 11.0.0', 'foolscap >= 0.6.3', 'pyOpenSSL', 'Nevow >= 0.6.0', 'pycrypto == 2.1.0, == 2.3, >= 2.4.1', 'pyasn1 >= 0.0.8a', 'mock >= 0.8.0', 'pycryptopp >= 0.6.0']
sys.path after importing pkg_resources was
/home/remyroy/Projects/tahoe-lafs/support/bin:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/setuptools-0.6c16dev4.egg:
/home/remyroy/Projects/tahoe-lafs/src:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pycryptopp-0.6.0.1206569328141510525648634803928199668821045408958-py2.7-linux-x86_64.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/mock-1.0.1-py2.7.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pyasn1-0.1.7-py2.7.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/zfec-1.4.7-py2.7-linux-x86_64.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pyutil-1.9.7-py2.7.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/zbase32-1.1.5-py2.7.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Twisted-11.1.0-py2.7-linux-x86_64.egg:
/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages:
/usr/lib/python2.7:
/usr/lib/python2.7/plat-x86_64-linux-gnu:
/usr/lib/python2.7/lib-tk:
/usr/lib/python2.7/lib-old:
/usr/lib/python2.7/lib-dynload:
/usr/local/lib/python2.7/dist-packages:
/usr/lib/python2.7/dist-packages:
/usr/lib/python2.7/dist-packages/PILcompat:
/usr/lib/python2.7/dist-packages/gtk-2.0:
/usr/lib/python2.7/dist-packages/ubuntu-sso-client
I was using Twisted 11.1.
comment:26 Changed at 2014-05-05T18:38:14Z by daira
Thanks, that confirms that it was the Twisted version.
I've rebased markberger's branch on top of 1819-cloud-merge: https://github.com/tahoe-lafs/tahoe-lafs/commits/1836-use-leasedb-for-share-count
comment:27 Changed at 2014-05-05T19:14:03Z by remyroy
Just ran the test suite on https://github.com/tahoe-lafs/tahoe-lafs/commits/1836-use-leasedb-for-share-count and everything seems fine.
comment:28 Changed at 2014-05-05T20:43:03Z by daira
SELECT COUNT(*), SUM(`used_space`) FROM (SELECT `used_space` FROM `shares` s JOIN `leases` l ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`) GROUP BY s.`storage_index`, s.`shnum`)
My relational algebra may be a little rusty, but can't that be simplified to:
SELECT COUNT(*), SUM(`used_space`) FROM `shares` s JOIN `leases` l ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`) GROUP BY s.`storage_index`, s.`shnum`
?
comment:29 Changed at 2014-05-05T20:53:53Z by daira
comment:30 Changed at 2014-05-05T20:56:19Z by daira
Oh, I was responsible for the variation with the double SELECT ... FROM ... in comment:9 . I wonder whether there was any reason for writing it that way?
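One possible reason, sketched here under SQLite semantics with an assumed, simplified schema and invented data: without the outer SELECT, the GROUP BY produces one row per share (each with its own lease count and a sum inflated by that count) rather than a single overall total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: only the columns the two queries reference.
conn.executescript("""
CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER,
                     PRIMARY KEY (storage_index, shnum));
CREATE TABLE leases (storage_index TEXT, shnum INTEGER, account_id INTEGER,
                     PRIMARY KEY (storage_index, shnum, account_id));
""")
conn.executemany(
    "INSERT INTO shares VALUES (?, ?, ?)",
    [("si1", 0, 1000), ("si2", 0, 500)],
)
# si1/0 is leased by two accounts, si2/0 by one (account ids are made up).
conn.executemany(
    "INSERT INTO leases VALUES (?, ?, ?)",
    [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)],
)

join = (
    "FROM `shares` s JOIN `leases` l "
    "ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`) "
    "GROUP BY s.`storage_index`, s.`shnum`"
)

# Nested form from comment:9: a single (count, total) row.
nested = conn.execute(
    "SELECT COUNT(*), SUM(`used_space`) FROM (SELECT `used_space` " + join + ")"
).fetchall()

# Flattened form: the aggregates are computed per group, i.e. per share.
# Sorted in Python because the query itself has no ORDER BY.
flattened = sorted(conn.execute(
    "SELECT COUNT(*), SUM(`used_space`) " + join
).fetchall())

print(nested)
print(flattened)
```

With this data the nested form reports two leased shares using 1500 bytes, while the flattened form returns a row per share whose COUNT(*) is the number of leases on that share.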
comment:31 Changed at 2014-05-06T17:57:56Z by zooko
comment:32 Changed at 2016-03-22T05:02:25Z by warner
- Milestone changed from 1.12.0 to 1.13.0
Milestone renamed
comment:33 Changed at 2016-06-28T18:17:14Z by warner
- Milestone changed from 1.13.0 to 1.14.0
renaming milestone
comment:34 Changed at 2020-06-30T14:45:13Z by exarkun
- Milestone changed from 1.14.0 to 1.15.0
Moving open issues out of closed milestones.
comment:35 Changed at 2020-10-30T12:35:44Z by exarkun
- Resolution set to wontfix
- Status changed from new to closed
The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.
If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).
If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.
Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with the "cloud backend". If there is interest in lease implementation change at some future time then that effort will essentially have to be restarted as well.
The part about reporting total space usage would be very useful for customers of LeastAuthority.com (who pay per byte), among others.