#1836 closed defect (wontfix)

use leasedb (not crawler) to figure out how many shares you have and how many bytes

Reported by: zooko
Owned by: markberger
Priority: normal
Milestone: 1.15.0
Component: code-storage
Version: 1.9.2
Keywords: leases garbage-collection test-needed accounting
Cc:
Launchpad Bug:

Description (last modified by zooko)

In current trunk, there is a "BucketCountingCrawler" whose job it is to count up how many shares are stored.

I propose replacing it with a leasedb query that counts the shares (a simple SQL COUNT query!), and at the same time extending the storage server so that it can report the aggregate size of the stored shares as well as their number.

This is part of an "overarching ticket" to eliminate most uses of crawler — ticket #1834.

Change History (35)

comment:1 Changed at 2012-10-30T23:12:45Z by zooko

  • Description modified (diff)

comment:2 Changed at 2012-10-30T23:14:24Z by zooko

The part about reporting total space usage would be very useful for customers of LeastAuthority.com (who pay per byte), among others.

Last edited at 2012-10-31T10:07:37Z by zooko (previous) (diff)

comment:3 Changed at 2012-10-31T00:09:16Z by davidsarah

  • Owner set to davidsarah
  • Status changed from new to assigned

+1.

comment:4 Changed at 2012-10-31T10:08:04Z by zooko

  • Summary changed from "stop crawling share files in order to figure out how many shares you have" to "use leasedb (not crawler) to figure out how many shares you have and how many bytes"

comment:5 Changed at 2012-11-09T06:51:58Z by zooko

Using leasedb this way would facilitate solving #671 — bring back sizelimit (i.e. max consumed, not min free).

comment:6 Changed at 2012-11-21T00:49:35Z by zooko

  • Description modified (diff)

comment:7 Changed at 2012-12-14T20:24:43Z by zooko

Using leasedb this way would facilitate solving #940.

comment:8 Changed at 2012-12-15T00:59:41Z by davidsarah

The most basic form of the 'total used space' query is

SELECT SUM(`used_space`) FROM `shares`

How much account-specific information should we add? At the moment, there are only two accounts -- anonymous and starter -- but that is already enough to introduce the complication that more than one account can hold a lease on the same share, so the query above is not equivalent to

SELECT SUM(`used_space`) FROM `shares` s JOIN `leases` l
       ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)

since that can count space for a share more than once.
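
The double counting can be reproduced with a toy database. This is a minimal sketch using Python's sqlite3 module; the two-table schema is an assumption that keeps only the columns named in the queries above (the real leasedb schema has more), and the account ids 0 and 1 and all row values are made up for illustration.

```python
import sqlite3

# Assumed, simplified schema: only the columns the queries above refer to.
SCHEMA = """
CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER,
                     PRIMARY KEY (storage_index, shnum));
CREATE TABLE leases (storage_index TEXT, shnum INTEGER, account_id INTEGER,
                     PRIMARY KEY (storage_index, shnum, account_id));
"""

def make_toy_db():
    """Build an in-memory database with two shares and three leases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO shares VALUES (?, ?, ?)",
                     [("si1", 0, 1000), ("si2", 0, 500)])
    # Share si1/0 is leased by both (made-up) accounts 0 and 1.
    conn.executemany("INSERT INTO leases VALUES (?, ?, ?)",
                     [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)])
    return conn

def used_space_from_shares(conn):
    """Total used space, counting every share exactly once."""
    return conn.execute("SELECT SUM(`used_space`) FROM `shares`").fetchone()[0]

def used_space_from_join(conn):
    """Total used space over the join: one term per lease, not per share."""
    return conn.execute("""
        SELECT SUM(`used_space`) FROM `shares` s JOIN `leases` l
               ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
    """).fetchone()[0]

conn = make_toy_db()
print(used_space_from_shares(conn))  # 1500
print(used_space_from_join(conn))    # 2500: si1/0 is counted once per lease
```

With these rows the join adds the 1000 bytes of share si1/0 twice, once for each account holding a lease on it.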

comment:9 Changed at 2012-12-15T01:09:38Z by davidsarah

This query solves the above problem, giving the total number of leased shares and the total space used by leased shares:

SELECT COUNT(*), SUM(`used_space`)
  FROM (SELECT `used_space`
          FROM `shares` s JOIN `leases` l
          ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
          GROUP BY s.`storage_index`, s.`shnum`)

(Any WHERE clause can be added to the inner SELECT to pick leases that satisfy certain criteria.)

And this gives the number of shares and total used space leased by each account, sorted beginning with the one that is using most space:

SELECT `account_id`, COUNT(*), SUM(`used_space`)
  FROM `leases` l LEFT JOIN `shares` s
  ON (l.`storage_index` = s.`storage_index` AND l.`shnum` = s.`shnum`)
  GROUP BY `account_id` ORDER BY SUM(`used_space`) DESC
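
Both queries can be tried against a toy database. The sketch below uses Python's sqlite3 module with an assumed two-table schema (only the columns the queries mention) and made-up rows; `leased_totals` and `per_account_totals` are hypothetical helper names, not part of the leasedb code.

```python
import sqlite3

# Assumed, simplified schema and made-up data (not the real leasedb layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER,
                     PRIMARY KEY (storage_index, shnum));
CREATE TABLE leases (storage_index TEXT, shnum INTEGER, account_id INTEGER,
                     PRIMARY KEY (storage_index, shnum, account_id));
-- si3/0 has no lease at all; si1/0 is leased by two accounts.
INSERT INTO shares VALUES ('si1', 0, 1000), ('si2', 0, 500), ('si3', 0, 200);
INSERT INTO leases VALUES ('si1', 0, 0), ('si1', 0, 1), ('si2', 0, 0);
""")

def leased_totals(conn):
    """Return (number of leased shares, space used by leased shares)."""
    return conn.execute("""
        SELECT COUNT(*), SUM(`used_space`)
          FROM (SELECT `used_space`
                  FROM `shares` s JOIN `leases` l
                  ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
                  GROUP BY s.`storage_index`, s.`shnum`)
    """).fetchone()

def per_account_totals(conn):
    """Return (account_id, share count, used space) rows, largest user first."""
    return conn.execute("""
        SELECT `account_id`, COUNT(*), SUM(`used_space`)
          FROM `leases` l LEFT JOIN `shares` s
          ON (l.`storage_index` = s.`storage_index` AND l.`shnum` = s.`shnum`)
          GROUP BY `account_id` ORDER BY SUM(`used_space`) DESC
    """).fetchall()

print(leased_totals(conn))       # (2, 1500)
print(per_account_totals(conn))  # [(0, 2, 1500), (1, 1, 1000)]
```

The unleased share si3/0 is excluded from the first result, and si1/0 contributes to it only once. The per-account figures add up to more than the leased total because a share leased by two accounts is attributed to both.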

comment:10 Changed at 2013-07-04T16:23:57Z by zooko

  • Description modified (diff)

After talking with markberger today, I realized that #1818 is the ticket to merge leasedb into trunk, and #1819 is the superseding ticket to merge leasedb+cloud-backend into trunk.

comment:11 Changed at 2013-07-26T15:04:30Z by markberger

  • Keywords review-needed added

comment:12 Changed at 2013-07-29T13:52:24Z by daira

Reviewed, but I think this doesn't remove the BucketCountingCrawler yet.

Last edited at 2013-07-29T13:54:44Z by daira (previous) (diff)

comment:13 Changed at 2013-07-29T13:54:21Z by daira

  • Keywords test-needed added

comment:14 Changed at 2013-07-31T15:17:05Z by daira

  • Keywords review-needed removed
  • Owner changed from davidsarah to markberger
  • Status changed from assigned to new

Removed review-needed until BucketCountingCrawlectomy is complete.

Last edited at 2014-03-27T20:18:14Z by zooko (previous) (diff)

comment:15 Changed at 2013-08-02T14:48:18Z by markberger

  • Keywords review-needed added; test-needed removed

All of the BucketCountingCrawler code has been removed and tests have been added to the branch.

Last edited at 2013-08-20T18:26:40Z by zooko (previous) (diff)

comment:16 Changed at 2013-08-03T00:26:02Z by daira

  • Milestone changed from undecided to 1.11.0
  • Owner changed from markberger to daira
  • Status changed from new to assigned

Reviewing.

comment:17 Changed at 2013-08-28T15:58:19Z by zooko

  • Milestone changed from soon to 1.12.0

comment:18 Changed at 2014-03-26T02:09:13Z by remyroy

Daira, did you review this one past comment 16? Is this still in need of a review?

comment:19 Changed at 2014-03-27T20:33:49Z by remyroy

  • Owner changed from daira to remyroy
  • Status changed from assigned to new

I'll do another pass at the code review for this one.

Last edited at 2014-03-27T20:38:32Z by remyroy (previous) (diff)

comment:20 Changed at 2014-03-27T20:34:03Z by remyroy

  • Status changed from new to assigned

comment:21 Changed at 2014-03-27T21:26:40Z by daira

I appear to have dropped the ball on this one after comment:16. Yes, it's still in need of review.

comment:22 follow-up: ↓ 24 Changed at 2014-05-05T15:40:43Z by remyroy

  • Keywords test-needed added; review-needed removed
  • Owner changed from remyroy to markberger
  • Status changed from assigned to new

Review of https://github.com/markberger/tahoe-lafs/tree/1836-use-leasedb-for-share-count :

Good job with this change. There are a few small things that I found.

I could not run the full test suite. It might be because this branch was made on a somewhat old version of tahoe-lafs. There are a bunch of "exceptions.ImportError: cannot import name HTTPConnectionPool" in the tests. If you could merge your branch with the latest trunk version, it might solve this.

In src/allmydata/web/storage.py, it seems there are still a few BucketCountingCrawler remnants left. For instance, StorageStatus.render_JSON still returns bucket-counter, even though its value is now None. Is this because the UI expects it? If so, the UI might need to be changed as well as the backend. Another one is StorageStatus.render_count_crawler_status. Is it still needed for something now that the crawler has been removed?

Reassigning to markberger to fix those issues.


comment:23 Changed at 2014-05-05T16:50:56Z by daira

remyroy: what's the output of bin/tahoe --version-and-path for you (on that branch)?

Last edited at 2014-05-05T16:53:08Z by daira (previous) (diff)

comment:24 in reply to: ↑ 22 Changed at 2014-05-05T17:02:10Z by daira

Replying to remyroy:

I could not run the full test suite. It might be because this branch was made on a somewhat old version of tahoe-lafs. There are a bunch of "exceptions.ImportError: cannot import name HTTPConnectionPool" in the tests.

I see the problem; that branch has a requirement of Twisted >= 11.0.0, but HTTPConnectionPool was only made public in Twisted 12.1.0. The 1819-cloud-merge branch has a requirement of Twisted >= 12.1.0 for that reason.

comment:25 Changed at 2014-05-05T17:07:00Z by remyroy

I'm not sure if you still need the version-and-path but here it is:

allmydata-tahoe: 1.10.0.post171 [HEAD: 93b727857cc521963d1609a72ae4772c8f0bb1a0] (/home/remyroy/Projects/tahoe-lafs/src)
foolscap: 0.6.4 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg)
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pycryptopp-0.6.0.1206569328141510525648634803928199668821045408958-py2.7-linux-x86_64.egg)
zfec: 1.4.7 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/zfec-1.4.7-py2.7-linux-x86_64.egg)
Twisted: 11.1.0 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Twisted-11.1.0-py2.7-linux-x86_64.egg)
Nevow: 0.10.0 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg)
zope.interface: unknown (/usr/lib/python2.7/dist-packages/zope)
python: 2.7.6 (/usr/bin/python)
platform: Linux-Ubuntu_14.04-x86_64-64bit_ELF (None)
pyOpenSSL: 0.13 (/usr/lib/python2.7/dist-packages)
simplejson: 3.3.1 (/usr/lib/python2.7/dist-packages)
pycrypto: 2.6.1 (/usr/lib/python2.7/dist-packages)
pyasn1: 0.1.7 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pyasn1-0.1.7-py2.7.egg)
mock: 1.0.1 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages)
txAWS: None [(<type 'exceptions.ImportError'>, 'No module named txaws', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
oauth2client: None [(<type 'exceptions.ImportError'>, 'No module named oauth2client', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
python-dateutil: None [(<type 'exceptions.ImportError'>, 'No module named dateutil', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
httplib2: 0.8 (/usr/lib/python2.7/dist-packages)
python-gflags: None [(<type 'exceptions.ImportError'>, 'No module named gflags', ('/home/remyroy/Projects/tahoe-lafs/src/allmydata/__init__.py', 196, 'get_package_versions_and_locations', '__import__(modulename)'))] (None)
setuptools: 0.6c16dev4 (/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/setuptools-0.6c16dev4.egg)

Warning: dependency 'txaws' (version None imported from None) was not found by pkg_resources.
Warning: dependency 'oauth2client' (version None imported from None) was not found by pkg_resources.
Warning: dependency 'python-dateutil' (version None imported from None) was not found by pkg_resources.
Warning: dependency 'httplib2' (version '0.8' imported from '/usr/lib/python2.7/dist-packages') was not found by pkg_resources.
Warning: dependency 'python-gflags' (version None imported from None) was not found by pkg_resources.

For debugging purposes, the PYTHONPATH was
  '/home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages'
install_requires was
  ['setuptools >= 0.6c6', 'zfec >= 1.1.0', 'simplejson >= 1.4', 'zope.interface == 3.6.0, == 3.6.1, == 3.6.2, >= 3.6.5', 'Twisted >= 11.0.0', 'foolscap >= 0.6.3', 'pyOpenSSL', 'Nevow >= 0.6.0', 'pycrypto == 2.1.0, == 2.3, >= 2.4.1', 'pyasn1 >= 0.0.8a', 'mock >= 0.8.0', 'pycryptopp >= 0.6.0']
sys.path after importing pkg_resources was
  /home/remyroy/Projects/tahoe-lafs/support/bin:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/setuptools-0.6c16dev4.egg:
  /home/remyroy/Projects/tahoe-lafs/src:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pycryptopp-0.6.0.1206569328141510525648634803928199668821045408958-py2.7-linux-x86_64.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/mock-1.0.1-py2.7.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pyasn1-0.1.7-py2.7.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Nevow-0.10.0-py2.7.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/zfec-1.4.7-py2.7-linux-x86_64.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/pyutil-1.9.7-py2.7.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/zbase32-1.1.5-py2.7.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages/Twisted-11.1.0-py2.7-linux-x86_64.egg:
  /home/remyroy/Projects/tahoe-lafs/support/lib/python2.7/site-packages:
  /usr/lib/python2.7:
  /usr/lib/python2.7/plat-x86_64-linux-gnu:
  /usr/lib/python2.7/lib-tk:
  /usr/lib/python2.7/lib-old:
  /usr/lib/python2.7/lib-dynload:
  /usr/local/lib/python2.7/dist-packages:
  /usr/lib/python2.7/dist-packages:
  /usr/lib/python2.7/dist-packages/PILcompat:
  /usr/lib/python2.7/dist-packages/gtk-2.0:
  /usr/lib/python2.7/dist-packages/ubuntu-sso-client

I was using Twisted 11.1.

comment:26 Changed at 2014-05-05T18:38:14Z by daira

Thanks, that confirms that it was the Twisted version.

I've rebased markberger's branch on top of 1819-cloud-merge: https://github.com/tahoe-lafs/tahoe-lafs/commits/1836-use-leasedb-for-share-count

comment:28 Changed at 2014-05-05T20:43:03Z by daira

SELECT COUNT(*), SUM(`used_space`)
  FROM (SELECT `used_space`
          FROM `shares` s JOIN `leases` l
          ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
          GROUP BY s.`storage_index`, s.`shnum`)

My relational algebra may be a little rusty, but can't that be simplified to:

SELECT COUNT(*), SUM(`used_space`)
  FROM `shares` s JOIN `leases` l
  ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
  GROUP BY s.`storage_index`, s.`shnum`

?
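
A quick way to test the question is to run both forms against a toy database. In the sketch below (Python's sqlite3 module, with an assumed two-table schema and made-up rows), the flattened form returns one row per share rather than a single total, and its SUM counts a share once per lease, so at least in SQLite the two forms do not appear to be equivalent. The EXISTS variant is a suggestion of this sketch, not something proposed in the ticket.

```python
import sqlite3

# Assumed, simplified schema and made-up data (not the real leasedb layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shares (storage_index TEXT, shnum INTEGER, used_space INTEGER,
                     PRIMARY KEY (storage_index, shnum));
CREATE TABLE leases (storage_index TEXT, shnum INTEGER, account_id INTEGER,
                     PRIMARY KEY (storage_index, shnum, account_id));
INSERT INTO shares VALUES ('si1', 0, 1000), ('si2', 0, 500);
-- si1/0 is leased by two accounts, si2/0 by one.
INSERT INTO leases VALUES ('si1', 0, 0), ('si1', 0, 1), ('si2', 0, 0);
""")

JOIN_ON = "ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)"

# Form from comment:9: deduplicate per share first, then aggregate.
NESTED = f"""
SELECT COUNT(*), SUM(`used_space`)
  FROM (SELECT `used_space`
          FROM `shares` s JOIN `leases` l {JOIN_ON}
          GROUP BY s.`storage_index`, s.`shnum`)
"""

# Flattened form: the aggregates are now computed within each share's group.
FLAT = f"""
SELECT COUNT(*), SUM(`used_space`)
  FROM `shares` s JOIN `leases` l {JOIN_ON}
  GROUP BY s.`storage_index`, s.`shnum`
"""

# Hypothetical single-SELECT alternative: keep shares that have any lease.
EXISTS = """
SELECT COUNT(*), SUM(`used_space`)
  FROM `shares` s
  WHERE EXISTS (SELECT 1 FROM `leases` l
                 WHERE l.`storage_index` = s.`storage_index`
                   AND l.`shnum` = s.`shnum`)
"""

nested_rows = conn.execute(NESTED).fetchall()
flat_rows = sorted(conn.execute(FLAT).fetchall())
exists_rows = conn.execute(EXISTS).fetchall()

print(nested_rows)  # [(2, 1500)]
print(flat_rows)    # [(1, 500), (2, 2000)]
print(exists_rows)  # [(2, 1500)]
```

In the flattened result, each row is (number of leases on one share, used space times that number), which is why a subquery or an EXISTS filter is needed to get a single deduplicated total.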

comment:30 Changed at 2014-05-05T20:56:19Z by daira

Oh, I was responsible for the variation with the double SELECT ... FROM ... in comment:9 . I wonder whether there was any reason for writing it that way?

Last edited at 2014-05-05T20:56:44Z by daira (previous) (diff)

comment:32 Changed at 2016-03-22T05:02:25Z by warner

  • Milestone changed from 1.12.0 to 1.13.0

Milestone renamed

comment:33 Changed at 2016-06-28T18:17:14Z by warner

  • Milestone changed from 1.13.0 to 1.14.0

renaming milestone

comment:34 Changed at 2020-06-30T14:45:13Z by exarkun

  • Milestone changed from 1.14.0 to 1.15.0

Moving open issues out of closed milestones.

comment:35 Changed at 2020-10-30T12:35:44Z by exarkun

  • Resolution set to wontfix
  • Status changed from new to closed

The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.

If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).

If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.

Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with the "cloud backend". If there is interest in lease implementation change at some future time then that effort will essentially have to be restarted as well.
