Opened at 2010-04-08T23:28:04Z
Closed at 2010-04-16T22:15:32Z
#1017 closed defect (fixed)
allmydata.org source repository is broken
Reported by: | davidsarah | Owned by: | somebody |
---|---|---|---|
Priority: | supercritical | Milestone: | soon (release n/a) |
Component: | dev-infrastructure | Version: | n/a |
Keywords: | trac darcs | Cc: | |
Launchpad Bug: |
Description (last modified by davidsarah)
Changes to the main trunk repository, on "hanford" (a.k.a. dev.allmydata.org), are normally mirrored to another repository on allmydata.org that is used by the darcs trac plugin to implement Browse Source. However, the script that does this is not working, possibly due to disk problems on allmydata.org.
Apparently for the same or a related reason (?), none of the buildbots are able to check out the source from allmydata.org -- for example see this log and also this one (two different errors).
The script on hanford that is failing when trying to push changes to allmydata.org is /home/source/bin/mirror-to-org.sh, which is invoked by the post-commit hook, /home/darcs/tahoe/trunk-posthook.sh, with argument tahoe/trunk. It fails with the message:
darcs failed: Not a repository: source@allmydata.org:darcs/tahoe/trunk ((scp) failed to fetch: source@allmydata.org:darcs/tahoe/trunk/_darcs/inventory)
The error on checking out the source from allmydata.org is currently:
darcs: failed to read patch in get_extra:
Sun Feb 21 12:36:26 PST 2010 freestorm77@gmail.com
 * munin-tahoe_storagespace
 Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa
 Plugin configuration rename
Perhaps this is a 'partial' repository?
Note that in a previous build there was a different error:
Invalid repository: http://allmydata.org/source/tahoe/distribute
darcs failed: Failed to download URL http://allmydata.org/source/tahoe/distribute/_darcs/inventory : HTTP error (404?)
The patch mentioned in the first checkout error above, which is also the only current difference in the hanford repository relative to allmydata.org, is the one attached to #968. I think this was pushed at approx. 23:30 UTC on April 3. It is a very minimal patch: it only fixes a typo in a comment here. But we should avoid pushing other patches until this issue has been fixed.
Attachments (1)
Change History (9)
Changed at 2010-04-08T23:29:20Z by davidsarah
comment:1 Changed at 2010-04-08T23:29:44Z by davidsarah
- Description modified (diff)
comment:2 Changed at 2010-04-09T03:17:43Z by zooko
Hm, I wonder if this was a transient failure of "allmydata.org". It seems to be working okay now:
Wonwin-McBrootles-Computer:~$ ssh zooko@allmydata.org "ls -lL /home/source/darcs/tahoe/trunk"
total 200
-rw-rw-r-- 1 source source 18249 May 1 2008 COPYING.GPL
-rw-rw-r-- 1 source source 11258 May 1 2008 COPYING.TGPPL.html
-rw-rw-r-- 1 source source 2707 Mar 3 18:05 CREDITS
-rw-rw-r-- 1 source source 15070 Feb 3 10:32 Makefile
-rw-rw-r-- 1 source source 51865 Feb 26 23:31 NEWS
-rw-rw-r-- 1 source source 422 Mar 3 15:29 README
-rw-rw-r-- 1 source source 72 May 1 2008 Tahoe.home
-rw-rw-r-- 1 source source 5194 Feb 14 21:15 _auto_deps.py
drwxrwsr-x 6 source source 4096 Mar 9 10:52 _darcs
drwxrwsr-x 2 source source 4096 Feb 11 2009 bin
drwxrwsr-x 3 source source 4096 Jun 8 2008 contrib
drwxrwsr-x 7 source source 4096 Mar 3 18:05 docs
-rw-rw-r-- 1 source source 7683 Feb 5 2009 ez_setup.py
drwxrwsr-x 4 source source 4096 Sep 24 2009 mac
drwxrwsr-x 10 source source 4096 Mar 3 15:29 misc
-rw-rw-r-- 1 source source 1510 Feb 23 23:01 relnotes-short.txt
-rw-rw-r-- 1 source source 6166 Feb 23 23:06 relnotes.txt
-rw-rw-r-- 1 source source 2949 Jul 16 2009 setup.cfg
-rw-rw-r-- 1 source source 15355 Sep 20 2009 setup.py
drwxrwsr-x 3 source source 4096 May 1 2008 src
drwxrwsr-x 3 source source 4096 May 1 2008 twisted
drwxrwsr-x 2 source source 4096 Jan 25 20:34 windows
On the other hand, I can't check on the script on dev.allmydata.com because dev.allmydata.com is currently unreachable:
Wonwin-McBrootles-Computer:~$ ping -c 3 dev.allmydata.com
PING hanford.allmydata.com (207.7.153.140): 56 data bytes
--- hanford.allmydata.com ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
I suspect that in the near future we'll move to allmydata.org -- I guess the "new" allmydata.org -- being the canonical repository and forget about dev.allmydata.com.
comment:3 Changed at 2010-04-09T03:47:45Z by davidsarah
hanford is reachable as dev.allmydata.org. I can ssh to it without problems.
You can tell that the source mirror is still not up-to-date by looking at http://allmydata.org/trac/tahoe-lafs/browser/misc/munin/tahoe_storagespace#L13 -- it still shows [tahoe-storagespace] instead of [tahoe_storagespace]. The corresponding file on hanford (/home/darcs/tahoe/trunk/misc/munin/tahoe_storagespace) has the patch applied. I don't have an account on allmydata.org, but if you do:
ssh zooko@allmydata.org "cat /home/source/darcs/tahoe/trunk/misc/munin/tahoe_storagespace"
that should confirm the problem.
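The staleness check described in this comment could be scripted roughly as follows. This is only a sketch: the function name mirror_is_stale is hypothetical, and it assumes a copy of the mirror's tahoe_storagespace file has already been saved locally (for example with the ssh command above). It relies only on the observation that the unpatched file still contains [tahoe-storagespace].

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given copy of
# tahoe_storagespace still contains the pre-patch section name.
mirror_is_stale() {
    # -F matches the bracketed name literally; -q suppresses output.
    grep -qF '[tahoe-storagespace]' "$1"
}

# Example usage (the local path is an assumption):
#   ssh zooko@allmydata.org "cat /home/source/darcs/tahoe/trunk/misc/munin/tahoe_storagespace" > /tmp/mirror_copy
#   if mirror_is_stale /tmp/mirror_copy; then echo "mirror is out of date"; fi
```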
I just tried running the /home/source/bin/mirror-to-org.sh script manually again on hanford, and it failed in the same way. I don't think it's a permissions problem on hanford, because that's not consistent with the error message, and in any case the script that actually does the mirroring is run via suid_exec.
We could try pushing another trivial patch, but I'm fairly sure that will also fail.
comment:4 Changed at 2010-04-12T23:05:03Z by davidsarah
- Description modified (diff)
comment:5 Changed at 2010-04-12T23:52:33Z by davidsarah
- Description modified (diff)
- Summary changed from Mirroring of source to allmydata.org trac is broken to allmydata.org source repository is broken
Checkouts by buildbots are affected as well. I'd bump up the priority of this ticket, but it is already supercritical :-)
If the problem is the disk failure on allmydata.org, then perhaps:
- mount /home, or at least /home/darcs, from a different disk.
- move aside any repos that might be corrupted and pull them again from hanford.
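The second suggestion above (move a suspect repository aside and fetch it again) might look something like the sketch below. Everything here is an assumption for illustration: the function name refetch_repo, the DRY_RUN switch, and all arguments are hypothetical, and the location of the upstream repository must be supplied by the caller since the ticket does not give one. It uses darcs get to fetch a fresh copy.

```shell
#!/bin/sh
# Hypothetical helper: move a possibly corrupted repository aside and
# fetch a fresh copy. With DRY_RUN=1 the commands are printed, not run.
#   $1 = repository directory on the mirror
#   $2 = location of the upstream repository (assumption; caller supplies it)
#   $3 = suffix for the moved-aside copy
refetch_repo() {
    repo_dir=$1
    upstream=$2
    suffix=$3
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; else run=; fi
    # Only fetch if the move succeeded, so the old copy is never overwritten.
    $run mv "$repo_dir" "$repo_dir.$suffix" &&
    $run darcs get "$upstream" "$repo_dir"
}

# Example usage (dry run; the upstream location is a placeholder):
#   DRY_RUN=1
#   refetch_repo /home/source/darcs/tahoe/trunk UPSTREAM_LOCATION broken
```

Running it in dry-run mode first shows exactly which commands would be executed before anything is moved.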
comment:6 Changed at 2010-04-13T00:20:59Z by davidsarah
- Milestone changed from undecided to soon (release n/a)
comment:7 Changed at 2010-04-16T22:00:27Z by davidsarah
- Description modified (diff)
comment:8 Changed at 2010-04-16T22:15:32Z by secorp
- Resolution set to fixed
- Status changed from new to closed
This problem stemmed from the /etc/resolv.conf file on dev.allmydata.com not listing the proper DNS server for name resolution. As a result, allmydata.org did not resolve, which caused the darcs push command to time out. After the /etc/resolv.conf file was updated (necessary after the machines were moved and re-IPed), david-sarah verified that the pushes were working, and the buildslaves appear to be working too.
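For a failure of this kind, a quick check of the resolver configuration can rule name resolution in or out before looking at disks or permissions. The following is a minimal sketch, not something taken from the ticket: the function names list_nameservers and can_resolve are hypothetical, and can_resolve assumes the getent utility is available (as on glibc-based systems).

```shell
#!/bin/sh
# Hypothetical helper: print the nameserver addresses configured in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: succeed if the host name resolves through the
# system resolver. Assumes getent is installed.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Example usage:
#   list_nameservers
#   can_resolve allmydata.org || echo "allmydata.org does not resolve"
```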
Output of darcs push when mirroring script failed.