#507 closed defect (fixed)

our mac buildslave can't build .dmg files

Reported by: warner
Owned by: robk
Priority: critical
Milestone: 1.3.0
Component: dev-infrastructure
Version: 1.2.0
Keywords:
Cc:
Launchpad Bug:

Description

We just moved our mac buildslave from one machine to another, and changed the way the buildslave is started in the process: the new machine uses launchd with a file in /Library/LaunchDaemons, whereas the old machine used a long-running ssh/screen session. Sometimes the buildslave can't resolve DNS names, sometimes it can't make TCP connections, and frequently it can't create the .dmg disk image file. On OS-X, many of these things require some sort of login context. I had thought the launchd setup was supposed to make that work, but for some reason it isn't.

Change History (9)

comment:1 Changed at 2008-08-29T22:32:00Z by warner

robk points out that our old workaround for this was to run the buildslave in a screen session:

login screen -s - ; ssh localhost ; buildbot start ; detach ; ignore

Most of the examples of "hdiutil create" I'm seeing on the web are done as root, but most of them were written before the "-srcfolder" option became available.

For reference, the specific command that's hanging is:

hdiutil create -verbose -ov -srcfolder Allmydata-1.2.0-r2899 allmydata-rw.dmg

and it looks like "diskimages-helper" is just about to attach the drive when everything hangs:

Initialized /dev/rdisk1s2 as a 71 MB HFS Plus volume with a 8192k journal
2008-08-29 15:14:55.586 diskimages-helper[6270:311b] -serveImage: attaching drive
{
    autodiskmount = 1;
    "hdiagent-drive-identifier" = "4B21E621-C079-424D-9172-EF9D39BA7D9A";
    "skip-auto-fsck-for-system-images" = 1;
    "system-image" = 1;
    "unmount-timeout" = 0;
}
2008-08-29 15:14:55.592 diskimages-helper[6270:311b] -serveImage: connecting to myDrive 0x00004E07
2008-08-29 15:14:55.593 diskimages-helper[6270:311b] -serveImage: register _readBuffer 0x0x689000 with myDrive 0x0x0
2008-08-29 15:14:55.593 diskimages-helper[6270:311b] -serveImage: activating drive port 0x0x4f07
2008-08-29 15:14:55.594 diskimages-helper[6270:311b] _serveImage: set cache enabled=TRUE returned FAILURE.
2008-08-29 15:14:55.649 diskimages-helper[6270:311b] _serveImage: set on IO thread=TRUE returned SUCCESS.
2008-08-29 15:14:55.650 diskimages-helper[6270:311b] -serveImage: starting server loop - myPort is 0x0x4f07
2008-08-29 15:14:55.892 diskimages-helper[6270:1603] *useEffectiveIDs**** euid/egid changed to 506,20 (uid/gid is 506,20)
2008-08-29 15:14:55.933 diskimages-helper[6270:1603] *useRealIDs******** euid/egid changed to 506,20 (uid/gid is 506,20)
2008-08-29 15:15:02.691 diskimages-helper[6270:311b] -processKernelRequest: flush received
2008-08-29 15:15:02.692 diskimages-helper[6270:311b] -processKernelRequest: flush received
2008-08-29 15:15:18.600 diskimages-helper[6270:311b] -processKernelRequest: will sleep received
2008-08-29 15:15:18.601 diskimages-helper[6270:311b] -processKernelRequest: idle received
2008-08-29 15:15:32.795 diskimages-helper[6270:311b] -processKernelRequest: will sleep received
2008-08-29 15:15:32.796 diskimages-helper[6270:311b] -processKernelRequest: flush received
2008-08-29 15:15:32.797 diskimages-helper[6270:311b] -processKernelRequest: flush received
2008-08-29 15:15:48.796 diskimages-helper[6270:311b] -processKernelRequest: will sleep received
2008-08-29 15:15:48.796 diskimages-helper[6270:311b] -processKernelRequest: idle received

The same "hdiutil create" command works fine for me from an ssh session, but not from the launchd-started buildslave.
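
Since the failure mode is a hang rather than an error, one way to keep the builder from stalling is to run the command under a timeout. A minimal Python sketch, reusing the flags from the command above; the function names and the 600-second limit are assumptions for illustration, not part of our build scripts:

```python
import subprocess


def hdiutil_create_args(srcfolder, dmg_path):
    """Build the argv for the 'hdiutil create' call quoted in this ticket."""
    return ["hdiutil", "create", "-verbose", "-ov",
            "-srcfolder", srcfolder, dmg_path]


def create_dmg(srcfolder, dmg_path, timeout=600):
    """Run hdiutil; return True on success, False on failure or timeout.

    The timeout (in seconds) is an arbitrary illustrative value.
    """
    try:
        result = subprocess.run(hdiutil_create_args(srcfolder, dmg_path),
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the hang
        # turns into an ordinary step failure.
        return False
    return result.returncode == 0
```

This only bounds the step; it does not address the missing login context, and it makes no attempt to clean up any helper process that hdiutil may have started.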

comment:2 Changed at 2008-08-29T22:34:36Z by robk

  • Owner changed from warner to robk
  • Status changed from new to assigned

comment:3 Changed at 2008-09-02T18:11:34Z by warner

I found some new information about this stuff, specifically that there are Mach ports and namespaces that provide access to some important kernel communication features (including commands to mount disk images) that are not derived from the userid/groupid. These namespaces depend upon which context was used to start the process. Launchd does some of this, but my Apple/buildbot friend told me that they're using /usr/libexec/StartupItemContext to get some stuff running.

http://buildbot.net/trac/wiki/UsingLaunchd and http://developer.apple.com/technotes/tn2005/tn2083.html have more information. I've modified the launchd plist file on the virtualzooko machine to use StartupItemContext and am running a test now to see if that helps the .dmg step.
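
For reference, a launchd job that wraps the buildslave in StartupItemContext might look roughly like the output of the sketch below. This is not our actual plist: the label, user name, working directory and twistd path are hypothetical, and the twistd arguments are just the long forms of -n, -o and -y.

```python
import plistlib


def daemon_plist(label, user, workdir, twistd="/usr/bin/twistd"):
    """Return XML plist bytes for a LaunchDaemon that starts a buildslave
    via StartupItemContext. All argument values are placeholders."""
    job = {
        "Label": label,
        "UserName": user,              # account the daemon runs as
        "WorkingDirectory": workdir,   # directory holding buildbot.tac
        "ProgramArguments": [
            "/usr/libexec/StartupItemContext",
            twistd, "--nodaemon", "--no_save", "--python", "buildbot.tac",
        ],
        "RunAtLoad": True,
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Hypothetical values; the result would be saved under /Library/LaunchDaemons.
    print(daemon_plist("org.example.buildslave", "buildslave",
                       "/Users/buildslave/slave").decode("utf-8"))
```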

comment:4 Changed at 2008-09-02T18:59:13Z by warner

Drat, that didn't help. The -verbose output was exactly the same as before.
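
Comparing two verbose logs by eye is error-prone, because every diskimages-helper line carries a timestamp and process/thread IDs that change from run to run. A small sketch that strips those before diffing; the function names are made up for illustration, and other run-specific values (such as port numbers) would still show up as differences:

```python
import difflib
import re

# Matches e.g. "2008-08-29 15:14:55.586 diskimages-helper[6270:311b] "
_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (\S+?)\[\d+:[0-9a-f]+\] ")
# Matches drive identifiers such as 4B21E621-C079-424D-9172-EF9D39BA7D9A
_UUID = re.compile(r"[0-9A-F]{8}-(?:[0-9A-F]{4}-){3}[0-9A-F]{12}")


def normalize(line):
    """Drop the timestamp and pid:thread prefix, and mask UUIDs."""
    line = _PREFIX.sub(r"\1 ", line.rstrip("\n"))
    return _UUID.sub("<uuid>", line)


def diff_logs(log_a, log_b):
    """Return unified-diff lines between two normalized hdiutil logs."""
    a = [normalize(l) for l in log_a.splitlines()]
    b = [normalize(l) for l in log_b.splitlines()]
    return list(difflib.unified_diff(a, b, "run-a", "run-b", lineterm=""))
```

An empty result means the two runs produced the same sequence of messages once the per-run noise is removed.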

That technote references a tool that can be used to find out which Mach namespaces are available to any given process. I compiled it on my mac at home, but I need to compile it on virtualzooko and see if it shows any difference between a buildslave running under StartupItemContext and one that is not. The failure of this test suggests to me that there isn't a difference.

I don't know which Mach thingy is required by hdiutil, nor a way to find out whether said thingy is available to a given process or not.

Next step is to escalate this to the buildbot-devel mailing list.

comment:5 Changed at 2008-09-07T18:52:18Z by warner

It just occurred to me that our launchd .plist file might be using 'buildbot start' (which makes the buildslave detach into the background), and that it might somehow be giving up some authority in the process. I'd like to give it a try with 'buildbot run' (or probably 'twistd -noy --syslog') and see if that helps.

comment:6 Changed at 2008-09-08T06:22:13Z by warner

My friend John had a good idea: we could run the buildslave as the 'virtualzooko' user, from a start-upon-login item, rather than launching it at boot time. The start-upon-login approach would probably give it the right context, and the buildslave wouldn't interfere with our use of the virtualzooko account for ichat purposes.
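
This ticket does not pin down the exact mechanism for the start-upon-login item. One option, assumed here purely for illustration, is a per-user launchd agent in ~/Library/LaunchAgents that is limited to the GUI login session. The label, working directory and twistd path below are hypothetical:

```python
import plistlib


def login_agent_plist(label, workdir, twistd="/usr/bin/twistd"):
    """Return XML plist bytes for a per-user agent that starts a buildslave
    when the user logs in. All argument values are placeholders."""
    job = {
        "Label": label,
        "WorkingDirectory": workdir,   # directory holding buildbot.tac
        "ProgramArguments": [
            twistd, "--nodaemon", "--no_save", "--python", "buildbot.tac",
        ],
        "RunAtLoad": True,
        # Load only in the GUI login session, not in background sessions.
        "LimitLoadToSessionType": "Aqua",
    }
    return plistlib.dumps(job)
```

There is no UserName key because an agent runs as whichever user it is loaded for. Combined with automatic login for that account, the buildslave would come up after a reboot without anyone at the console.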

comment:7 Changed at 2008-09-08T18:47:05Z by warner

Cross off one theory: the .plist file is using twistd --nodaemon, so the buildslave isn't detaching.

I'll switch it over to the 'virtualzooko' user now. I expect that ought to make it work.

comment:8 Changed at 2008-09-08T22:24:31Z by warner

Yup, that worked. Good enough for now.

I need to modify the code to upload the .dmg files in a new way, so the builder doesn't actually work yet. But the .dmg step is working now.

comment:9 Changed at 2008-09-08T23:49:13Z by warner

  • Milestone changed from undecided to 1.3.0
  • Resolution set to fixed
  • Status changed from assigned to closed

OK, the upload is working correctly (through xfer-client.py). We didn't find a "real" solution for the .dmg problem (i.e. to allow an arbitrary user account, through a buildslave started via launchd from /Library/LaunchDaemons, to run hdiutil), but the use-the-auto-login-user approach is good enough for our needs, and probably the needs of anyone else who has a dedicated or semi-dedicated mac for builds.
