#4098 closed task (fixed)

CircleCI checkout is broken

Reported by: meejah
Owned by: btlogy
Priority: normal
Milestone: undecided
Component: dev-infrastructure
Version: n/a
Keywords: ci
Cc:
Launchpad Bug:

Description

It looks like all CircleCI builds have been broken for a couple of weeks because the checkout step fails (it's using SSH, so perhaps a key expired or was revoked?).

I don't see anything obvious in the config after a quick look.

Attachments (2)

CircleCI_Sigin_Public.png (39.6 KB) - added by btlogy at 2024-11-13T10:11:23Z.
CircleCI Sign-in Public
CircleCI_Public-Perms.png (18.5 KB) - added by btlogy at 2024-11-13T10:18:01Z.


Change History (24)

comment:2 Changed at 2024-08-20T17:57:36Z by hacklschorsch

An easy way forward might be to configure CircleCI to use an HTTPS URL instead of an SSH URL for the Tahoe-LAFS GitHub repository.

It seems to me that some ClickOps in CircleCI is required for that.

comment:3 Changed at 2024-08-20T18:13:39Z by hacklschorsch

Maybe the error is spurious? This run, 7 days ago, did not show the 'ssh' error:

https://app.circleci.com/pipelines/github/tahoe-lafs/tahoe-lafs/4950/workflows/906cd99e-e8f5-4144-9270-480ed04d1fb7

comment:4 Changed at 2024-08-20T21:28:15Z by sajith

This appears to be the problem:

Warning: checkout key has zero length
Writing SSH key for checkout to "/tmp/nobody/.ssh/id_rsa"
Fetching into existing repository
Fetching from remote repository
Warning: Permanently added the ECDSA host key for IP address '140.82.113.3' to the list of known hosts.
Load key "/tmp/nobody/.ssh/id_rsa": invalid format
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

I am able to run a job with SSH enabled. I did that, and /tmp/nobody/.ssh/id_rsa (which is the checkout key) is indeed empty. Why?

According to https://circleci.com/docs/configuration-reference/#checkout, "the checkout command automatically adds the required authenticity keys for interacting with GitHub and Bitbucket over SSH". Also, under CircleCI -> Project settings -> SSH keys, it says that "no checkout key is currently configured! We won't be able to check out your project for testing."

So that seems to be the problem. There's a button that says "add deploy key" but that does not work (Firefox 129.0, Debian). I am going to try the next button, which is under "user key" and says "authorize with GitHub" (description: "A user key is a user-specific SSH key. GitHub has the public key, and we store the private key. Possession of the private key gives the ability to act as that user, for purposes of 'git' access to projects.")

(Can CircleCI check out code over HTTPS? That should be helpful?)

comment:5 Changed at 2024-08-20T21:38:15Z by sajith

Update: "Authorize with GitHub" did not add a checkout key.

comment:6 Changed at 2024-08-20T22:08:27Z by sajith

The docs on both CircleCI and GitHub are confusing and my head hurts from trying to figure them out, but my current reading is that we'll need to install a "CircleCI GitHub App" in the tahoe-lafs GitHub organization.

https://circleci.com/docs/github-apps-integration/

It seems that tahoe-lafs was originally configured with an old-fashioned "GitHub OAuth app". That is what the URL https://app.circleci.com/pipelines/github/tahoe-lafs/<blah-blah-blah> suggests. If we were using the "CircleCI GitHub App", that would have been https://app.circleci.com/tahoe-lafs/circleci/<blah-blah-blah>. With that, CircleCI should be able to generate checkout keys, and presumably share the public part of those keys with GitHub.

I could complain about the useless error message from CircleCI, but that would produce nothing useful...

I have not been keeping up, but it is possible that either GitHub or CircleCI stopped working with the old-fashioned OAuth app. I do not have admin permissions on tahoe-lafs GitHub project, so I can't install the "CircleCI GitHub App" there.

Last edited at 2024-08-20T22:10:22Z by sajith

comment:7 follow-up: ↓ 9 Changed at 2024-08-21T13:04:48Z by hacklschorsch

I didn't find anything regarding HTTPS for checkout, but I stumbled over https://batsov.com/articles/2022/09/20/resetting-circleci-checkout-ssh-keys/ saying:

There are two simple ways to regenerate the missing key:

  1. Go to CircleCI’s console and disable building the project in question (basically press “Stop building”). Once you re-enable the project the SSH key will be regenerated and added to GitHub.
  2. Alternatively you can go to your project settings in CircleCI’s console, delete the read checkout SSH key there and re-add it. That’s under “Project settings -> SSH keys” and should be the first key you see there.

I think that option 2) is the better one, simply because it takes a bit less time, but both get the job done.

Version 0, edited at 2024-08-21T13:04:48Z by hacklschorsch

comment:8 Changed at 2024-08-21T13:11:36Z by hacklschorsch

https://circleci.com/changelog/changes-to-code-checkout-for-orgs-that-integrate-with-github-app/ sounds like since Mar 25, 2024, using CircleCI’s GitHub App integration results in cloning with HTTPS by default (because it supposedly is "a more secure method" (?)).

comment:9 in reply to: ↑ 7 Changed at 2024-08-22T01:24:54Z by sajith

Replying to hacklschorsch:

I didn't find anything regarding HTTPS for checkout, but I stumbled over https://batsov.com/articles/2022/09/20/resetting-circleci-checkout-ssh-keys/ saying:

The error message Bozhidar Batsov describes does not have the "checkout key has zero length" message, so perhaps they were solving a different thing?

In any case, in app.circleci.com, under tahoe-lafs project settings -> ssh keys, there are three things:

  • Deploy key (repo-specific SSH key, for GitHub access)
  • User key (user-specific SSH key, for GitHub access)
  • Additional ssh keys

Under "Additional SSH Keys", there's a fingerprint of "SHA256:v1Kg79rxSj0qyLEb27BeBQqPQ8B2qxTwy2K2kqOeJE4" (configured to use with the hostname github.com), which I guess is that of an ssh key someone (exarkun maybe?) added. That seems to be the only thing I am able to delete.

But "Additional SSH Keys" are about "keys to the build VMs that you need to deploy to your machines", not checkouts. I don't know what one would deploy to github.com from CircleCI. Will I break something (I don't know what that something would be) if I delete that key?

comment:10 Changed at 2024-08-22T09:04:59Z by hacklschorsch

I don't know either, sorry. Hearing from Bozhidar Batsov that deleting and recreating the same key solved anything doesn't give me the best feeling about CircleCI in the first place, but I thought it might be worth a try.

So am I understanding correctly - if someone with access to GitHub Tahoe-LAFS would create a read-only deploy key you could add it to CircleCI?

I don't have the required permissions on either service.

Last edited at 2024-08-22T09:07:07Z by hacklschorsch

comment:11 Changed at 2024-08-22T13:02:26Z by sajith

I can add an "Additional SSH Key" to CircleCI. If we do that, I think we will also need to add an add_ssh_keys step before each checkout step in our .circleci/config.yml. And then whoever gives me the key will also have to add it to the GitHub repository settings (at https://github.com/tahoe-lafs/tahoe-lafs/settings/keys/new presumably).

https://circleci.com/docs/add-ssh-key/#add-ssh-keys-to-a-job

We have ten checkout steps, so this is making me uncomfortable. :-)
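For illustration, a minimal sketch of what that change could look like in one job. The job name, the Docker image, and the fingerprint are placeholders rather than values from the tahoe-lafs config, and whether the built-in checkout step would actually pick up a key added this way is the assumption made in this comment, not something verified here.

```yaml
# Hypothetical .circleci/config.yml; names and fingerprint are placeholders.
version: 2.1

jobs:
  example-job:
    docker:
      - image: cimg/base:stable  # placeholder image
    steps:
      # Install the key registered under
      # Project settings -> SSH keys -> Additional SSH Keys.
      # This step must come before checkout.
      - add_ssh_keys:
          fingerprints:
            - "SHA256:<fingerprint shown in the CircleCI project settings>"
      - checkout

workflows:
  ci:
    jobs:
      - example-job
```

The same add_ssh_keys step would have to be repeated in every job that runs checkout.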

It sounds like it might be much easier for someone with GitHub org admin rights to install that darned "CircleCI GitHub App" before we go through the more involved steps?

I am going to try asking in CircleCI forums.

comment:12 Changed at 2024-08-22T13:20:38Z by sajith

Many forum posts came up when searching for git@github.com: Permission denied (publickey) in https://discuss.circleci.com/. One of them (https://discuss.circleci.com/t/git-github-com-permission-denied-publickey-on-repo-that-previously-worked/37763) led to:

https://support.circleci.com/hc/en-us/articles/360021666393-How-to-stop-building-by-manually-removing-the-CircleCI-webhook-and-deploy-key-from-your-GitHub-repository

It seems that resetting the whole thing might do something:

  • Clicking the "Stop building" button on CircleCI (I can't do this, since I do not have admin access to CircleCI tahoe-lafs project)
  • Removing CircleCI webhooks from GitHub (I can't do this)
  • Removing deploy keys from GitHub (I can't do this)
  • And then add the project back to CircleCI (I don't think I can do this)

Basically a GitHub org/repo admin will need to step in.

Also worth mentioning that I see both these error messages:

Warning: checkout key has zero length
Writing SSH key for checkout to "/tmp/nobody/.ssh/id_rsa"
Fetching into existing repository
Fetching from remote repository
Warning: Permanently added the ECDSA host key for IP address '140.82.112.3' to the list of known hosts.
Load key "/tmp/nobody/.ssh/id_rsa": invalid format
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

and

Writing SSH key for checkout to "/tmp/nobody/.ssh/id_rsa"
Writing SSH public key for checkout to "/tmp/nobody/.ssh/id_rsa.pub"
Fetching into existing repository
Fetching from remote repository
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

comment:14 Changed at 2024-08-22T16:55:58Z by sajith

Also submitted a support request. I hope either the post or the request gets a response, and I hope I won't be scolded for the multi-prong approach. ;-)

-------- Forwarded Message --------
From: CircleCI 
To: Sajith
Subject: [Request Received] "fatal: Could not read from remote repository."
Date: 08/22/2024 11:49:11 AM

   
##- Please type your reply above this line -##
Hello,
Thank you for contacting CircleCI Support. Your ticket reference ID is:
153696
 
For fastest time to resolution, please visit the CircleCI Support Center,
where all of our most commonly asked questions have been answered by our
skilled Support Engineers. You can also find helpful posts created by
CircleCI users on our Discuss Forum.
 
Our team of Support Engineers answer every ticket, with priority to
customers on dedicated support plans. If your team would benefit from
guaranteed response times in 1 business day or less, consider upgrading your
Support Plan from your current level of Free to one of our paid plans. A
great place to start would be our Starter Support plan. You can easily
upgrade directly from your plan settings page in the CircleCI app.
While we are not able to guarantee response times for Free plans, someone
from our team will review this ticket and respond promptly. 
 
Thank you for reaching out, and happy building,
The CircleCI Team
This email is a service from CircleCI. 

(I'm omitting actual email addresses.)

comment:15 Changed at 2024-08-22T18:33:00Z by sajith

I received a response from CircleCI support. Basically a GitHub org admin will need to step in.

-------- Forwarded Message --------
From: Martin (CircleCI)
Subject: [CircleCI] Re: "fatal: Could not read from remote repository."
Date: 08/22/2024 12:13:14 PM

##- Please type your reply above this line -##
Your request (153696) has been updated. To add additional comments, reply to this email.
Martin (CircleCI)
Aug 22, 2024, 12:13 CDT

Hello ,
 
Thank you for reaching out to CircleCI Support. Based on the
information provided, it seems like there might be an issue with the
SSH keys or permissions since this behavior is happening on the
checkout step.
 
One possible reason could be that the checkout SSH key got
removed. This can happen when changes are made to the developer
accounts or organization settings on GitHub. Resetting webhooks,
checkout keys, and deploy keys associated with the project can help
ensure you're working with a known good setup going forward. You can
follow the steps in this Support Center article
<https://support.circleci.com/hc/en-us/articles/360021666393-How-to-manually-stop-building-by-removing-your-deploy-key-and-webhook-from-GitHub>
to do so.
 
Another possibility is that there might be expired tokens in the
environment variables. CircleCI handles the auth for you when it
creates the key and adds it to GitHub, so most of the time,
git-related auth in the environment variables aren't necessary. If
there are expired tokens, it could prevent CircleCI from checking out
the code.
 
Lastly, have any project admin please ensure that CircleCI is enabled
in their GitHub organization's third-party app restrictions in their
Organization Settings
<https://support.circleci.com/hc/en-us/articles/360056873811-Your-access-to-a-project-from-CircleCI-was-revoked-by-GitHub>. You
can read more about these restrictions in GitHub's Documentation
<https://help.github.com/articles/about-oauth-app-access-restrictions/>.
 
Please pass this information on to the project admin on GitHub so they
can check these settings. If the issue persists, please let us know.
 
Thank you,
Martin
Support Engineer @ CircleCI

In the meantime, checkouts have resumed working. I don't know if this happened because of something CircleCI did, or some random buttons I clicked on CircleCI, or someone else acting quietly.

comment:16 Changed at 2024-08-26T16:10:14Z by meejah

I do have org-admin on "tahoe-lafs"; however, the GitHub account that has that permission is "meejah" and I am not authorizing it for use with CircleCI because they ask for way too many permissions (write access to all public, private and org repositories).

I also have a "meejahcircleci" GitHub account, but it does not have org-admin. I tried following either of the procedures above, but they all start with "log in to your CircleCI account" so I do not believe they will succeed.

comment:17 Changed at 2024-11-12T14:40:58Z by btlogy

I've recently spent some time on this issue because it has hit us at PrivateStorage in other related repositories.

E.g.: https://github.com/PrivateStorageio/ZKAPAuthorizer/issues/462

The problem described in this Trac ticket seems very similar and seems to be still present in the last merge commit (15 checks failed, all CircleCI):

https://github.com/tahoe-lafs/tahoe-lafs/commit/6cf67471f1ccb00bf72cd6574fdd1deb9259df9e

Meanwhile, most of those checks passed for the related PR:

https://github.com/tahoe-lafs/tahoe-lafs/pull/1383

Our findings in short:

  • CircleCI does not check out the code the same way for an in-repo PR and a PR from a fork, which explains why most of the CircleCI checks get past the checkout step in a PR while all of them fail to check out once the PR is merged into the master branch.
  • If the org. on CircleCI was created using GitHub OAuth, one needs to be a GitHub/Tahoe-LAFS admin/owner to be a CircleCI/Tahoe-LAFS admin for the project/org.
  • There is an alternative way to create an org. on CircleCI using mostly email and password, but it involves a lot of manual steps and does not (easily) cover all the usual workflows (e.g.: PR from fork).
  • CircleCI should check out the code of a project using HTTPS, unless there is a private SSH key available in the CircleCI settings.
  • CircleCI proposes 2 different ways to set up an SSH key for checkout:
    1. a CircleCI/Tahoe-LAFS admin user manually adds an authorized private key (preferably a deploy key unique to the project/repo),
    2. a CircleCI/Tahoe-LAFS admin gives (way too many) permissions to CircleCI/OAuth to automatically create and authorize a new user key.
  • However, we've found a few projects where there is currently no SSH key, maybe automatically removed by someone leaving the project (unlikely IMHO), and regardless, CircleCI tries and fails to check out via SSH (Load key "/tmp/nobody/.ssh/id_rsa": error in libcrypto).
  • As we suspect for other projects, adding a new SSH key and removing it directly afterwards seems to clean up the dirt in the pipe and force CircleCI to use HTTPS to check out (WiP).
  • Alternatively, it is "only" possible to avoid SSH and force HTTPS by implementing a custom checkout step, as done once in ZKAPAuthorizer: https://github.com/PrivateStorageio/ZKAPAuthorizer/blob/999c7c05f6131dfedcef360234fc4556e76ba755/.circleci/config.yml#L27-L45

Last edited at 2024-11-12T16:49:48Z by btlogy

comment:18 follow-up: ↓ 19 Changed at 2024-11-13T00:09:50Z by meejah

It sounds like it might be much easier for someone with GitHub org admin right to install that darned "CircleCI GitHub App" before we go through the more involved steps?

As I've said many times, I am not going to give CircleCI write access to everything I've got on public and private GitHub repositories, so this seems to be a non-starter unfortunately.

Changed at 2024-11-13T10:11:23Z by btlogy

CircleCI Sign-in Public

Changed at 2024-11-13T10:18:01Z by btlogy

comment:19 in reply to: ↑ 18 Changed at 2024-11-13T10:43:12Z by btlogy

Replying to meejah:

It sounds like it might be much easier for someone with GitHub org admin right to install that darned "CircleCI GitHub App" before we go through the more involved steps?

As I've said many times, I am not going to give CircleCI write access to everything I've got on public and private GitHub repositories, so this seems to be a non-starter unfortunately.

I agree with you meejah, CircleCI is asking way too much by default. However, we've found a way to allow access only to public repositories. It is only a small improvement, and it's a shame they do not offer this option by default!

(attachment: CircleCI_Sigin_Public.png, "CircleCI Sign-in Public")

In any case, the requested permissions may still be too much for you:

(attachment: CircleCI_Public-Perms.png)

So, the alternative could be to:

  1. create a dedicated GitHub user like circleci-tahoe (similar to the existing tahoe-robots) or re-use meejahcircleci,
  2. give it, at least temporarily, the admin role on the tahoe-lafs project (not the whole Tahoe-LAFS org.),
  3. use it to sign in on CircleCI by granting the required access via OAuth (only one public repository anyway),
  4. remove the existing SSH key from the project settings, or add and remove a dummy one,
  5. demote this user from the admin role to simple member, or just delete it if no longer needed.

The last resort option could be to force CircleCI to checkout the code via HTTPS instead of SSH regardless of the presence of an SSH key in the project settings:

Implement custom checkout to avoid CircleCI using any SSH key #1384 WiP
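As a rough illustration of that last-resort option, a custom checkout step could look like the sketch below. This is not the code from #1384 or from ZKAPAuthorizer. The job name and image are placeholders, it assumes a public GitHub repository (anonymous HTTPS clone), and it relies on CircleCI's built-in CIRCLE_PROJECT_USERNAME, CIRCLE_PROJECT_REPONAME, CIRCLE_SHA1 and CIRCLE_PR_NUMBER variables, the last of which is only set for PRs from forks.

```yaml
# Hypothetical .circleci/config.yml; job name and image are placeholders.
version: 2.1

jobs:
  example-job:
    docker:
      - image: cimg/base:stable  # placeholder image that ships git
    steps:
      # Used instead of the built-in "checkout" step, so no SSH key is involved.
      - run:
          name: Checkout over HTTPS
          command: |
            set -euo pipefail
            # Anonymous HTTPS URL; only works for a public repository.
            url="https://github.com/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}.git"
            git init .
            git remote add origin "$url"
            if [ -n "${CIRCLE_PR_NUMBER:-}" ]; then
              # PR from a fork: the commits are reachable via refs/pull/<n>/head.
              git fetch origin "+refs/pull/${CIRCLE_PR_NUMBER}/head:refs/remotes/origin/pr"
            else
              # Branch or tag build: fetch the commit under test directly.
              git fetch origin "${CIRCLE_SHA1}"
            fi
            git checkout --force "${CIRCLE_SHA1}"

workflows:
  ci:
    jobs:
      - example-job
```

A step like this would have to replace checkout in every job, which is the same maintenance cost noted earlier for the add_ssh_keys approach.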

Last edited at 2024-11-13T10:44:25Z by btlogy

comment:20 Changed at 2024-12-04T10:12:21Z by btlogy

As discussed in the last N&B: it seems like meejah has shaken CircleCI enough with his meejahcircleci user to fix the "ghost key" issue.

The solution seemed to be as simple as removing the existing key. However, as we have verified, CircleCI then automatically re-created a new read-only deploy key!

This can be done explicitly via the project settings (SSH keys), or triggered silently when an admin unfollows and then follows the project in the CircleCI web app.

Bottom line:

  • the Tahoe-LAFS project now has a new deploy key which is used to check out the code,
  • unless we customize the checkout, CircleCI requires an SSH key and will try to automatically provision one (when not stuck in a partial config),
  • this automatic provisioning is likely the main reason why CircleCI requires so much power via GitHub OAuth,
  • avoiding this is tricky (re-configuring an org. without GitHub OAuth) and requires a lot of work (e.g.: manual webhook).

While it might give us some thoughts about the future of our CI, I think we can close this issue.

comment:21 Changed at 2024-12-04T10:28:10Z by btlogy

  • Component changed from unknown to dev-infrastructure
  • Keywords ci added
  • Owner set to btlogy
  • Status changed from new to assigned
  • Summary changed from CircleCI is Broken to CircleCI checkout is broken

comment:22 Changed at 2024-12-06T08:29:26Z by meejah

  • Resolution set to fixed
  • Status changed from assigned to closed