Cache: Builds failing with: failed: Cache service responded with 503

Created on 15 Aug 2020  ·  3 Comments  ·  Source: actions/cache

Today a lot of macOS runs are failing with:

Post job cleanup.
/usr/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /Users/runner/work/anki/anki --files-from manifest.txt
[warning]uploadChunk (start: 134217728, end: 167772159) failed: Cache service responded with 503
/Users/runner/work/_actions/actions/cache/v2/dist/save/index.js:3093
                        throw new Error(`Cache upload failed because file read failed with ${error.message}`);
                        ^

Error: Cache upload failed because file read failed with EBADF: bad file descriptor, read
    at ReadStream.<anonymous> (/Users/runner/work/_actions/actions/cache/v2/dist/save/index.js:3093:31)
    at ReadStream.emit (events.js:210:5)
    at internal/fs/streams.js:167:12
    at FSReqCallback.wrapper [as oncomplete] (fs.js:470:5)

I was using actions/cache@v1, but even after updating to actions/cache@v2 the problem persists: https://github.com/evandroforks/anki/runs/986660052?check_suite_focus=true#step:61:1

When the cache service is offline or otherwise unavailable, it should not fail my builds. Failing to save a cache value should not be fatal.
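
To make the request concrete, here is a minimal sketch of a "best effort" save, where a failed upload is downgraded to a warning instead of failing the job. `saveCacheBestEffort` and its `save` and `warn` parameters are hypothetical names used only for illustration; this is not the action's actual code.

```javascript
// Hypothetical wrapper: run a cache save and never let its failure propagate.
// `save` is any async function that performs the upload (an assumption here).
async function saveCacheBestEffort(save, warn = console.warn) {
  try {
    await save();
    return true; // cache was saved
  } catch (error) {
    // Report the problem, but do not rethrow: the build itself succeeded.
    warn(`Cache save skipped: ${error.message}`);
    return false;
  }
}

// Usage: simulate the service answering with a 503.
saveCacheBestEffort(async () => {
  throw new Error("Cache service responded with 503");
}).then((saved) => console.log("cache saved:", saved));
```

With a wrapper along these lines, the job would only log a warning when the cache service is down.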

Related to: https://github.com/actions/cache/issues/259 - [warning]Cache service responded with 503 during chunk upload.

All 3 comments

@evandrocoan Looks like there was a spike in errors coming from one of the edge nodes (which route traffic to the servers).

@aiqiaoy Can you please take a look at why the error isn't caught by the try-catch blocks? From what I can tell, the following sequence of events is happening:

  1. An error occurs during a call to uploadChunk; in this case it was a 503 response.
  2. That error causes the finally block to execute and close the file descriptor.
  3. Closing the file descriptor kills the other in-progress calls to uploadChunk, resulting in "Error: Cache upload failed because file read failed with EBADF: bad file descriptor, read".
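
The sequence above can be sketched with a simulated file handle. This is an illustration under stated assumptions, not the action's real implementation: `openFakeFile`, `uploadEager`, and `uploadSettled` are hypothetical names, the `uploadChunk` here is a stand-in with an invented signature, and the sketch does not reproduce the unhandled stream 'error' event itself. It only shows how closing the file on the first failure makes the remaining reads fail with EBADF, and how waiting for every upload to settle before closing avoids that.

```javascript
// Simulated file: read() rejects with EBADF once close() has been called.
function openFakeFile() {
  let closed = false;
  return {
    async read() {
      // Yield to the event loop, as a real asynchronous read would.
      await new Promise((resolve) => setTimeout(resolve, 5));
      if (closed) {
        const err = new Error("EBADF: bad file descriptor, read");
        err.code = "EBADF";
        throw err;
      }
    },
    close() {
      closed = true;
    },
  };
}

// Stand-in upload: the chunk at index `failAt` gets a 503 right away,
// every other chunk first has to read its bytes from the shared file.
async function uploadChunk(file, index, failAt) {
  if (index === failAt) {
    throw new Error("Cache service responded with 503");
  }
  await file.read();
}

// Problematic pattern: Promise.all rejects on the first failure, so the
// finally block closes the file while other reads are still in flight.
async function uploadEager(chunkCount, failAt) {
  const file = openFakeFile();
  const errors = [];
  const tasks = [];
  for (let i = 0; i < chunkCount; i++) {
    tasks.push(
      uploadChunk(file, i, failAt).catch((e) => {
        errors.push(e.message);
        throw e;
      })
    );
  }
  try {
    await Promise.all(tasks);
  } catch (e) {
    // Only the first failure (the 503) is observed here.
  } finally {
    file.close();
  }
  await Promise.allSettled(tasks); // let the stragglers finish so we can inspect them
  return errors;
}

// Safer pattern: wait for every upload to settle, then close the file.
async function uploadSettled(chunkCount, failAt) {
  const file = openFakeFile();
  const tasks = [];
  for (let i = 0; i < chunkCount; i++) {
    tasks.push(uploadChunk(file, i, failAt));
  }
  try {
    const results = await Promise.allSettled(tasks);
    return results
      .filter((r) => r.status === "rejected")
      .map((r) => r.reason.message);
  } finally {
    file.close();
  }
}

uploadEager(4, 1).then((errors) => console.log("eager close:", errors));
uploadSettled(4, 1).then((errors) => console.log("close after settle:", errors));
```

In the eager variant the single 503 is followed by an EBADF error for each of the other three chunks; in the settled variant only the 503 is reported.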

Confirmed. Our macOS builds are often failing the "Post Cache" step with this error:

1m 37s
}
Post job cleanup.
/usr/bin/tar -cz -f /Users/runner/work/_temp/d9d25ad0-2d4a-4f19-8c28-fe3842959184/cache.tgz -C /Users/runner/.ccache .
[warning]Cache service responded with 503 during chunk upload.
events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: EBADF: bad file descriptor, read
Emitted 'error' event on ReadStream instance at:
    at internal/fs/streams.js:167:12
    at FSReqCallback.wrapper [as oncomplete] (fs.js:470:5) {
  errno: -9,
  code: 'EBADF',
  syscall: 'read'
}

For example: https://github.com/azerothcore/azerothcore-wotlk/runs/1091807160

We have another report of an outage leading to this error behavior. To consolidate the issues, I'm filing one to track the issue with the cache action failing due to these file descriptor errors - https://github.com/actions/cache/issues/441 - and closing this one.
