Bazel: `new_http_archive` can't handle archives containing unicode-encoded filenames

Created on 16 Aug 2016  ·  12 Comments  ·  Source: bazelbuild/bazel

I'm attempting to add libgit2 to a project as a bazel external, like so:

new_http_archive(
  name = "com_github_libgit2",
  url = "https://github.com/libgit2/libgit2/archive/v0.24.1.tar.gz",
  strip_prefix = "libgit2-0.24.1",
  sha256 = "60198cbb34066b9b5c1613d15c0479f6cd25f4aef42f7ec515cd1cc13a77fede",
  build_file = "BUILD.libgit2",
)

Unfortunately, `bazel build @com_github_libgit2//...` fails with:

Unhandled exception thrown during build; message: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@com_github_libgit2' (requested by nodes 'REPOSITORY:@com_github_libgit2')
INFO: Elapsed time: 3.331s
java.lang.RuntimeException: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@com_github_libgit2' (requested by nodes 'REPOSITORY:@com_github_libgit2')
        at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:1070)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:474)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /home/nelhage/.cache/bazel/_bazel_nelhage/63183be0b56fd73a3d972912805a7bbe/external/com_github_libgit2/tests/resources/status/这
        at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
        at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
        at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
        at java.io.File.toPath(File.java:2234)
        at com.google.devtools.build.lib.bazel.repository.CompressedTarFunction.decompress(CompressedTarFunction.java:69)
        at com.google.devtools.build.lib.bazel.repository.DecompressorValue.decompress(DecompressorValue.java:76)
        at com.google.devtools.build.lib.bazel.repository.NewHttpArchiveFunction.fetch(NewHttpArchiveFunction.java:70)
        at com.google.devtools.build.lib.rules.repository.RepositoryDelegatorFunction.compute(RepositoryDelegatorFunction.java:155)
        at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:1016)
        ... 4 more
java.lang.RuntimeException: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@com_github_libgit2' (requested by nodes 'REPOSITORY:@com_github_libgit2')
        at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:1070)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:474)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /home/nelhage/.cache/bazel/_bazel_nelhage/63183be0b56fd73a3d972912805a7bbe/external/com_github_libgit2/tests/resources/status/这
        at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
        at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
        at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
        at java.io.File.toPath(File.java:2234)
        at com.google.devtools.build.lib.bazel.repository.CompressedTarFunction.decompress(CompressedTarFunction.java:69)
        at com.google.devtools.build.lib.bazel.repository.DecompressorValue.decompress(DecompressorValue.java:76)
        at com.google.devtools.build.lib.bazel.repository.NewHttpArchiveFunction.fetch(NewHttpArchiveFunction.java:70)
        at com.google.devtools.build.lib.rules.repository.RepositoryDelegatorFunction.compute(RepositoryDelegatorFunction.java:155)
        at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:1016)
        ... 4 more

The proximate cause seems to be that bazel forces itself into a latin-1 locale: https://github.com/bazelbuild/bazel/blob/936c2c2c815b64525bc6d3c6ac8f049655589370/src/main/cpp/blaze.cc#L1718-L1724. UnixPath.encode will happily encode utf-8 paths in a utf-8 locale.
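
To illustrate the encoding mismatch, here is a minimal sketch in Python rather than Java. It is only an analogy for what UnixPath.encode appears to be doing, not Bazel's or the JDK's actual code, and `can_encode` is a hypothetical helper name. It checks whether the problematic filename can be represented in each encoding:

```python
def can_encode(name, encoding):
    """Return True if `name` can be represented in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+8FD9 is the single-character filename under tests/resources/status/
# that appears in the error message above.
filename = "\u8fd9"

for encoding in ("latin-1", "utf-8"):
    print(encoding, can_encode(filename, encoding))
# prints:
# latin-1 False
# utf-8 True
```

The character has no latin-1 representation, so any step that must turn the name into latin-1 bytes has nothing valid to produce.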

I think this is distinct from #374 in that I don't even need to _reference_ the problematic files in a rule; bazel can't even unpack the tarball, even though I don't care about the files in question.

P1 area-ExternalDeps team-XProduct bug

All 12 comments

I just ran into this too.

This is causing problems for me with libvips, whose release archive contains a filename with Russian characters.

This is still an issue as of Bazel 0.13 with

    native.new_git_repository(
        name = "com_github_mosra_corrade",
        remote = "https://github.com/mosra/corrade.git",
        commit = "10a9abeca9938091edbf8a7fdf7cd6a944ce01c4",
        build_file = "//:third_party/com_github_mosra_corrade/BUILD.bazel.in",
    )

We are deprecating the native versions of http_archive and git_repository. Have you tried the Skylark-based versions? Do they have the same error?

To use the Skylark new_git_repository, just add this to your WORKSPACE:

load(
    "@bazel_tools//tools/build_defs/repo:git.bzl",
    "git_repository",
    "new_git_repository",
)

Yes, I originally tried it with the Skylark version.

This no longer crashes with the Starlark rules, but it still fails during extraction:

ERROR: Malformed input or input contains unmappable characters: /home/nelhage/.cache/bazel/_bazel_nelhage/9608ccc15fa131b938e0090f75bd00b5/external/com_github_libgit2/tests/resources/status/这

I thought I could work around it with patch_cmds = ["rm -rf tests"], but it appears not to even get as far as running that command.

Also affected, using Starlark http_archive with Bazel 0.16.1, trying to clone from sphinx:

ERROR: Analysis of target '//tools/workspace/sphinx:sphinx_build' failed; build aborted: no such package '@sphinx//': Malformed input or input contains unmappable characters: .../external/sphinx/tests/roots/test-image-glob/testimäge.png

Same as above; using patch_cmds does not work.

I may try writing a workaround Python script, run via repository_ctx, that manually scrubs these characters when extracting.
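
A rough sketch of what such a script could look like, assuming the archive is a tarball that has already been downloaded without being extracted. The function names `is_ascii` and `extract_ascii_only` are hypothetical, and the wiring into a repository rule (for example, running the script through repository_ctx.execute) is not shown and is an assumption. Rather than renaming the offending entries, this version skips them:

```python
import os
import shutil
import tarfile


def is_ascii(name):
    """Return True if every character in the member name is plain ASCII."""
    try:
        name.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def extract_ascii_only(archive_path, dest_dir, strip_prefix=""):
    """Extract a tarball, skipping members whose names are not pure ASCII.

    Returns the list of skipped member names. Only regular files and
    directories are handled; links and special files are ignored, and
    file permissions are not preserved.
    """
    skipped = []
    prefix = strip_prefix.rstrip("/") + "/" if strip_prefix else ""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            name = member.name
            if not is_ascii(name):
                skipped.append(name)
                continue
            if not name.startswith(prefix):
                continue
            target = os.path.realpath(
                os.path.join(dest_root, name[len(prefix):]))
            # Refuse entries that would land outside the destination.
            if target != dest_root and not target.startswith(dest_root + os.sep):
                continue
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
    return skipped
```

For the libgit2 archive above, a call might look like `extract_ascii_only("v0.24.1.tar.gz", "out", strip_prefix="libgit2-0.24.1")`. Skipping is only acceptable when the build does not need the skipped files, which is the situation described in the original report.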

Is there any progress on fixing this? It's understandable that this is not an issue for Google internally, but it is apparently a huge bummer for many community users.

Given that this is a 1-year-old P1 bug, shall we provide some workarounds for now? For example, adding an http_repository attribute to skip non-ASCII-named files or to skip certain directories.

Let's make sure extract and download_and_extract work.

Interestingly enough, the problem is very sensitive to tiny changes in the environment; https://bazel-review.googlesource.com/c/bazel/+/93754 passes on my corp desktop, but fails on our CI machines.

Yes, see #7757 for a bit of an explanation.
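
Since the failure depends on the environment, it may help to compare the locale-related settings of a machine where extraction passes with one where it fails. The following is a minimal diagnostic sketch; it reports only what a Python process sees, not what Bazel's JVM ends up using, and `describe_encoding_environment` is a hypothetical helper name:

```python
import locale
import os
import sys


def describe_encoding_environment(environ=None):
    """Collect the locale variables and encodings visible to this process."""
    environ = os.environ if environ is None else environ
    return {
        "LANG": environ.get("LANG"),
        "LC_ALL": environ.get("LC_ALL"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        # Encoding Python uses to turn str paths into bytes for the OS.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding implied by the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_encoding_environment().items():
        print(f"{key}: {value}")
```

Running it on both machines and diffing the output is a quick way to spot a locale difference before digging into Bazel itself.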
