Zig: the first line of .d files is ambiguous due to clang's output format

Created on 11 Mar 2019  Â·  16Comments  Â·  Source: ziglang/zig

https://github.com/ziglang/zig/blob/fec4555476e38d2eeb1dfb02572404b243acd0b2/src/cache_hash.cpp#L431-L432

But here's an example file:

test.o: test.c b/foo.h

In this example there are 2 files on the first line. The dep file parsing probably needs to gain a real tokenizer, and support any number of items on any line. For example it also doesn't look for multiple files on the same line if there are double quotes.

I also haven't managed to find documentation for this file format. It would be nice to find a reference. Supposedly it's "NMake or Jom" format.

Finally, audit the places that call cache_add_dep_file and make sure they don't redundantly add the main source file, since it's contained in the dep file as well.

bug contributor friendly stage1 upstream

All 16 comments

Technically dep file must be a valid Makefile. In practice, compilers support a few different forms. Ninja build has a comprehensive parser for the format: https://github.com/ninja-build/ninja/blob/master/src/depfile_parser.in.cc

The supported constructs end up being:

  1. Simple
    out: in1 in2 in3
  1. Escaped newlines
    out: in1 \
         in2 \
         in3
  1. Separate declarations
    out: in1
    out: in2
    out: in3
  1. Ignore phony rules generated by the gcc -MP option
    out: in1 in2 in3
    in1:
    in2:
    in3:

I also haven't managed to find documentation for this file format

file clang/lib/Frontend/DependencyFile.cpp function PrintFilename() is probably the best we're going to get. what I can see right now is if someone is doing trickery with quotes inside an include file name, clang's emission doesn't handle it too well so they're really not all too formalized anyways.

Here is output from both styles, using 2 source files foo.c is simple, and odd.....name.c is including a bunch of specials. One important surprise is that the target name uses make/compatible format even if -MV is passed to clang. So I'm thinking it would probably just be easier to implement a tokenizer for make/compatible because NMake/Jom format would require a tokenizer that understood make/compatible anyways.

NMake/Jom format

clang -MV -MM -MF - -MG foo.c 'odd{ball} #$^! name.c'

foo.o: foo.c foo_01.h foo_02.h foo_03.h foo_04.h foo_05.h foo_06.h \
  foo_07.h foo_08.h foo_09.h foo_10.h
odd{ball}\ \#$$^!\ name.o: "odd{ball} #$^! name.c" "odd{01}.h" "odd$.h" "odd^.h" \
  "odd#.h" "odd!.h"

make/compatible format - notice esca

clang -MM -MF - -MG foo.c 'odd{ball} #$^! name.c'

foo.o: foo.c foo_01.h foo_02.h foo_03.h foo_04.h foo_05.h foo_06.h \
  foo_07.h foo_08.h foo_09.h foo_10.h
odd{ball}\ \#$$^!\ name.o: odd{ball}\ \#$$^!\ name.c odd{01}.h odd$$.h odd^.h \
  odd\#.h odd!.h

On Windows, the \ is used in file paths, so it can't be relied on for escaping. The make/compatible format is ambiguous on Windows.

Let's have a chat with the clang developers and make sure that this isn't a bug, both the ambiguous case for make/compatible format, and the case where the target name does not respect -MV.

Finally, audit the places that call cache_add_dep_file and make sure they don't redundantly add the main source file, since it's contained in the dep file as well.

is this to mean that in the following .d, source.c should not be passed to cache_add_file() because the build system already knows the dep?

zig-cache/tmp/h6XndQRnlXKV-source.o: source.c foo.h bar.h

Yep I think you have the correct understanding. Stated another way, it means that in this case cache_add_dep_file should be assumed to add source.c, foo.h, and bar.h, and so an additional cache_add_file for source.c would be redundant.

On Windows, the \ is used in file paths, so it can't be relied on for escaping. The make/compatible format is ambiguous on Windows.

Let's have a chat with the clang developers and make sure that this isn't a bug, both the ambiguous case for make/compatible format, and the case where the target name does not respect -MV.

hold off a bit. i've got what I think is a really good implementation that i'm fuzz-testing using default make/compatible .d format.

as it relates to make/compatible format emitted by clang, It took me a bit to get these nuggets and here is what my prwip is doing:

\ escape

  • if \ is followed by one of { <space>, # } then it is an escape so we translate:

    • \<space>→ <space>

    • \# → #

  • otherwise \ is literal and passed through

    • note: this part is difficult, I don't have zig/clang on windows platform to test, but for macos clang is consistently rewriting literal \ → / for .d generation. I don't know if this clang-rewrite happens on windows

  • nonetheless, prwip accepts literal \ unless used as an escape for <space> and #

$ escape

  • if $ is followed by exactly one $ then it is an escape and we translate:

    • $$ → $

  • it should be impossible to ever see a single $ in .d so it is an error if detected

Without -MV, windows paths don't get escaped.
bad-dep-file
This format is ambiguous.

This format is ambiguous.

So that answers a question. clang was re-writing \ → / on macOS. I guess it's rewriting is turned off on windows as per test-obj.d file.

But -MV is still ambiguous too because (at least on macOS) it does this:

foo$$.o: "foo$.c" "foo$.h"

target does not obey -MV syntax. it's prereqs do. I understand that it's less likely a target will need quotation.

Right now it sounds the like the best way to go is indeed to switch prereq tokens (RHS of colon) to -MV format for my incoming pr. And leave make/compatible tokenization for target tokens (LHS of colon). Just be aware we still have that ambiguity with target.

I think it's definitely worth filing a bug report with clang. I can help with that.

find and xargs have a need to share a file list and they use -print0 and -0 to pipe nullz filenames. Piping aside (that's not a terrible idea either but feature creep), a new -M0 or -MZ option for clang to generate nullz filenames would suit: 3 targets represented fairly easily:

target,null
prereq,null
null
target,null
null
target,null
prereq,null
prereq,null

But -MV is still ambiguous too because (at least on macOS) it does this:

foo$$.o: "foo$.c" "foo$.h"

Is that actually ambiguous? Isn't the $ escaped with another $? We need to find out the behavior on Windows.

But -MV is still ambiguous too because (at least on macOS) it does this:

foo$$.o: "foo$.c" "foo$.h"

Is that actually ambiguous? Isn't the $ escaped with another $? We need to find out the behavior on Windows.

Yes, foo$$.o is not ambiguous, my intent was only to show that simultaneously, an emitted .d file has two kinds of escaping going on.

This is probably where it could be fixed upstream:
https://github.com/llvm/llvm-project/blob/ce2d45e7ba4d86a1f7b694deefddda4b60ff6fd7/clang/lib/Frontend/DependencyFile.cpp#L463-L464

For posterity: Windows clang does the following with target pathnames; note that drive letter : is also never escaped. I'll have to get creative because it clashes with target-end delimiter.

literal pathname|.d target|number of literal spaces
---|---|:-:
C:\Users\mike\foo.o|C:\Users\mike\foo.o:
C:\Users\mike\foo .o|C:\Users\mike\foo\ .o:|1
C:\Users\mike\foo#.o|C:\Users\mike\foo\#.o:
C:\Users\mike\foo$.o|C:\Users\mike\foo$$.o:
C:\Users\mike\ foo.o|C:\Users\mike\\\ foo.o:|1
C:\Users\mike\#foo.o|C:\Users\mike\\#foo.o:
C:\Users\mike\$foo.o|C:\Users\mike\$$foo.o:
C:\Users\mike\ foo.o|C:\Users\mike\\\ \ \ \ \ foo.o:|5

Now that @mikdusan's new parser is merged, Zig looks at the first line of dep file parsing but this issue remains, because clang outputs ambiguous .d files. So this issue is open to follow up with the upstream bug report and get the issue fixed in clang.

One more item to solve, once the first line files is solved, is making sure that zig does not redundantly add the target file to the cache. This would be harmless, but a small waste of time.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

andrewrk picture andrewrk  Â·  3Comments

komuw picture komuw  Â·  3Comments

jayschwa picture jayschwa  Â·  3Comments

andersfr picture andersfr  Â·  3Comments

DavidYKay picture DavidYKay  Â·  3Comments