Ungoogled-chromium: Prevent domain substitution from causing unnecessary ninja rebuilds

Created on 2 Nov 2019  路  9Comments  路  Source: Eloston/ungoogled-chromium

The domain substitution python script currently replaces the domains in the whole content.

Would you consider limiting the replacement to non-comment lines?

I am not thinking of implementing mime-specific replacements (or parsing an AST) but simply to avoid lines that match ^(?!\s*//)(.*?).

The fix could either be to modify all regular expressions, or to inject this during the pattern compilation (and shift each replacement index by 2 e.g. <1> -><3>` and so on).

Another implementation idea could be to check lines one by one, but it would be less efficient.

enhancement

Most helpful comment

I haven't done a test with ninja, yet, but the timestamping logic should be working.

All 9 comments

What is the use-case for this feature?

If you are managing multiple patches changing the comments of files will cause them to rebuild objects, further extending the build time.

In short: change only what needs to be changed.

I don't think your regex solution will work well because:

  1. You're assuming that comments are denoted by //. This is not true for all languages used in Chromium, and also excludes multi-line comments.
  2. You're only considering lines that begin with a comment (and optionally some spacing)
  3. You'll still have to re-build files that modify non-comment lines.

If your workflow is to undo substitution, modify patches, then re-substitute, the timestamps on the files with the same contents will change. Ninja relies on the file timestamps to determine what to re-build. A while ago, I was thinking about doing some timestamp manipulation:

Objectives/assumptions:

  • If the file has not been modified, we want the same timestamp during two subsequent inverse operations, i.e. applying then removing, or removing then applying. We should assume that ninja can be invoked with or without domain substitution at any point.
  • Substituted files are considered newer than the originals
  • We should not have a timestamp that goes into the future
  • EDIT: When the user modifies a file, the timestamp gets updated to a newer time. Specifically the new timestamp must be newer than the original timestamp + delta.
  • EDIT2: Only the user and domain substitution should be modifying the timestamps and the content.

Algorithm:

  1. When applying subsstitution, set the new timestamp to be the original + a consistent small delta (e.g. 1 second). This has the following benefits:

    • The delta guarantees that ninja will re-build the file if it has not built it while it was substituted.

    • Having a consistent delta means that we can get back to the original timestamp when we undo substitution.

    • This timestamp manipulation will correctly re-build the file if the user modifies the file between removing and applying substitution.

  2. When reverting substitution, restore the original timestamps back to the files by subtracting the delta away.

    • This still works even if the user modified the file while it was substituted.

  1. This is not true for all languages used in Chromium, and also excludes multi-line comments.

Aware of this. As I wrote in OP, without parsing an AST it's not possible to do it right.

Ninja relies on the file timestamps to determine what to re-build. A while ago, I was thinking about doing some timestamp manipulation:

I already developed and use a tool that helps me with that, which is based on something simpler: the git object hash: if it is not changed across commits/patches, then the original mtime is restored.

I had open sourced this tool but then rolled it back because of no interest; I can release it again if you think it might be useful for you too, although I don't fully grasp the process you outlined here.

If your workflow is to undo substitution, modify patches, then re-substitute

No actually I was thinking of the more typical workflow when updating patches: moving back-and-forth between branches/commits and rebasing, without having to rebuild everything if the new HEAD has most of the same files unchanged.

Having a big commit/stash which changes a lot of files (the domain substitution) causes havoc, thus I'd generally like to minimise changes to comment areas.

No actually I was thinking of the more typical workflow when updating patches: moving back-and-forth between branches/commits and rebasing, without having to rebuild everything if the new HEAD has most of the same files unchanged.

I am using "modify patches" in reference to the workflow we have for ungoogled-chromium: We store changes to files in patches, so any changes we make between builds must be stored in patches. This isn't necessary with git until you're ready to export the commits as patches.

Regardless, the end effect is the same: Between un-applying and re-applying domain substitution, some but not all source files get modified. It doesn't matter that the code is being modified with quilt or git.

although I don't fully grasp the process you outlined here.

I am just doing some manipulations of the timestamps so that ninja doesn't think the file is modified when the content is the same. This scheme does not require storing more metadata about the source code files; just some new lines of code that will manipulate the timestamps.

Also, I forgot to mention two important (but intuitive/obvious) assumptions. I've edited my post to include them.

I had open sourced this tool but then rolled it back because of no interest; I can release it again if you think it might be useful for you too, although I don't fully grasp the process you outlined here.

That is nifty, but neither ungoogled-chromium nor I have a use for it right now. Will let you know if we do.

We store changes to files in patches, so any changes we make between builds must be stored in patches. This isn't necessary with git until you're ready to export the commits as patches.

Yes, similar workflow here.

quilt
Thanks for mentioning quilt, had forgotten about it.

I am just doing some manipulations of the timestamps so that ninja doesn't think the file is modified when the content is the same.

I see; with the new explanation of the deltas it's much clearer, thanks. I use a much less sophisticated approach.

So if I understand correctly, you have that strategy in mind but there is currently no tooling in place so far? I assume you prefer to close this one because it's a "hack" that does not work in all cases; it would however cover quite a lot of unnecessary changes, from what I could see.

Also, on a slightly unrelated topic, I might propose not replacing the domains on the test files as I could individuate quite a lot of those.

So if I understand correctly, you have that strategy in mind but there is currently no tooling in place so far? I assume you prefer to close this one because it's a "hack" that does not work in all cases; it would however cover quite a lot of unnecessary changes, from what I could see.

I had the idea for a little while, but I didn't have any concrete reason to implement it. Now that you created this issue, I decided to work out the details. It turned out simpler to implement and more robust than I originally thought. I'll look into implementing it.

Also, on a slightly unrelated topic, I might propose not replacing the domains on the test files as I could individuate quite a lot of those.

Not quite sure what you mean by individuating them, but yeah we should discuss this in a new issue.

I haven't done a test with ninja, yet, but the timestamping logic should be working.

Alright, thanks, I am going to see if I can use this.

Was this page helpful?
0 / 5 - 0 ratings