Squirrel.windows: Spaces in filename cause ReleaseEntry Regex to fail

Created on 11 Apr 2018 · 23Comments · Source: Squirrel/Squirrel.Windows

I'm in a situation where Squirrel updating fails when iterating over one particular release entry

E6651B768DB82076E7113086558548B4DA4F8F6F Artifact - BoxHeader - e7ee795c-99e1-4ecb-a72c-9a41461d8198.xml.shasum 8669

using this Regex:

^([0-9a-fA-F]{40})\s+(\S+)\s+(\d+)[\r]*$

The culprit appears to be that (\S+) will only match non-whitespace characters. I've played a bit around with the Regex Tester at http://regexstorm.net/tester, and my suggestion would be to change the Regex to something similar to

^([0-9a-fA-F]{40})\s+(.*).shasum[\s+](\d+)[\r]*$

We don't really want to be forced to change the filenames or escape the spaces, so I'm interested in a solution where we can avoid that. Thanks in advance.

Source

frederikolsen88

👍3

Most helpful comment

Having the same problem. Any time Squirrel decides to use a .diff (or .bsdiff) to represent the difference between old and new versions of a file, it generates a corresponding .shasum file along with the .diff. Here's an example of the shasum contents for one of our packages:

09E36653DEFCAFEDA55115AD81AC84E1B8D4A1FF Install Bloom Literacy Fonts.exe.shasum 12992

When applying the patch in DeltaPackage.applyDiffToFile, Squirrel calls VerifyPatchedFile(), which reads the .shasum file and passes the contents to ReleaseEntry.ParseReleaseEntry(), which fails to parse it because it can't match the spaces in the file name.

Squirrel then concludes that the patch is faulty, aborts the incremental update, and downloads the whole package, defeating the incremental update feature entirely.

I agree that Frederik's fix (or even just changing (\S+) to (.+) would fix the problem.

JohnThomson on 23 Oct 2018

👍2

All 23 comments

@paulcbetts, would you accept a PR incorporating the suggestion above, or would there be caveats to that approach?

frederikolsen88 on 18 May 2018

I'm confused, why is this file in your RELEASES file?

anaisbetts on 18 May 2018

Sorry, I realize now I was a bit unclear.

E6651B768DB82076E7113086558548B4DA4F8F6F Artifact - BoxHeader - e7ee795c-99e1-4ecb-a72c-9a41461d8198.xml.shasum 8669

are the contents of the .shasum corresponding with the file in question in one of our delta packages. It is when trying to update using a delta package that our users get an "Invalid release entry" exception. If there are changes to apply to a file with spaces in the filename, that is.

So while the RELEASES file isn't in play, I guess the error message is worded like that is because you're using the same Regex to parse both .shasum files and the RELEASES file.

frederikolsen88 on 18 May 2018

09E36653DEFCAFEDA55115AD81AC84E1B8D4A1FF Install Bloom Literacy Fonts.exe.shasum 12992

Squirrel then concludes that the patch is faulty, aborts the incremental update, and downloads the whole package, defeating the incremental update feature entirely.

I agree that Frederik's fix (or even just changing (\S+) to (.+) would fix the problem.

JohnThomson on 23 Oct 2018

👍2

I suppose to be sure there is some filename present, and that if there are multiple spaces at the end none of them are put in the filename group, (\S.*shasum) would be better.

JohnThomson on 23 Oct 2018

No, that won't work for the existing usages with .nupkg. We can however expect at least two non-space characters, since all usages involve either .shasum or .nupkg. (\S.*\S) should work, then. (At least, it passes all the existing ReleaseEntry unit tests.)

JohnThomson on 23 Oct 2018

Wouldn't (\w+)\s(.+)\s(\d+) be much simpler to match the hash at the beginning and the length at the end?

cbenard on 23 Oct 2018

(a) Paul's regex requires exactly 40 of the expected characters in the sha, a better validation.
(b) ^ and $ are needed to ensure it matches the whole file content for validation
(c) If there is more than one white-space character surrounding the file name, your version will include the extra white space in the filename, which is probably not good.
(d) Your version will match something with nothing but white space between the sha and the length, as long as there are at least three spaces
(e) A minimal change to fix the problem is usually wise if one is not fully familiar with a piece of code and all its purposes.

JohnThomson on 23 Oct 2018

I assumed we were talking about matching per-line. You're right about the 3 spaces thing. I didn't even see the original regex which is a bit more complicated than it needs to be.

It does seem like a pretty easy fix, especially if you know each line begins with 40 upper-case hex characters for the hash and ends with a set of numbers for length and has a filename in the middle delineated with spaces before/after. I'm surprised it's still open. I was subscribed to it for some reason, but I can't even remember if I had the issue.

cbenard on 23 Oct 2018

👍1

How about adding \S inside the filename group to ensure there's at least one non-space character and then allow multiple spaces before/after like this? Also, I made the filename match non-greedy so it won't match extra spaces between the filename and the hash/length.

^([0-9a-fA-F]{40})\s+(\S.+?)\s+(\d+)\r*$

I did a bunch of tests in The Regex Coach and it seems to work well, and it's very similar to the original.

cbenard on 23 Oct 2018

It will probably work for much the same reason as mine: filenames that will really occur here will have at least two characters. However, in the case of a single-character file name followed by two spaces, it will include one of the spaces in the filename. Using * after the dot would be even better.
Both your version and one with * instead also pass all the existing unit tests. But I think the * variation, which will allow a single character file name, is the best suggestion yet.

JohnThomson on 23 Oct 2018

\S.? means one non-space and zero or one single character. You probably meant .*? or something, like a non-greedy zero-to-any length match.

This one allows single character filenames, at the expense of requiring a single space before/after the filename: ^([0-9a-fA-F]{40})\s(\S[\S ]*?)\s(\d+)\r*$. It also stays closer to the original's definition of what a filename is (\S) plus spaces.

I don't have a dev env setup for this project, so I'm just trying to help with regex. :) I'll probably let this be my last comment on the matter.

cbenard on 23 Oct 2018

No, I tried to write openparen-backslash-S-dot-star-questionmark-closeparen and markdown interpreted it as a request for italics. So I edited the comment, but unfortunately you saw the original. Sorry.

JohnThomson on 23 Oct 2018

If you surround your "code" with backticks, you don't have to deal with the escapes. It'll also do a mono-space font with a gray background instead of italics like you're using.

Like this `some \code here` makes it look like: some \code here

cbenard on 23 Oct 2018

OK, so what I intended to suggest was (\S.*?), or in full ^([0-9a-fA-F]{40})\s+(\S.*?)\s+(\d+)\r*$
Thanks.

JohnThomson on 23 Oct 2018

👍1

Your idea of allowing only spaces, not other whitespace, is also a good one. Could still use \s+ before and after, counting on the ? to prevent a trailing space being included in the filename. So ^([0-9a-fA-F]{40})\s+(\S.[\S ]*?)\s+(\d+)\r*$ would result...which also passes unit tests.

JohnThomson on 23 Oct 2018

Yours has an error that makes it not work for single character filenames. You have an errant period inside the filename match after (\S. That requires at least 2 characters for a filename and the 2nd character can be anything, including whitespace like tabs/vertical tab/etc.

cbenard on 23 Oct 2018

Bother. A careless mistake. The version I actually unit-tested is ^([0-9a-fA-F]{40})\s+(\S[\S ]+?)\s+(\d+)[\r]*$

JohnThomson on 23 Oct 2018

That one doesn't match single character filenames either. Your [\S ]+ in the middle should be *, not +. If you use +, it requires at least one character, in addition to the \S which is another character to begin.

cbenard on 23 Oct 2018

Must have been really sleepy yesterday. ^([0-9a-fA-F]{40})\s+(\S[\S ]*?)\s+(\d+)[\r]*$.

JohnThomson on 24 Oct 2018

👍1

That one looks good. Consider it co-signed.

^([0-9a-fA-F]{40})\s+(\S[\S ]*?)\s+(\d+)[\r]*$

Now if we can only get @paulcbetts to look at this issue still open from April... :)

cbenard on 24 Oct 2018

👍1

Got the PR ready for you @paulcbetts after @JohnThomson and I worked through the regex. I hope you're able to accept the PR.

cbenard on 24 Oct 2018

👍1