Squirrel.windows: Spaces in filename cause ReleaseEntry Regex to fail

Created on 11 Apr 2018  路  23Comments  路  Source: Squirrel/Squirrel.Windows

I'm in a situation where Squirrel updating fails when iterating over one particular release entry

E6651B768DB82076E7113086558548B4DA4F8F6F Artifact - BoxHeader - e7ee795c-99e1-4ecb-a72c-9a41461d8198.xml.shasum 8669

using this Regex:

^([0-9a-fA-F]{40})\s+(\S+)\s+(\d+)[\r]*$

The culprit appears to be that (\S+) will only match non-whitespace characters. I've played a bit around with the Regex Tester at http://regexstorm.net/tester, and my suggestion would be to change the Regex to something similar to

^([0-9a-fA-F]{40})\s+(.*).shasum[\s+](\d+)[\r]*$

We don't really want to be forced to change the filenames or escape the spaces, so I'm interested in a solution where we can avoid that. Thanks in advance.

Most helpful comment

Having the same problem. Any time Squirrel decides to use a .diff (or .bsdiff) to represent the difference between old and new versions of a file, it generates a corresponding .shasum file along with the .diff. Here's an example of the shasum contents for one of our packages:

09E36653DEFCAFEDA55115AD81AC84E1B8D4A1FF Install Bloom Literacy Fonts.exe.shasum 12992

When applying the patch in DeltaPackage.applyDiffToFile, Squirrel calls VerifyPatchedFile(), which reads the .shasum file and passes the contents to ReleaseEntry.ParseReleaseEntry(), which fails to parse it because it can't match the spaces in the file name.

Squirrel then concludes that the patch is faulty, aborts the incremental update, and downloads the whole package, defeating the incremental update feature entirely.

I agree that Frederik's fix (or even just changing (\S+) to (.+) would fix the problem.

All 23 comments

@paulcbetts, would you accept a PR incorporating the suggestion above, or would there be caveats to that approach?

I'm confused, why is this file in your RELEASES file?

Sorry, I realize now I was a bit unclear.

E6651B768DB82076E7113086558548B4DA4F8F6F Artifact - BoxHeader - e7ee795c-99e1-4ecb-a72c-9a41461d8198.xml.shasum 8669

are the contents of the .shasum corresponding with the file in question in one of our delta packages. It is when trying to update using a delta package that our users get an "Invalid release entry" exception. If there are changes to apply to a file with spaces in the filename, that is.

So while the RELEASES file isn't in play, I guess the error message is worded like that is because you're using the same Regex to parse both .shasum files and the RELEASES file.

Having the same problem. Any time Squirrel decides to use a .diff (or .bsdiff) to represent the difference between old and new versions of a file, it generates a corresponding .shasum file along with the .diff. Here's an example of the shasum contents for one of our packages:

09E36653DEFCAFEDA55115AD81AC84E1B8D4A1FF Install Bloom Literacy Fonts.exe.shasum 12992

When applying the patch in DeltaPackage.applyDiffToFile, Squirrel calls VerifyPatchedFile(), which reads the .shasum file and passes the contents to ReleaseEntry.ParseReleaseEntry(), which fails to parse it because it can't match the spaces in the file name.

Squirrel then concludes that the patch is faulty, aborts the incremental update, and downloads the whole package, defeating the incremental update feature entirely.

I agree that Frederik's fix (or even just changing (\S+) to (.+) would fix the problem.

I suppose to be sure there is some filename present, and that if there are multiple spaces at the end none of them are put in the filename group, (\S.*shasum) would be better.

No, that won't work for the existing usages with .nupkg. We can however expect at least two non-space characters, since all usages involve either .shasum or .nupkg. (\S.*\S) should work, then. (At least, it passes all the existing ReleaseEntry unit tests.)

Wouldn't (\w+)\s(.+)\s(\d+) be much simpler to match the hash at the beginning and the length at the end?

(a) Paul's regex requires exactly 40 of the expected characters in the sha, a better validation.
(b) ^ and $ are needed to ensure it matches the whole file content for validation
(c) If there is more than one white-space character surrounding the file name, your version will include the extra white space in the filename, which is probably not good.
(d) Your version will match something with nothing but white space between the sha and the length, as long as there are at least three spaces
(e) A minimal change to fix the problem is usually wise if one is not fully familiar with a piece of code and all its purposes.

I assumed we were talking about matching per-line. You're right about the 3 spaces thing. I didn't even see the original regex which is a bit more complicated than it needs to be.

It does seem like a pretty easy fix, especially if you know each line begins with 40 upper-case hex characters for the hash and ends with a set of numbers for length and has a filename in the middle delineated with spaces before/after. I'm surprised it's still open. I was subscribed to it for some reason, but I can't even remember if I had the issue.

How about adding \S inside the filename group to ensure there's at least one non-space character and then allow multiple spaces before/after like this? Also, I made the filename match non-greedy so it won't match extra spaces between the filename and the hash/length.

^([0-9a-fA-F]{40})\s+(\S.+?)\s+(\d+)\r*$

I did a bunch of tests in The Regex Coach and it seems to work well, and it's very similar to the original.

It will probably work for much the same reason as mine: filenames that will really occur here will have at least two characters. However, in the case of a single-character file name followed by two spaces, it will include one of the spaces in the filename. Using * after the dot would be even better.
Both your version and one with * instead also pass all the existing unit tests. But I think the * variation, which will allow a single character file name, is the best suggestion yet.

\S.? means one non-space and zero or one single character. You probably meant .*? or something, like a non-greedy zero-to-any length match.

This one allows single character filenames, at the expense of requiring a single space before/after the filename: ^([0-9a-fA-F]{40})\s(\S[\S ]*?)\s(\d+)\r*$. It also stays closer to the original's definition of what a filename is (\S) plus spaces.

I don't have a dev env setup for this project, so I'm just trying to help with regex. :) I'll probably let this be my last comment on the matter.

No, I tried to write openparen-backslash-S-dot-star-questionmark-closeparen and markdown interpreted it as a request for italics. So I edited the comment, but unfortunately you saw the original. Sorry.

If you surround your "code" with backticks, you don't have to deal with the escapes. It'll also do a mono-space font with a gray background instead of italics like you're using.

Like this `some \code here` makes it look like: some \code here

OK, so what I intended to suggest was (\S.*?), or in full ^([0-9a-fA-F]{40})\s+(\S.*?)\s+(\d+)\r*$
Thanks.

Your idea of allowing only spaces, not other whitespace, is also a good one. Could still use \s+ before and after, counting on the ? to prevent a trailing space being included in the filename. So ^([0-9a-fA-F]{40})\s+(\S.[\S ]*?)\s+(\d+)\r*$ would result...which also passes unit tests.

Yours has an error that makes it not work for single character filenames. You have an errant period inside the filename match after (\S. That requires at least 2 characters for a filename and the 2nd character can be anything, including whitespace like tabs/vertical tab/etc.

Bother. A careless mistake. The version I actually unit-tested is ^([0-9a-fA-F]{40})\s+(\S[\S ]+?)\s+(\d+)[\r]*$

That one doesn't match single character filenames either. Your [\S ]+ in the middle should be *, not +. If you use +, it requires at least one character, in addition to the \S which is another character to begin.

Must have been really sleepy yesterday. ^([0-9a-fA-F]{40})\s+(\S[\S ]*?)\s+(\d+)[\r]*$.

That one looks good. Consider it co-signed.

^([0-9a-fA-F]{40})\s+(\S[\S ]*?)\s+(\d+)[\r]*$

Now if we can only get @paulcbetts to look at this issue still open from April... :)

Got the PR ready for you @paulcbetts after @JohnThomson and I worked through the regex. I hope you're able to accept the PR.

@shiftkey duplicate of #713, this can be closed.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

patrickklaeren picture patrickklaeren  路  4Comments

Andrew-Hanlon picture Andrew-Hanlon  路  6Comments

shiftkey picture shiftkey  路  4Comments

EragonJ picture EragonJ  路  6Comments

CodeFunta picture CodeFunta  路  6Comments