PowerShell: a `Compare-File` (`cmp`) command to compare binary files without loading them into RAM would be nice

Created on 16 Feb 2020 · 8 Comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement


As a user, I need a way to compare binary files. This is possible using cmp(1) on Unix and fc.exe on Windows, but it is not very convenient because:

  1. those commands are not cross-platform, so this functionality cannot be used in cross-platform scripts
  2. PowerShell defines fc as an alias (for Format-Custom), so typing fc on Windows is not enough and you have to type fc.exe

Yes, there is a way to do diff (cat file1) (cat file2), but it loads both files into RAM before comparing, which works very poorly with very big files. Today I had to verify a DVD ISO file and diff just didn't work; fc.exe did, and it was quick.

Proposed technical implementation details (optional)

I suggest adding a command that compares binary files byte-by-byte and warns if the file contents differ. We could name it Compare-File and alias it to cmp, providing a familiar name for Unix folks. This would make scripting and daily usage easier.

I'm sure lots of scripts now use the diff (cat files) approach without realizing it's an in-memory comparison, not a block-by-block one. A new command for comparing files byte-by-byte would be very nice.
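To illustrate what "block-by-block" means here, the following is a minimal sketch, written in Python purely for illustration (it is not PowerShell and not a proposed cmdlet implementation). The function name `files_equal` and the 1 MiB chunk size are assumptions made for this example. Only one chunk per file is held in memory at a time, regardless of file size.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def files_equal(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if both files have identical contents (hypothetical helper)."""
    # Files of different sizes can never match, so skip reading them entirely.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False  # stop at the first differing chunk
            if not block_a:
                return True  # both files reached end of file together
```

Because the loop returns at the first differing chunk, two large files that differ early are rejected without reading the rest of either file.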

Issue-Enhancement

All 8 comments

Generally speaking we try to avoid overriding native Unix utilities with PowerShell aliases; it's been a sore point for Unix folks in the past.

I'm also not clear on how you're proposing this would work, but I have little familiarity with how the existing tools you mention work anyway. Would you be able to explain exactly what kind of output you'd expect from such a cmdlet in PowerShell? How it would display diffs, etc., especially if the diffs are very large.

Actually writing such a comparer in PS shouldn't pose a great difficulty, I wouldn't think; .NET provides pretty easy to work with APIs for interacting with files on a byte by byte basis which we can use for this.

Just looking for some clarity on what exactly you're looking for. 🙂

  1. I agree that aliasing this to cmp is not mandatory, as the user can define an alias for fc or cmp.

  2. a binary compare utility doesn't need to display diffs. its main purpose is to say 'files are identical' or 'files are not identical'. the cmp utility exits after the first mismatch and reports the byte number and line where it occurred. that is all I was thinking about.

    fc.exe /b, though, displays a diff like: 00000C74: ED 10 (address of the byte, what's in file1 at this address, what's in file2), but I personally like cmp's behavior better. I think cmp's behavior is a good starting point; if anyone actually needs a diff, it's possible to implement something like a -Diff flag later. but even without diffs this command would be good.
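The cmp-style behavior described above (stop at the first mismatch, report where it occurred) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the behavior of any existing PowerShell cmdlet: the name `first_difference`, the 64 KiB chunk size, and the choice to report the position just past the shorter file when one file is a prefix of the other are all assumptions of this example.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return (byte_number, line_number), both 1-based, for the first
    mismatch, or None if the files are identical (hypothetical helper)."""
    offset = 0  # number of bytes confirmed identical so far
    line = 1    # 1-based line counter, advanced on each newline byte
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a == block_b:
                if not block_a:
                    return None  # both files ended without a mismatch
                offset += len(block_a)
                line += block_a.count(b"\n")
                continue
            # The chunks differ: locate the first differing index. If one
            # chunk is a prefix of the other, use the end of the shorter one.
            common = min(len(block_a), len(block_b))
            index = next(
                (i for i in range(common) if block_a[i] != block_b[i]),
                common,
            )
            line += block_a[:index].count(b"\n")
            return offset + index + 1, line
```

A caller that only needs a yes/no answer can test the result against `None`; a later diff-style option could continue past the first mismatch instead of returning.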

In most cases adding a new cmdlet makes sense only if it addresses an advanced scenario and exposes some magic features. We can find many utilities (console and GUI) that compare files very well. So it is not clear why we need the new cmdlet.

@iSazonov let me repeat my point. we need it because:

  1. this functionality is useful in cross-platform PS scripts
  2. to improve user experience on Windows (it has fc.exe, but it's shadowed by ps' alias and not really easy to use because of this)

I would also appreciate a list of the many console utilities that are good for this use case and cross-platform. I googled and, uhm, I know of nothing but cmp (which is kinda cross-platform, but relies on Cygwin).

@vasily-codefresh PowerShell is "automation management". What are the scenarios you want to automate in a cross-platform environment? It makes no sense to reimplement every binary utility.
Compare-File implies returning an object. What is the object? Or do you only want to check file equality? In that case it should be a Test-* cmdlet.

good point, maybe something like Test-FileContentsMatch will be better.

the most useful scenario I can think of is verifying backups. let's say I back up a huge file to DVD or a network location, and after copying but before deleting the local copy I'd like to verify it copied perfectly.

@chillum Not sure about its RAM usage, but would the existing Get-FileHash be suitable?

Get-FileHash

@mrboring thanks for the suggestion, but this is actually slower than bit-for-bit comparison.
it doesn't seem to load the whole file contents into RAM, but it puts a load on the CPU and takes longer.

besides, direct file comparison can be considered more reliable. I understand that having a SHA256 collision is highly unlikely, but why rely on 'highly unlikely' when you can just compare files bit-by-bit. even DOS 3.3 had byte-for-byte comparison (before even MD5 existed), so it's a useful functionality and should not be really hard to implement.
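For comparison, the hash-based approach discussed here can be sketched like this. It is an illustrative Python sketch, not how Get-FileHash is implemented; the names `file_sha256` and `hashes_match` and the 1 MiB chunk size are assumptions of this example. The file is fed to the hash in chunks, so memory use stays small, but both files must always be read to the end and every byte passes through the hash function, whereas a direct comparison can stop at the first mismatch.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break  # end of file
            digest.update(block)
    return digest.hexdigest()


def hashes_match(path_a, path_b):
    """Compare two files by digest; each file is read in full."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Hashing remains useful when the two copies cannot be read side by side, for example when only a published checksum of the original is available.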
