Jest: Configurable diff algorithm for snapshots

Created on 2 Oct 2019 · 7 Comments · Source: facebook/jest

🚀 Feature Proposal

I'd like to be able to run Jest with more human-friendly diff algorithms such as 'patience' or 'histogram', for example:

jest --diff-algo patience

Maybe even better, I'd like to be able to "pipe" Jest output to a diff utility of my choice, for example:

jest --save-snapshots expected.txt actual.txt && code --diff expected.txt actual.txt

There are two main advantages to this second approach:

  1. Jest doesn't need to implement all the possible algorithms
  2. I can plug in any diffing utility I like. For example, I could do this:

    • Use Git's algorithms: ... && git diff --no-index --diff-algorithm=histogram expected.txt actual.txt

    • Use Beyond Compare: ... && bcompare expected.txt actual.txt

    • Use format-specific utilities: json-diff, html-differ, etc.

Motivation

Currently, snapshots are diffed using the Myers algorithm, which is quite basic. I regularly encounter situations like this:

_Snapshot:_

{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" },
    { "id": 3, "name": "Charlie" }
  ],
  "products": [
    { "id": 1, "name": "Product 1" },
    { "id": 2, "name": "Product 2" },
    { "id": 3, "name": "Product 3" },
    { "id": 4, "name": "Product 4" },
    { "id": 5, "name": "Product 5" }
  ]
}

_Actual:_

{
  "users": null,
  "products": [
    { "id": 1, "name": "Product 1" },
    { "id": 2, "name": "Product 2" },
    { "id": 3, "name": "Product 3" },
    { "id": 4, "name": "Product 4" },
    { "id": 5, "name": "Product 5" }
  ]
}

_Diff:_

- { "id": 1, "name": "Alice" }
+ { "id": 1, "name": "Product 1" }
...
- { "id": 2, "name": "Bob" }
+ { "id": 2, "name": "Product 2" }

Here is another example of how the diffs can be different: https://gist.github.com/roryokane/6f9061d3a60c1ba41237

Pitch

Why does this feature belong in the Jest core platform? Because user-land reporters like jest-stare (which supports side-by-side diff in HTML) only take the output of Jest, so if the primary output is not already diffed smartly, they can't help much.

Other

This feature request now includes some ideas from comments below, specifically, https://github.com/facebook/jest/issues/8998#issuecomment-537638906.

Feature Request

All 7 comments

/cc @pedrottimark

@borekb It must feel frustrating if the report looks complicated when the change is slight.

A long-term goal that sounds relevant to this problem is:

  • data-driven difference instead of serialization difference
  • inline snapshots with expected value of valid ECMAScript, maybe extended with JSX

Can you paste a realistic example based on the JSON that you described (without any proprietary data, of course)? The better the examples, the better the review when we make improvements big or small.

It also helps my thinking to understand the situation, for example: was the change an improvement to the code, a regression in the code, or a mistaken assumption in the test?

Did you have a particular package in mind for alternative comparison algorithms? If I remember correctly from when I considered the Git code base, its license is not compatible with MIT.

Yes, you are correct that the diff-sequences package implements the Myers algorithm.

Thanks for your comments, @pedrottimark.

I'm using snapshots for safe migration of API backends of our GraphQL server – I created snapshots of GraphQL responses before the migration and am using Jest diffs to find and fix the differences after the migration to the new API backend.

I no longer have that specific example but it was something along these lines:

_Snapshot:_

{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" },
    { "id": 3, "name": "Charlie" }
  ],
  "products": [
    { "id": 1, "name": "Product 1" },
    { "id": 2, "name": "Product 2" },
    { "id": 3, "name": "Product 3" },
    { "id": 4, "name": "Product 4" },
    { "id": 5, "name": "Product 5" }
  ]
}

_Actual:_

{
  "users": null,
  "products": [
    { "id": 1, "name": "Product 1" },
    { "id": 2, "name": "Product 2" },
    { "id": 3, "name": "Product 3" },
    { "id": 4, "name": "Product 4" },
    { "id": 5, "name": "Product 5" }
  ]
}

This specific example is fine with the Myers algorithm, but something about my actual situation produced a diff like this:

- { "id": 1, "name": "Alice" }
+ { "id": 1, "name": "Product 1" }
...
- { "id": 2, "name": "Bob" }
+ { "id": 2, "name": "Product 2" }

Sorry that I don't have a specific example any longer, but that actually brings me to another point:

During my work with larger diffs, I wished I could do something like this:

jest --verbose-snapshots | jest-save expected.json actual.json && code --diff expected.json actual.json

Instead of VS Code, I could use anything, from Beyond Compare to, for example:

git diff --no-index --diff-algorithm=histogram expected.json actual.json

I'm not sure how best to implement the interaction between Jest and the rest of the world, but being able to pipe _something_ into _something_ would open up a lot of opportunities, I think.

Now writing this down, I might actually prefer this approach over adding more diffing algorithms into Jest directly. What do you think?
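
To make the pipeline idea above more concrete, here is a minimal sketch in TypeScript (Node.js) of what the "save both sides, then hand them to an external tool" step could look like. The helper names gitDiffArgs and diffWithGit are hypothetical and are not part of Jest; the sketch only assumes that git is installed and that the expected and actual serializations are already available as strings.

```typescript
import { spawnSync } from "child_process";
import { mkdtempSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical helper (not a Jest API): builds the argument list for
// `git diff --no-index`, which compares two files outside a repository.
function gitDiffArgs(
  expectedPath: string,
  actualPath: string,
  algorithm: string = "histogram"
): string[] {
  return ["diff", "--no-index", `--diff-algorithm=${algorithm}`, expectedPath, actualPath];
}

// Hypothetical helper (not a Jest API): writes both serializations to
// temporary files and returns the diff text produced by git.
function diffWithGit(expected: string, actual: string, algorithm: string = "histogram"): string {
  const dir = mkdtempSync(join(tmpdir(), "snapshot-diff-"));
  const expectedPath = join(dir, "expected.json");
  const actualPath = join(dir, "actual.json");
  writeFileSync(expectedPath, expected);
  writeFileSync(actualPath, actual);

  // `git diff --no-index` exits with status 1 when the files differ,
  // so a non-zero status is not treated as a failure here.
  const result = spawnSync("git", gitDiffArgs(expectedPath, actualPath, algorithm), {
    encoding: "utf8",
  });
  if (result.error) {
    throw result.error; // for example, git is not installed
  }
  return result.stdout;
}
```

Swapping git for another tool (Beyond Compare, json-diff, and so on) would only change the command and its arguments; the part Jest would need to provide is access to the two serializations.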

Forgot to say: another reason to compare the array items, object properties, and data types themselves is to have more information to affect the formatting of results without the need for options.
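
As an illustration of comparing the data itself rather than its serialization, here is a minimal TypeScript sketch. It is not how Jest works today and not a proposed implementation; the names Json, Change, kindOf, and dataDiff are hypothetical. It walks two JSON-like values in parallel and reports the paths where they differ, matching array items by index only.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface Change {
  path: string;
  expected: Json | undefined;
  received: Json | undefined;
}

// Coarse data type used to decide whether two values can be compared in depth.
function kindOf(value: Json | undefined): string {
  if (value === undefined) return "undefined";
  if (value === null) return "null";
  return Array.isArray(value) ? "array" : typeof value;
}

// Hypothetical data-driven diff: compares values structurally and
// returns one Change per differing path.
function dataDiff(expected: Json | undefined, received: Json | undefined, path: string = "$"): Change[] {
  const kind = kindOf(expected);
  if (kind !== kindOf(received)) {
    // Different data types: report the whole subtree as a single change.
    return [{ path, expected, received }];
  }
  const changes: Change[] = [];
  if (kind === "array") {
    const a = expected as Json[];
    const b = received as Json[];
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
      changes.push(...dataDiff(a[i], b[i], `${path}[${i}]`));
    }
    return changes;
  }
  if (kind === "object") {
    const a = expected as { [key: string]: Json };
    const b = received as { [key: string]: Json };
    const extraKeys = Object.keys(b).filter(
      (key) => !Object.prototype.hasOwnProperty.call(a, key)
    );
    for (const key of Object.keys(a).concat(extraKeys)) {
      changes.push(...dataDiff(a[key], b[key], `${path}.${key}`));
    }
    return changes;
  }
  // Primitives (and two missing values) are compared directly.
  return expected === received ? [] : [{ path, expected, received }];
}
```

For the users/products example earlier in this thread, such a comparison would report a single change at $.users (an array replaced by null) and nothing under $.products, because the product items are never lined up against user items.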

A work in progress right now is to reduce distracting differences caused by changes to hierarchy (especially in markup) by comparing the serialization of the received object without indentation to the snapshot unindented by a heuristic. This would happen automatically, without the need for an --ignore-indentation option.
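
A rough TypeScript sketch of the indentation idea, under the simplifying assumption that "unindenting" just means dropping leading spaces and tabs from every line. The actual heuristic is not described in this thread, and the names stripIndentation and equalIgnoringIndentation are hypothetical.

```typescript
// Removes leading spaces and tabs from every line of a serialized value.
// Assumption: this naive version also strips indentation inside multi-line
// string content, which a real heuristic would presumably need to avoid.
function stripIndentation(serialized: string): string {
  return serialized
    .split("\n")
    .map((line) => line.replace(/^[ \t]+/, ""))
    .join("\n");
}

// True when the two serializations differ only in leading whitespace.
function equalIgnoringIndentation(snapshot: string, received: string): boolean {
  return stripIndentation(snapshot) === stripIndentation(received);
}
```

With the indentation removed, wrapping existing markup in a new parent element leaves the child lines textually identical, so a line-based diff would only need to report the added opening and closing tags.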

data-driven difference instead of serialization difference

My colleague @JanVoracek actually had the very same idea today 😄. I was initially skeptical but he explained very nicely how it would work and that it would be a big improvement over string-based diffs. So it's quite exciting to hear that you're considering it!

Thank you for the follow-up https://github.com/facebook/jest/issues/8998#issuecomment-537638906, which is pure gold:

  • Yeah, mismatching id properties are like what diff does to braces and newlines in code
  • I like your pipeline examples and will put that approach into my mental slow cooker ;)

I've updated the OP to include the idea from https://github.com/facebook/jest/issues/8998#issuecomment-537638906; hopefully it makes sense.
