Vision: Shear parametrized improperly

Created on 31 Jul 2019  路  5Comments  路  Source: pytorch/vision

I believe the affine transforms in torchvision are parametrized improperly, due to handling shearing incorrectly.

PR #1070 contains a related discussion about the new two-axis "shear" parametrization. In this issue I'd like to focus on why even for a single axis the parametrization is slightly incorrect, then give a possible solution and a suggestion for enabling two axis shearing.

Typically we think of shear as a "tilt" of the X or Y axis that does not affect vectors orthogonal to that axis. Because of this it is common to parametrize a shear transform by that angle. We start with an X-shear matrix of the form

A = [ 1 s ]
    [ 0 1 ]

The cosine of the angle phi between the Y axis and the transformed Y axis is

cos(phi) = (x^T A x) / | A x |
x = [ 0 ]
    [ 1 ]

It's easy to expand this and see that

cos(phi) = 1/sqrt(1 + s^2)

which is satisfied when s = tan(phi). So the natural parametrization for an X shear is

A = [ 1  tan(phi) ]
    [ 0       1   ]

The combined transform in torchvision/transforms/functional.py is determined by the RSS matrix which represents a uniform scaling by scale, a rotation by a radians and a shear of shear radians:

RSS = [ cos(a)*scale    -sin(a + shear)*scale ]
      [ sin(a)*scale     cos(a + shear)*scale ]

This matrix should represent a shearing, scaling and rotation. Both the rotation and shearing have determinant 1, so the resulting determinant for RSS should be scale^2, however if we compute the determinant we easily see that it is

det (RSS) = scale^2 (cos(a) cos(a+shear) + sin(a)sin(a+shear) ) = scale^2 cos(shear)

where the simplification is due to the product-to-sum trig identities (cf. https://en.wikipedia.org/wiki/List_of_trigonometric_identities#Multiple-angle_formulae). This is a problem since it will _shrink_ the image when shear is large, which the user may not desire. In general, it is just not a clean parametrization of an affine transform, since for example the existing parametrization makes it impossible to generate pure shears with no rotation or scaling.

There is a simple fix in the case of a single X shear, and that is to replace
the RSS matrix above by the one derived from the parametrization in the formula
for A above. Ignoring scale we have

RSS = [ cos(a)  -sin(a) ] [ 1  tan(shear) ] = [ cos(a)  cos(a)tan(shear) - sin(a) ]
      [ sin(a)   cos(a) ] [ 0       1     ]   [ sin(a)  sin(a)tan(shear) + cos(a) ]

Rearranging and using the trig identities again, we arrive at a simple formula:

RSS = [ cos(a)   -sin(a-shear)/cos(shear) ]
      [ sin(a)    cos(a-shear)/cos(shear) ]

So we see that two entries in the current released torchvision's RSS matrix just
need a division by cos(shear). It's also easy to see that this fixes the
determinant issue above.

As far as extension to two shears, we have a choice of which shear to apply
first, the X shear or Y shear. If we perform the X before Y then the rotation
and again ignoring scaling, we wind up with the following RSS matrix:

RSS = [ cos(a+shear_y)/cos(shear_y)  cos(a+shear_y)tan(shear_x)/cos(shear_y) - sin(a) ]
      [ sin(a+shear_y)/cos(shear_y)  sin(a+shear_y)tan(shear_x)/cos(shear_y) + cos(a) ]

This might simplify a bit more, but I don't see it currently. It can be shown
pretty easily that this has determinant 1, and it's clear that setting either
shear to zero results in the proper rotation-shear matrix.

As I mentioned there are other ways to enable more general shearing, like performing the Y shear before the X, or conjugating an X shear by a rotation by some uniformly distributed angle psi in order to enable shearing along the X axis rotated by psi.

bug help wanted transforms

Most helpful comment

@vfdev-5 I agree shear needs to be bounded, unlike the other params. The case of shear=+-90 degrees is undefined. A ValueError could be thrown if the requested shear_x or shear_y ranges are >= 90. Shears anywhere near that bound are pretty pathological anyway and I would imagine most people using this option use shears of <10 degrees. @fmassa, happy to contribute a PR when I get a chance which will hopefully be in the next couple days.

All 5 comments

Thanks a lot for the detailed analysis!

I think your reasoning makes a lot of sense, and I'd be willing to change the behavior of the transform in torchvision to match what you described (but I haven't double-checked the computations though)

I'm ccing @vfdev-5 and @ptrblck , who have worked on this feature in the past and might have further insights.

Also, @jacobhinkle would you be willing to send a PR with the behavior you mentioned?

This would unfortunately be a BC-breaking change, so we should have a proper note in the release notes about this (and the next release will be out early next week, so this might be a change that we will have to do for the following release).

@jacobhinkle thanks for hinting on that !
Just a small comment on that, maybe we would like to parametrize both shears in degrees between -89,99 and 89,99. And we probably should take care of the case when 1/cos(shear) goes very high...

@vfdev-5 I agree shear needs to be bounded, unlike the other params. The case of shear=+-90 degrees is undefined. A ValueError could be thrown if the requested shear_x or shear_y ranges are >= 90. Shears anywhere near that bound are pretty pathological anyway and I would imagine most people using this option use shears of <10 degrees. @fmassa, happy to contribute a PR when I get a chance which will hopefully be in the next couple days.

@jacobhinkle a PR would be great!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

carlocab picture carlocab  路  3Comments

a-maci picture a-maci  路  3Comments

datumbox picture datumbox  路  3Comments

ibtingzon picture ibtingzon  路  3Comments

alpha-gradient picture alpha-gradient  路  3Comments