I believe the affine transforms in torchvision are parametrized improperly, due to handling shearing incorrectly.
PR #1070 contains a related discussion about the new two-axis "shear" parametrization. In this issue I'd like to focus on why even for a single axis the parametrization is slightly incorrect, then give a possible solution and a suggestion for enabling two axis shearing.
Typically we think of shear as a "tilt" of the X or Y axis that does not affect vectors orthogonal to that axis. Because of this it is common to parametrize a shear transform by that angle. We start with an X-shear matrix of the form
A = [ 1 s ]
[ 0 1 ]
The cosine of the angle phi between the Y axis and the transformed Y axis is
cos(phi) = (x^T A x) / | A x |
x = [ 0 ]
[ 1 ]
It's easy to expand this and see that
cos(phi) = 1/sqrt(1 + s^2)
which is satisfied when s = tan(phi). So the natural parametrization for an X shear is
A = [ 1 tan(phi) ]
[ 0 1 ]
The combined transform in torchvision/transforms/functional.py is determined by the RSS matrix which represents a uniform scaling by scale, a rotation by a radians and a shear of shear radians:
RSS = [ cos(a)*scale -sin(a + shear)*scale ]
[ sin(a)*scale cos(a + shear)*scale ]
This matrix should represent a shearing, scaling and rotation. Both the rotation and shearing have determinant 1, so the resulting determinant for RSS should be scale^2, however if we compute the determinant we easily see that it is
det (RSS) = scale^2 (cos(a) cos(a+shear) + sin(a)sin(a+shear) ) = scale^2 cos(shear)
where the simplification is due to the product-to-sum trig identities (cf. https://en.wikipedia.org/wiki/List_of_trigonometric_identities#Multiple-angle_formulae). This is a problem since it will _shrink_ the image when shear is large, which the user may not desire. In general, it is just not a clean parametrization of an affine transform, since for example the existing parametrization makes it impossible to generate pure shears with no rotation or scaling.
There is a simple fix in the case of a single X shear, and that is to replace
the RSS matrix above by the one derived from the parametrization in the formula
for A above. Ignoring scale we have
RSS = [ cos(a) -sin(a) ] [ 1 tan(shear) ] = [ cos(a) cos(a)tan(shear) - sin(a) ]
[ sin(a) cos(a) ] [ 0 1 ] [ sin(a) sin(a)tan(shear) + cos(a) ]
Rearranging and using the trig identities again, we arrive at a simple formula:
RSS = [ cos(a) -sin(a-shear)/cos(shear) ]
[ sin(a) cos(a-shear)/cos(shear) ]
So we see that two entries in the current released torchvision's RSS matrix just
need a division by cos(shear). It's also easy to see that this fixes the
determinant issue above.
As far as extension to two shears, we have a choice of which shear to apply
first, the X shear or Y shear. If we perform the X before Y then the rotation
and again ignoring scaling, we wind up with the following RSS matrix:
RSS = [ cos(a+shear_y)/cos(shear_y) cos(a+shear_y)tan(shear_x)/cos(shear_y) - sin(a) ]
[ sin(a+shear_y)/cos(shear_y) sin(a+shear_y)tan(shear_x)/cos(shear_y) + cos(a) ]
This might simplify a bit more, but I don't see it currently. It can be shown
pretty easily that this has determinant 1, and it's clear that setting either
shear to zero results in the proper rotation-shear matrix.
As I mentioned there are other ways to enable more general shearing, like performing the Y shear before the X, or conjugating an X shear by a rotation by some uniformly distributed angle psi in order to enable shearing along the X axis rotated by psi.
Thanks a lot for the detailed analysis!
I think your reasoning makes a lot of sense, and I'd be willing to change the behavior of the transform in torchvision to match what you described (but I haven't double-checked the computations though)
I'm ccing @vfdev-5 and @ptrblck , who have worked on this feature in the past and might have further insights.
Also, @jacobhinkle would you be willing to send a PR with the behavior you mentioned?
This would unfortunately be a BC-breaking change, so we should have a proper note in the release notes about this (and the next release will be out early next week, so this might be a change that we will have to do for the following release).
@jacobhinkle thanks for hinting on that !
Just a small comment on that, maybe we would like to parametrize both shears in degrees between -89,99 and 89,99. And we probably should take care of the case when 1/cos(shear) goes very high...
@vfdev-5 I agree shear needs to be bounded, unlike the other params. The case of shear=+-90 degrees is undefined. A ValueError could be thrown if the requested shear_x or shear_y ranges are >= 90. Shears anywhere near that bound are pretty pathological anyway and I would imagine most people using this option use shears of <10 degrees. @fmassa, happy to contribute a PR when I get a chance which will hopefully be in the next couple days.
@jacobhinkle a PR would be great!
Fixed by @pedrofreire in https://github.com/pytorch/vision/pull/1529
Most helpful comment
@vfdev-5 I agree shear needs to be bounded, unlike the other params. The case of shear=+-90 degrees is undefined. A ValueError could be thrown if the requested shear_x or shear_y ranges are >= 90. Shears anywhere near that bound are pretty pathological anyway and I would imagine most people using this option use shears of <10 degrees. @fmassa, happy to contribute a PR when I get a chance which will hopefully be in the next couple days.