The current way of testing transforms is not satisfactory due to random configurations, duplicated code and an unclear structure (test_functional_tensor.py, test_transforms.py, test_transforms_tensor.py). This feature request proposes to rewrite the transforms tests in order to tackle these problems.
A structured approach for transforms tests provides:
- tests for functional transforms (torchvision.transforms.functional)
- tests for transform classes (torchvision.transforms)

Note that pytest.parametrize cannot be used due to certain internal reasons. The common part of testing a transformation can be defined in a special class, and derived classes can configure the input type etc.
Example from the referenced PR:
```python
class RoIOpTester:
    # Defines functional tests to execute
    # on cpu, on cuda, other options etc.
    ...

class RoIPoolTester(RoIOpTester, unittest.TestCase):
    def fn(self, *args, **kwargs):
        # function to test
        ...

    def get_script_fn(self, *args, **kwargs):
        # scripted function
        ...

    def expected_fn(self, *args, **kwargs):
        # reference function
        ...
```
Using torch.testing._internal:

```python
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import dtypes, dtypesIfCUDA, instantiate_device_type_tests
from torch.testing import floating_types, floating_types_and_half, integral_types


class Tester(TestCase):

    @dtypes(*(floating_types() + integral_types()))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype):
        img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        ...


instantiate_device_type_tests(Tester, globals())


if __name__ == '__main__':
    run_tests()
```
this gives:

```
TesterCPU::test_resize_cpu_float32 PASSED [ 14%]
TesterCPU::test_resize_cpu_float64 PASSED [ 28%]
TesterCPU::test_resize_cpu_int16 PASSED [ 42%]
TesterCPU::test_resize_cpu_int32 PASSED [ 57%]
TesterCPU::test_resize_cpu_int64 PASSED [ 71%]
TesterCPU::test_resize_cpu_int8 PASSED [ 85%]
TesterCPU::test_resize_cpu_uint8 PASSED [100%]
```
Problems:
- dtypes is perfect for torch.Tensor, but there is no simple way to add PIL as a dtype.

The parameterized package looks promising and could potentially solve the limitation of pytest (to be confirmed by fb):
```python
import unittest

from torch.testing import floating_types, integral_types
from parameterized import parameterized


class Tester(unittest.TestCase):

    @parameterized.expand(
        [("cuda", dt) for dt in floating_types() + integral_types()] +
        [("cpu", dt) for dt in floating_types() + integral_types()]
    )
    def test_resize(self, device, dtype):
        pass


if __name__ == "__main__":
    unittest.main()
```
this gives:

```
TestMathUnitTest::test_resize_00_cuda PASSED [  7%]
TestMathUnitTest::test_resize_01_cuda PASSED [ 14%]
TestMathUnitTest::test_resize_02_cuda PASSED [ 21%]
TestMathUnitTest::test_resize_03_cuda PASSED [ 28%]
TestMathUnitTest::test_resize_04_cuda PASSED [ 35%]
TestMathUnitTest::test_resize_05_cuda PASSED [ 42%]
TestMathUnitTest::test_resize_06_cuda PASSED [ 50%]
TestMathUnitTest::test_resize_07_cpu PASSED [ 57%]
TestMathUnitTest::test_resize_08_cpu PASSED [ 64%]
TestMathUnitTest::test_resize_09_cpu PASSED [ 71%]
TestMathUnitTest::test_resize_10_cpu PASSED [ 78%]
TestMathUnitTest::test_resize_11_cpu PASSED [ 85%]
TestMathUnitTest::test_resize_12_cpu PASSED [ 92%]
TestMathUnitTest::test_resize_13_cpu PASSED [100%]
```
Problems:
- the generated test names encode an index instead of the dtype (e.g. test_resize_00_cuda rather than test_resize_cuda_float32).

Another option is to split the tests into 3 files. We can similarly put a file for PIL input, e.g. torchscript_consistency_pil_test.py:
```python
img = ...
ref_fn = ...
for config in test_configs:
    output = fn(img, **config)
    true_output = ref_fn(img, **config)
    self.assertEqual(output, true_output)
```
Recent bugs (e.g. https://github.com/pytorch/vision/pull/2869) show unsatisfactory code coverage for transforms.
cc @vfdev-5 @fmassa @datumbox @mthrok
Thanks for the detailed issue!
A few thoughts:
> check result correctness vs stored or precomputed reference
For the correctness check, I think it would be preferable, if possible, to avoid having stored files with expected results, as this is very opaque and might make it more difficult to make modifications or improvements.
My first idea would be to have some simple constructed images that we can test for correctness. For example, if we have a function like draw_circle_at or draw_rectangle_at that draws a simple white shape on a black background, we can by definition have the ground truth for many operations via some math, as sketched below.
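A minimal sketch of this idea (draw_rectangle_at is a hypothetical helper, and the test assumes a TestCase context; the point is that the expected output of a flip is itself constructible):

```python
import torch

def draw_rectangle_at(h, w, top, left, rh, rw, dtype=torch.float32):
    # Hypothetical helper: white rectangle on a black background.
    img = torch.zeros(3, h, w, dtype=dtype)
    img[:, top:top + rh, left:left + rw] = 1
    return img

def test_hflip_on_constructed_image(self):
    img = draw_rectangle_at(h=12, w=16, top=2, left=3, rh=4, rw=5)
    # A horizontal flip moves the rectangle to the mirrored columns:
    # new_left = w - left - rw = 16 - 3 - 5 = 8.
    expected = draw_rectangle_at(h=12, w=16, top=2, left=8, rh=4, rw=5)
    self.assertTrue(torch.equal(torch.flip(img, dims=[-1]), expected))
```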
> accimage?
I think accimage is less of a priority. It's not maintained, and we can probably integrate any potential optimization that is present there with ipp either in torchvision or in PyTorch (the only exposed ops are resize and flip, and we should benchmark to see how ipp fares compared to the current CPU implementations in PyTorch).
> How to do that?
I think I might be leaning towards the first two options, with maybe a preference for option 2.
Wrt the limitations of dtypes, its implementation in PyTorch is fairly small and we could most probably adapt it to our own needs.
> How to do operation's configuration injection?
I think an approach similar to the custom @dtypes would work here, and it is a subset of what parametrize does. So you could do:
```python
@configurations([1, 2, 3])
def test_1(self, config):
    output = fn(**config)
    ...
```
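A minimal sketch of what such a decorator could look like (the attribute name is an assumption; an instantiator similar to instantiate_device_type_tests would read it and generate one named test per configuration, as is done for @dtypes):

```python
def configurations(configs):
    # Attach the configurations to the test function; a test instantiator
    # is expected to expand them into one named test per configuration.
    def decorator(fn):
        fn.configurations = list(configs)
        return fn
    return decorator
```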
+1 on using parameterized or a similar solution that supports annotations.
Right now we often use complex loops/if-statements in unit-testing. This is generally considered a bad practice and it makes it very difficult for us to debug/pinpoint problems.
Here is an example of a test that I wrote in the past that has this issue:
https://github.com/pytorch/vision/blob/005355bd6fdc3a45f4d54f8d8dfd035b7968ce64/test/test_ops.py#L531-L533
It became quite hard to pinpoint where the problem was when it failed in a recent PR.
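To illustrate with a condensed, hypothetical version of that pattern: the parameters live only in loop variables, so a failure at the assert does not say which combination was being tested:

```python
import unittest

import torch

class OpsTester(unittest.TestCase):
    def test_nms(self):
        for iou in [0.2, 0.5, 0.8]:
            for dtype in [torch.float32, torch.float64]:
                # ... compute `keep` and `expected` for this combination ...
                keep = expected = torch.arange(10)  # placeholder
                # On failure, the report points at this line but not at the
                # (iou, dtype) combination that produced it.
                self.assertTrue(torch.equal(keep, expected))
```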
I love the direction this RFC is going.
One thing to note regarding the split of devices. The reason why I decided to split the GPU/CUDA tests of torchaudio into separate files is fbcode. CUDA tests cannot run in fbcode, and skipping them constantly will trigger alerts. There is no way in fbcode to disable such alerts because, from the viewpoint of fbcode, there is no point in having a test that is constantly skipped. PyTorch-core considers it fine to receive alerts and decided to ignore them (or probably because it's too much work to do such a migration), but since torchaudio is small and new, I decided to split them into separate files and run only CPU tests in fbcode.
I do not know if the same thing can be accomplished with PyTorch's test utilities. I have not played around with it.
FYI: using torch.testing._internal to support PIL and configurations would require recoding existing classes and methods:
- vision_dtypes derived from torch.testing._internal.common_device_type.dtypes (easy); code and usage:
```python
import torch
from torch.testing._internal.common_device_type import dtypes


class vision_dtypes(dtypes):

    def __init__(self, *args, **kwargs):
        super().__init__()
        if len(args) > 0 and isinstance(args[0], (list, tuple)):
            for arg in args:
                assert isinstance(arg, (list, tuple)), \
                    "When one dtype variant is a tuple or list, " \
                    "all dtype variants must be. " \
                    "Received non-list non-tuple dtype {0}".format(str(arg))
                assert all(self._is_supported_dtype(dtype) for dtype in arg), \
                    "Unknown dtype in {0}".format(str(arg))
        else:
            assert all(self._is_supported_dtype(arg) for arg in args), \
                "Unknown dtype in {0}".format(str(args))
        self.args = args
        self.device_type = kwargs.get('device_type', 'all')

    @staticmethod
    def _is_supported_dtype(dtype):
        if isinstance(dtype, torch.dtype):
            return True
        if isinstance(dtype, str):
            # https://pillow.readthedocs.io/en/stable/handbook/concepts.html?highlight=modes#modes
            modes = ["1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr", "LAB", "HSV", "I", "F"]
            return dtype in ["PIL.{}".format(mode) for mode in modes]
        return False
```
and this can be used as:

```python
class Tester(TestCase):

    @vision_dtypes(*(floating_types() + integral_types() + ("PIL.RGB", "PIL.P", "PIL.F")))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype, config=None):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        pass


instantiate_device_type_tests(Tester, globals())
```
- for a @configurations decorator, we have to copy and adapt the instantiate_device_type_tests method and rewrite the DeviceTypeTestBase.instantiate_test method to handle configurations similarly to how it is done for @ops.

After discussing with @fmassa, we can propose a change to torch.testing._internal in order to enable a more flexible approach: a) the user can easily add "custom" dtypes (e.g. CPU PIL modes for torchvision), b) the user can easily create decorators (something like @ops) to insert a parameterization/configuration as an argument to the test.
The next step is to create a prototype of a new common_device_type.py, keeping BC and covering the above features.
cc @mruberry @mthrok
@vfdev-5 Would you just like a @variants(iterable of (str, obj) pairs) decorator that instantiates subtests named from the string and passes the values in obj to those tests?
@mruberry yes, something like that would be perfect. The usage would look like this:
```python
class Tester(TestCase):

    @variants([
        ("nearest_single_value1", {"size": 32, "interpolation": 0}),
        ("nearest_single_value2", {"size": 25, "interpolation": 0}),
        ("linear_single_value1", {"size": 32, "interpolation": 2}),
        ("linear_single_value2", {"size": 25, "interpolation": 2}),
    ])
    @vision_dtypes(*(floating_types() + integral_types() + ("PIL.RGB", "PIL.P", "PIL.F")))
    @dtypesIfCUDA(*(floating_types_and_half() + integral_types()))
    def test_resize(self, device, dtype, config):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        # config = {"size": X, "interpolation": Y}
        pass


# instantiate_device_type_tests(Tester, globals())
instantiate_tests(Tester, globals())
```
which gives:

```
TesterCPU::test_resize_cpu_float32_nearest_single_value1
TesterCPU::test_resize_cpu_float32_nearest_single_value2
TesterCPU::test_resize_cpu_float32_linear_single_value1
TesterCPU::test_resize_cpu_float32_linear_single_value2
```
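A rough sketch of how @variants and instantiate_tests could work (the names and the simplified single-parameter signature are assumptions; combining this with the device/dtype axes would mean hooking into instantiate_device_type_tests instead):

```python
def variants(pairs):
    # Attach (name, config) pairs to the test function.
    def decorator(fn):
        fn._variants = list(pairs)
        return fn
    return decorator

def instantiate_tests(cls, scope):
    # Replace each decorated test with one named test per variant.
    for attr in list(vars(cls)):
        fn = getattr(cls, attr)
        pairs = getattr(fn, "_variants", None)
        if pairs is None:
            continue
        delattr(cls, attr)
        for name, config in pairs:
            def make_test(fn=fn, config=config):
                return lambda self: fn(self, config)
            setattr(cls, "{}_{}".format(attr, name), make_test())
    scope[cls.__name__] = cls
```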
Instead of being passed in "config" how do you feel about taking kwargs so the variant params can get splatted?
> Instead of being passed in "config" how do you feel about taking kwargs so the variant params can get splatted?
I think this could also work.
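For example (a sketch, assuming the variant dict is unpacked into keyword arguments when the generated test calls the original function):

```python
@variants([
    ("nearest_single_value1", {"size": 32, "interpolation": 0}),
    ("linear_single_value1", {"size": 32, "interpolation": 2}),
])
def test_resize(self, device, dtype, size, interpolation):
    # size and interpolation arrive splatted from the variant dict,
    # i.e. the generated test calls fn(self, device, dtype, **config).
    pass
```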
Actually, I've been thinking that it would be nice if something generic like variants could be used for any "cross-product" combination (without the restriction of using the variants decorator only once):
```python
class Tester(TestCase):

    @variants("interpolation", [0, 2, 3])
    @variants("size", [32, 26, (32, 26), [26, 32]])
    @variants("dtype", [floating_types() + integral_types() + ...])
    @variants("device", ["cpu", "cuda"])
    def test_resize(self, interpolation, size, dtype, device):
        # img = self.create_image(h=12, w=16, device=device, dtype=dtype)
        pass
```
giving:

```
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_0
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_1
ParametrizedTester::test_resize_cpu_float32_size_0_interpolation_2
ParametrizedTester::test_resize_cpu_float32_size_1_interpolation_0
ParametrizedTester::test_resize_cpu_float32_size_1_interpolation_1
...
```
We can still use @dtypes for the device/data type configuration, but having multiple @variants could be nice. What do you think?
I agree that would be nice. I'm going to propose updates to PyTorch's test tools soon and they'll include a decorator like parameterize or variants or subtest functionality. Note that a workaround today is just to write a for loop that enumerates the cross-product of these arguments in the test itself.
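A sketch of that workaround (the body of the test is elided; subTest keeps per-combination reporting inside the loop):

```python
import itertools
import unittest

import torch

class Tester(unittest.TestCase):
    def test_resize(self):
        devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
        for device, dtype, size in itertools.product(
            devices, [torch.float32, torch.uint8], [32, 26, (32, 26)]
        ):
            # subTest reports the failing combination instead of aborting
            # the whole cross-product at the first failure.
            with self.subTest(device=device, dtype=dtype, size=size):
                ...
```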
> I'm going to propose updates to PyTorch's test tools soon and they'll include a decorator like parameterize or variants or subtest functionality.
@mruberry sounds great! Also, it would be nice to have the related classes and methods more configurable. For example, if we would like to derive from dtypes, it would be nice to refactor the type verification and have the possibility to override it:
```python
# Changed dtypes
class dtypes(object):

    # Note: *args, **kwargs for Python2 compat.
    # Python 3 allows (self, *args, device_type='all').
    def __init__(self, *args, **kwargs):
        if len(args) > 0 and isinstance(args[0], (list, tuple)):
            for arg in args:
                assert isinstance(arg, (list, tuple)), \
                    "When one dtype variant is a tuple or list, " \
                    "all dtype variants must be. " \
                    "Received non-list non-tuple dtype {0}".format(str(arg))
                # This is the change vs the original dtypes implementation
                assert all(self._is_supported_dtype(dtype) for dtype in arg), \
                    "Unknown dtype in {0}".format(str(arg))
        else:
            # This is the change vs the original dtypes implementation
            assert all(self._is_supported_dtype(arg) for arg in args), \
                "Unknown dtype in {0}".format(str(args))
        self.args = args
        self.device_type = kwargs.get('device_type', 'all')

    def __call__(self, fn):
        d = getattr(fn, 'dtypes', {})
        assert self.device_type not in d, "dtypes redefinition for {0}".format(self.device_type)
        d[self.device_type] = self.args
        fn.dtypes = d
        return fn

    # This is the change vs the original dtypes implementation
    @staticmethod
    def _is_supported_dtype(dtype):
        if isinstance(dtype, torch.dtype):
            return True
        return False
```
I was also thinking of implementing the variants decorator and proposing certain refactoring changes for dtypes, DeviceTypeTestBase and instantiate_device_type_tests. Let me know if this could overlap with your planned updates to PyTorch's test tools.
Changes like that don't need to go through a special planning process, although I am curious if @dtypes is really the best place for a change like that and how to use the new functionality would need to be documented. It's too bad that subclassing torch.dtype isn't possible.