Describe the bug
The Resize operator fails with the following simple 2D test case.
{
"op_type": "Resize",
"mode": "linear",
"X": [[1, 1],
[1, 1]],
"scales": [2, 2],
"Y": [[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
"T": "float32"
},
Apparently one has to wrap the 2D tensor inside a 4D tensor with leading dimensions of size 1 (shape [1,1,2,2]) to get it to work, even though the two cases are equivalent o_O. (The error message is also confusing: it says linear mode only supports 4 dimensions, yet it doesn't actually perform quadrilinear interpolation across all 4.)
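For reference, bilinear resizing is well-defined on a plain rank-2 tensor, with no leading size-1 dimensions involved. The sketch below is a hypothetical helper (not ORT code). It assumes the asymmetric mapping `in = out / scale`, clamped to the last input index, which I believe matches opset-10 linear Resize; opset 11 introduced `coordinate_transformation_mode`, whose other modes give different values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major 2D tensor: data.size() == h * w.
struct Tensor2D {
  std::size_t h;
  std::size_t w;
  std::vector<float> data;
};

// Hypothetical reference helper (not ORT code): bilinear Resize on a rank-2 tensor.
// Assumes the asymmetric mapping in = out / scale, clamped to the last input index,
// and output sizes floor(input_size * scale).
Tensor2D BilinearResize2D(const Tensor2D& x, float scale_h, float scale_w) {
  assert(x.h > 0 && x.w > 0 && x.data.size() == x.h * x.w);
  Tensor2D y;
  y.h = static_cast<std::size_t>(std::floor(x.h * scale_h));
  y.w = static_cast<std::size_t>(std::floor(x.w * scale_w));
  y.data.resize(y.h * y.w);
  for (std::size_t oy = 0; oy < y.h; ++oy) {
    // Fractional input row, clamped so the last output rows replicate the edge.
    const float iy = std::min(oy / scale_h, static_cast<float>(x.h - 1));
    const std::size_t r0 = static_cast<std::size_t>(iy);
    const std::size_t r1 = std::min(r0 + 1, x.h - 1);
    const float fy = iy - static_cast<float>(r0);
    for (std::size_t ox = 0; ox < y.w; ++ox) {
      const float ix = std::min(ox / scale_w, static_cast<float>(x.w - 1));
      const std::size_t c0 = static_cast<std::size_t>(ix);
      const std::size_t c1 = std::min(c0 + 1, x.w - 1);
      const float fx = ix - static_cast<float>(c0);
      // Interpolate along each of the two neighboring rows, then between them.
      const float top = x.data[r0 * x.w + c0] +
                        (x.data[r0 * x.w + c1] - x.data[r0 * x.w + c0]) * fx;
      const float bottom = x.data[r1 * x.w + c0] +
                           (x.data[r1 * x.w + c1] - x.data[r1 * x.w + c0]) * fx;
      y.data[oy * y.w + ox] = top + (bottom - top) * fy;
    }
  }
  return y;
}
```

With the all-ones 2x2 X from the test case and scales [2, 2], every output element is 1, matching Y. Under the same assumed mapping, X = [[1, 2], [3, 4]] gives the rows [1, 1.5, 2, 2], [2, 2.5, 3, 3], [3, 3.5, 4, 4], [3, 3.5, 4, 4].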
Urgency
Vibranium preferred.
System information
To Reproduce
Run the attached model.
HRESULT=0x80004005 message=Exception during initialization: S:\WindowsAI\engine\lotus\onnxruntime\core/providers/cpu/tensor/upsample.h:75
onnxruntime::UpsampleBase::ScalesValidation scales.size() == 4 was false. Upsample: linear mode upsample only support bilinear with 4 dimension.
Expected behavior
I understand that ORT's Resize only handles 2D (bilinear) interpolation for now, rather than full 3D or 4D as DML supports, but the validation code should treat all of these squeezed-dimension variants the same: [1,1,m,n], [1,m,n], [m,n], [m,n,1,1]. *Assuming the corresponding scales are no-ops (1.0).
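One way to read this proposal is to drop every dimension whose size is 1 and whose scale is 1.0 (a no-op for linear resizing), then accept the input if at most two dimensions remain. A minimal sketch under that assumption; `SqueezeNoOpDims` and `FitsBilinearAfterSqueeze` are hypothetical names, not ORT's validation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shape and scales left after removing no-op dimensions.
struct EffectiveResize {
  std::vector<std::int64_t> shape;
  std::vector<float> scales;
};

// Hypothetical helper (not ORT code): a dimension of size 1 with scale 1.0
// contributes nothing to linear resizing, so drop it.
EffectiveResize SqueezeNoOpDims(const std::vector<std::int64_t>& shape,
                                const std::vector<float>& scales) {
  assert(shape.size() == scales.size());  // one scale per input dimension
  EffectiveResize r;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == 1 && scales[i] == 1.0f) continue;  // no-op dimension
    r.shape.push_back(shape[i]);
    r.scales.push_back(scales[i]);
  }
  return r;
}

// Under this reading, a bilinear kernel suffices when at most two
// dimensions survive the squeeze.
bool FitsBilinearAfterSqueeze(const std::vector<std::int64_t>& shape,
                              const std::vector<float>& scales) {
  return SqueezeNoOpDims(shape, scales).shape.size() <= 2;
}
```

With m = 3 and n = 5, all four shapes listed above (with scales of 1.0 on their size-1 dimensions) reduce to the effective shape [3, 5].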
Additional context
Found while testing WindowsAI DML GPU vs CPU paths.
ir_version: 3
producer_name: "OnnxConformanceTest"
graph {
node {
input: "X"
input: "scales"
output: "Y"
op_type: "Resize"
attribute {
name: "mode"
s: "linear"
type: STRING
}
domain: ""
}
initializer {
dims: 2
dims: 2
data_type: FLOAT
name: "X"
raw_data: "\000\000\200?\000\000\000@\000\000@@\000\000\200@"
}
initializer {
dims: 2
data_type: FLOAT
float_data: 2
float_data: 2
name: "scales"
}
input {
name: "X"
type {
tensor_type {
elem_type: FLOAT
shape {
dim {
dim_value: 2
}
dim {
dim_value: 2
}
}
}
}
}
input {
name: "scales"
type {
tensor_type {
elem_type: FLOAT
shape {
dim {
dim_value: 2
}
}
}
}
}
output {
name: "Y"
type {
tensor_type {
elem_type: FLOAT
shape {
dim {
dim_value: 4
}
dim {
dim_value: 4
}
}
}
}
}
}
opset_import {
version: 7
}
opset_import {
version: 10
}
Hi @fdwr,
If my understanding is correct, then yes, a 2D tensor shouldn't need to be wrapped in a 4D tensor to invoke "bilinear" resizing. But it is hard to make a general statement like "the validation code should just treat the squeezed dimensions all the same: [1,1,m,n]", because it is also conditioned on the corresponding "scales" values. Only if those "scales" values are 1.0 does it amount to "bilinear" resizing. In fact, there is no pure N-D case (N != 2) that should invoke "bilinear" interpolation; it may "boil down" to bilinear only if every dimension except the innermost 2 has a scale value of 1.0. This is what we currently seem to support (NCHW input with scales_n = 1.0 and scales_c = 1.0).
Does this seem correct?
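The condition stated in this comment (every scale except the innermost two equal to 1.0) can be sketched as a small check. This is a hypothetical function, not ORT's actual validation code; treating rank < 2 as unsupported is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check (not ORT code): linear Resize reduces to bilinear when
// every scale except the innermost two is exactly 1.0, whatever the sizes of
// the outer dimensions (e.g. NCHW with N > 1 still qualifies).
bool ReducesToBilinear(const std::vector<std::int64_t>& input_shape,
                       const std::vector<float>& scales) {
  const std::size_t rank = input_shape.size();
  assert(scales.size() == rank);  // one scale per input dimension
  if (rank < 2) return false;     // simplification: lower ranks not handled here
  for (std::size_t i = 0; i + 2 < rank; ++i) {
    if (scales[i] != 1.0f) return false;  // an outer dimension is really resized
  }
  return true;
}
```

Unlike a pure "squeeze size-1 dimensions" rule, this accepts outer dimensions of any size, but it rejects [m,n,1,1] with scales [2,2,1,1], since only the innermost two dimensions may be resized.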
@hariharans29: Yep, you are correct, Hari, that the corresponding scales must also be no-ops (1.0). We have similar logic in WinML for the GPU case: it checks whether the leading scales are 1's and, if only the last two dimensions are resized, selects a faster pure-bilinear resampling shader, since bilinear is faster than quadrilinear sampling and returns identical results anyway.
Given a 4D tensor, for each output element we read 16 samples (the 2^4 neighboring input values) and compute the interpolation fraction along each dimension:
// INTERPOLATED_DIMENSION_COUNT >= 4
samples[0] = lerp(samples[0], samples[ 8], inputCoordinateFractions[0]);
samples[1] = lerp(samples[1], samples[ 9], inputCoordinateFractions[0]);
samples[2] = lerp(samples[2], samples[10], inputCoordinateFractions[0]);
samples[3] = lerp(samples[3], samples[11], inputCoordinateFractions[0]);
samples[4] = lerp(samples[4], samples[12], inputCoordinateFractions[0]);
samples[5] = lerp(samples[5], samples[13], inputCoordinateFractions[0]);
samples[6] = lerp(samples[6], samples[14], inputCoordinateFractions[0]);
samples[7] = lerp(samples[7], samples[15], inputCoordinateFractions[0]);
// INTERPOLATED_DIMENSION_COUNT >= 3
samples[0] = lerp(samples[0], samples[4], inputCoordinateFractions[1]);
samples[1] = lerp(samples[1], samples[5], inputCoordinateFractions[1]);
samples[2] = lerp(samples[2], samples[6], inputCoordinateFractions[1]);
samples[3] = lerp(samples[3], samples[7], inputCoordinateFractions[1]);
// INTERPOLATED_DIMENSION_COUNT >= 2
samples[0] = lerp(samples[0], samples[2], inputCoordinateFractions[2]);
samples[1] = lerp(samples[1], samples[3], inputCoordinateFractions[2]);
outputValue = lerp(samples[0], samples[1], inputCoordinateFractions[3]);
When the first 2 scales are 1 (meaning their fractions are always exactly 0 or 1, so no blending happens along those dimensions), all of those leading equations drop out, leaving just:
// INTERPOLATED_DIMENSION_COUNT >= 2
samples[0] = lerp(samples[0], samples[2], inputCoordinateFractions[2]);
samples[1] = lerp(samples[1], samples[3], inputCoordinateFractions[2]);
outputValue = lerp(samples[0], samples[1], inputCoordinateFractions[3]);
...which is the bilinear logic that ORT supports currently.
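The unrolled passes quoted earlier generalize to any number of interpolated dimensions: each pass halves the sample array using the next dimension's fraction. Below is a compact sketch of that reduction. The function names are hypothetical, this is not the actual WinML shader, and `Lerp` is assumed to mirror HLSL's `lerp(a, b, t) = a + t * (b - a)`. Because `Lerp(a, b, 0)` returns `a` exactly, zero fractions on the two leading dimensions leave `samples[0..3]` untouched, so the 4D reduction gives bitwise the same value as the 2D one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same semantics as HLSL lerp(a, b, t): a + t * (b - a).
float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Generic form of the unrolled passes (a sketch, not the WinML shader).
// `samples` holds the 2^N corner values around one output element; bit
// (N - 1 - d) of a sample's index picks the lower or upper neighbor along
// dimension d. Each pass halves the array using that dimension's fraction.
float MultilinearInterpolate(std::vector<float> samples,
                             const std::vector<float>& fractions) {
  assert(samples.size() == (std::size_t{1} << fractions.size()));
  std::size_t count = samples.size();
  for (std::size_t d = 0; d < fractions.size(); ++d) {
    count /= 2;
    for (std::size_t i = 0; i < count; ++i) {
      samples[i] = Lerp(samples[i], samples[i + count], fractions[d]);
    }
  }
  return samples[0];
}
```

For example, `MultilinearInterpolate({1.0f, 2.0f, 3.0f, 4.0f}, {0.5f, 0.5f})` interpolates the center of a 2x2 cell and returns 2.5. With N = 4, the passes pair index i with i + 8, i + 4, i + 2, and finally 0 with 1, exactly as in the unrolled code.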
Hi @fdwr,
The open PR should address your major concern: pure 2D inputs will now be supported for the 'bilinear' case. It still supports a 4D input that boils down to the 'bilinear' case (with the outermost 2 scales being 1), but it stops short of adding support for the many other special N-D cases that also boil down to 'bilinear'. So it should be a conservative middle ground.
Thanks!
Hi @hariharans29, I ran into a problem when using the Resize op. Could you help me?
Please refer to the issue reported in ONNX: onnx/onnx#2267.