Onnxruntime: Resize operator fails with 2D tensors

Created on 17 Aug 2019 · 4 Comments · Source: microsoft/onnxruntime

Describe the bug
The Resize operator fails with the following simple 2D test case.

{
  "op_type": "Resize",
  "mode": "linear",
  "X": [[1, 1],
        [1, 1]],
  "scales": [2, 2],
  "Y": [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],
  "T": "float32"
},

Apparently one has to wrap the 2D tensor inside a 4D tensor with dummy 1's to get it to work, even though the two cases are equivalent o_O. (The error message is also confusing because it says it only handles 4D when it doesn't actually handle quadrilinear interpolation.)
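For illustration, the workaround described above can be sketched in plain Python with nested lists. The helper names `wrap_2d_as_4d` and `unwrap_4d_to_2d` are hypothetical and not part of ONNX Runtime; the sketch only shows how the 2D tensor and its scales are padded with dummy 1's before being handed to Resize.

```python
def wrap_2d_as_4d(x, scales):
    """Wrap an [H, W] tensor (list of rows) as [1, 1, H, W] and pad the scales with 1.0.

    Hypothetical helper for illustration only; not an ONNX Runtime API.
    """
    if len(scales) != 2:
        raise ValueError("expected one scale per dimension of a 2D tensor")
    # Two extra levels of nesting add the leading dimensions of size 1.
    return [[x]], [1.0, 1.0] + [float(s) for s in scales]


def unwrap_4d_to_2d(y):
    """Strip the two leading size-1 dimensions from a [1, 1, H, W] result."""
    return y[0][0]


x4d, scales4d = wrap_2d_as_4d([[1, 1], [1, 1]], [2, 2])
print(x4d)       # [[[[1, 1], [1, 1]]]]
print(scales4d)  # [1.0, 1.0, 2.0, 2.0]
```

The leading scales of 1.0 leave the dummy dimensions untouched, so the resize still only interpolates over the original two dimensions.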

Urgency
Vibranium preferred.

System information

OS Platform and Distribution: Windows 10 Vibranium
ONNX Runtime installed from: source
ONNX Runtime version: engine/lotus (v0.1.4-709-gbf6f19c6)
Python version: NA
Visual Studio version (if applicable): Visual Studio 2017
GCC/Compiler version (if compiling from source): NA
CUDA/cuDNN version: NA
GPU model and memory: AMD Radeon, 32GB

To Reproduce
Run the attached model.

HRESULT=0x80004005 message=Exception during initialization: S:\WindowsAI\engine\lotus\onnxruntime\core/providers/cpu/tensor/upsample.h:75
 onnxruntime::UpsampleBase::ScalesValidation scales.size() == 4 was false. Upsample: linear mode upsample only support bilinear with 4 dimension.

Expected behavior
I understand that ORT's Resize only handles 2D interpolation for now (rather than the full 3D or 4D that DML supports), but the validation code should treat the squeezed dimensions all the same: [1,1,m,n], [1,m,n], [m,n], [m,n,1,1], assuming the corresponding scales would yield no-ops.
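A minimal sketch of that idea, assuming validation runs on (shape, scales) pairs: drop every dimension whose size is 1 and whose scale is 1.0, then validate what is left. The function `squeeze_nop_dims` is a hypothetical name used for illustration, not something that exists in ONNX Runtime.

```python
def squeeze_nop_dims(shape, scales):
    """Drop dimensions of size 1 whose scale is 1.0, since resizing them is a no-op.

    Hypothetical helper for illustration only; not ONNX Runtime code.
    """
    if len(shape) != len(scales):
        raise ValueError("need exactly one scale per dimension")
    kept = [(d, s) for d, s in zip(shape, scales) if not (d == 1 and s == 1.0)]
    return [d for d, _ in kept], [s for _, s in kept]


m, n = 2, 2
cases = [
    ([1, 1, m, n], [1.0, 1.0, 2.0, 2.0]),
    ([1, m, n], [1.0, 2.0, 2.0]),
    ([m, n], [2.0, 2.0]),
    ([m, n, 1, 1], [2.0, 2.0, 1.0, 1.0]),
]
reduced = [squeeze_nop_dims(shape, scales) for shape, scales in cases]
print(reduced[0])  # ([2, 2], [2.0, 2.0])
```

All four layouts reduce to the same 2D problem, so a bilinear kernel could in principle serve each of them. A size-1 dimension with a scale other than 1.0 is kept, because resizing it is not a no-op.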

Additional context
Found while testing WindowsAI DML GPU vs CPU paths.

ir_version: 3
producer_name: "OnnxConformanceTest"
graph {
  node {
    input: "X"
    input: "scales"
    output: "Y"
    op_type: "Resize"
    attribute {
      name: "mode"
      s: "linear"
      type: STRING
    }
    domain: ""
  }
  initializer {
    dims: 2
    dims: 2
    data_type: FLOAT
    name: "X"
    raw_data: "\000\000\200?\000\000\000@\000\000@@\000\000\200@"
  }
  initializer {
    dims: 2
    data_type: FLOAT
    float_data: 2
    float_data: 2
    name: "scales"
  }
  input {
    name: "X"
    type {
      tensor_type {
        elem_type: FLOAT
        shape {
          dim {
            dim_value: 2
          }
          dim {
            dim_value: 2
          }
        }
      }
    }
  }
  input {
    name: "scales"
    type {
      tensor_type {
        elem_type: FLOAT
        shape {
          dim {
            dim_value: 2
          }
        }
      }
    }
  }
  output {
    name: "Y"
    type {
      tensor_type {
        elem_type: FLOAT
        shape {
          dim {
            dim_value: 4
          }
          dim {
            dim_value: 4
          }
        }
      }
    }
  }
}
opset_import {
  version: 7
}
opset_import {
  version: 10
}

bug

fdwr

All 4 comments

Hi @fdwr,

If my understanding is correct, yes, a 2D tensor need not be wrapped in a 4D tensor to invoke "bilinear" resizing. But it is hard to make a general statement like "the validation code should just treat the squeezed dimensions all the same: [1,1,m,n]". It also depends on the corresponding "scales" values. Only if the corresponding "scales" value is 1.0 can we treat the case as "bilinear" resizing. In fact, there is no pure N-D case (N != 2) that should invoke "bilinear" interpolation. It may "boil down" to bilinear if every dimension except the innermost 2 has a scale value of 1.0. This is what we seem to currently support (NCHW input with scales_n = 1.0 and scales_c = 1.0).

Does this seem correct?

hariharans29 on 21 Aug 2019

Does this seem correct?

@hariharans29: Yep, you are correct, Hari, that the corresponding scales should also be a no-op. We have such logic in WinML for the GPU case: it checks whether the leading scales are 1's and selects a faster pure bilinear resampling shader if the resize only affects the last two dimensions, since that is faster than quadrilinear sampling and returns identical results anyway.

Given a 4D tensor, for a specific output element, we read 16 samples and compute the interpolation fractions along each dimension:

// INTERPOLATED_DIMENSION_COUNT >= 4
samples[0] = lerp(samples[0], samples[ 8], inputCoordinateFractions[0]);
samples[1] = lerp(samples[1], samples[ 9], inputCoordinateFractions[0]);
samples[2] = lerp(samples[2], samples[10], inputCoordinateFractions[0]);
samples[3] = lerp(samples[3], samples[11], inputCoordinateFractions[0]);
samples[4] = lerp(samples[4], samples[12], inputCoordinateFractions[0]);
samples[5] = lerp(samples[5], samples[13], inputCoordinateFractions[0]);
samples[6] = lerp(samples[6], samples[14], inputCoordinateFractions[0]);
samples[7] = lerp(samples[7], samples[15], inputCoordinateFractions[0]);

// INTERPOLATED_DIMENSION_COUNT >= 3
samples[0] = lerp(samples[0], samples[4], inputCoordinateFractions[1]);
samples[1] = lerp(samples[1], samples[5], inputCoordinateFractions[1]);
samples[2] = lerp(samples[2], samples[6], inputCoordinateFractions[1]);
samples[3] = lerp(samples[3], samples[7], inputCoordinateFractions[1]);

// INTERPOLATED_DIMENSION_COUNT >= 2
samples[0] = lerp(samples[0], samples[2], inputCoordinateFractions[2]);
samples[1] = lerp(samples[1], samples[3], inputCoordinateFractions[2]);

outputValue = lerp(samples[0], samples[1], inputCoordinateFractions[3]);

When the first 2 scales are 1 (meaning the fractions are either fully 0 or 1), then all those equations disappear, leaving just:

// INTERPOLATED_DIMENSION_COUNT >= 2
samples[0] = lerp(samples[0], samples[2], inputCoordinateFractions[2]);
samples[1] = lerp(samples[1], samples[3], inputCoordinateFractions[2]);

outputValue = lerp(samples[0], samples[1], inputCoordinateFractions[3]);

...which is the bilinear logic that ORT supports currently.
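The collapse can be checked numerically with a small Python sketch. It assumes the same sample ordering as the listing above (the outermost dimension is the highest bit of the sample index); `lerp` mirrors the shader intrinsic, and `nlinear` is a hypothetical helper name, not code from WinML or ORT.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b by fraction t."""
    return a + (b - a) * t


def nlinear(samples, fractions):
    """Reduce 2**N samples to one value, one dimension at a time (outermost first)."""
    if len(samples) != 2 ** len(fractions):
        raise ValueError("need 2**N samples for N fractions")
    samples = [float(s) for s in samples]
    half = len(samples) // 2
    for t in fractions:
        # Pair each sample with its neighbor along the current dimension.
        samples = [lerp(samples[i], samples[i + half], t) for i in range(half)]
        half //= 2
    return samples[0]


samples16 = list(range(16))
# Outer two fractions are 0 because the outer two scales are 1.
quadrilinear = nlinear(samples16, [0.0, 0.0, 0.25, 0.5])
bilinear = nlinear(samples16[:4], [0.25, 0.5])
print(quadrilinear, bilinear)  # 1.0 1.0
```

With the outer fractions at 0, the first two reduction passes simply return `samples[0..3]` unchanged, so the 4D result is identical to the 2D one.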

fdwr on 22 Aug 2019

Hi @fdwr,

The open PR should address your major concern: pure 2D inputs will now be supported for the 'bilinear' case. It still supports the case of a 4D input boiling down to a 'bilinear' case (with the outermost 2 scales being 1), but it stops short of adding support for the many other special N-D cases that boil down to a 'bilinear' case. So it should be a conservative middle ground.
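As a rough sketch of the rule described in this comment (not the actual code in the PR, which is not shown in this thread), the accepted scale layouts for 'linear' mode might look like the following. The function name `accepts_linear_scales` is hypothetical.

```python
def accepts_linear_scales(scales):
    """Return True for the scale layouts this comment describes as supported.

    Hypothetical illustration only; the real check lives in ORT's C++ code.
    """
    if len(scales) == 2:
        # Pure 2D input: both dimensions may be resized.
        return True
    if len(scales) == 4 and scales[0] == 1.0 and scales[1] == 1.0:
        # 4D input whose outermost two scales are no-ops.
        return True
    # Other N-D layouts that reduce to bilinear are deliberately not handled.
    return False


print(accepts_linear_scales([2.0, 2.0]))            # True
print(accepts_linear_scales([1.0, 1.0, 2.0, 2.0]))  # True
print(accepts_linear_scales([1.0, 2.0, 2.0]))       # False
```

Under this reading, layouts such as [1,m,n] or [m,n,1,1] would still be rejected even though they are mathematically bilinear.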

Thanks!

hariharans29 on 24 Aug 2019

Hi @hariharans29, I ran into a problem when using the Resize op. Could you help me?
Please refer to the issue reported in ONNX: onnx/onnx#2267.