Dali: Any iterator supporting multi variable-length outputs in Pytorch?

Created on 8 Jun 2020 · 17 Comments · Source: NVIDIA/DALI

Hi, I ran into issues when using DALI to replace the PyTorch DataLoader. My task is to feed mel spectrograms with several labels into a training model. I first wanted to pass a Python class with labels (including IDs and strings) and GPU tensors into the pipeline, but I read that ExternalSource accepts input only on the CPU (via NumPy arrays), so I changed the data format into several NumPy arrays. Then I found that the iterators (DALIGenericIterator and DALIClassificationIterator) seem to support only one or two outputs per pipeline, but a batch in my model contains at least 6 (mel_inputs, input_lengths, mel_target, output_lengths, speaker_id, gate_padded), and each mel spectrogram has a different length.

How can I pass this data into a GPU-based training model? Do I need to write a custom function to support this? If so, could I have some guidance?

Here is my code:

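from random import shuffle

import numpy as np
import nvidia.dali.ops as ops
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.pytorch import DALIGenericIterator

class ExternalInputIterator(object):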
    def __init__(self, batch_size, data_folder, target_speaker):
        self.batch_size = batch_size
        self.filepaths_pair = load_files_from_path(data_folder, target_speaker)
        self.dataset_len = len(self.filepaths_pair)

    def __iter__(self):
        self.i = 0
        shuffle(self.filepaths_pair)
        return self

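    def __next__(self):
        inputs = []
        if self.i >= self.dataset_len:
            raise StopIteration

        for _ in range(self.batch_size):
            # Index with the running position self.i, not the loop counter,
            # so successive batches advance through the dataset.
            file = self.get_mel_pair(self.filepaths_pair[self.i])
            inputs.append(file)
            self.i = (self.i + 1) % self.dataset_len
        # Collate once, after the whole batch has been gathered.
        batch = self.collate_fn(inputs)
        return batch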

    @property
    def size(self,):
        return self.dataset_len

    def get_mel_pair(self, files):
        ...
        return files

    def load_mel(self, filename):
        melspec = np.load(filename)
        return melspec

    def collate_fn(self, batch):
        ...
        return (mel_inputs, input_lengths, mel_targets, output_lengths, gate, input_voice, target_voice, speaker_id)

    next = __next__

class ExternalSourcePipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id,  external_data):
        super(ExternalSourcePipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.mel_inputs = ops.ExternalSource()
        self.input_lengths = ops.ExternalSource()
        self.mel_targets = ops.ExternalSource()
        self.output_lengths = ops.ExternalSource()
        self.speaker_id = ops.ExternalSource()
        self.gate_padded = ops.ExternalSource()
        self.input_voice = ops.ExternalSource()
        self.target_voice = ops.ExternalSource()

        self.pad = ops.Pad(fill_value=0)
        self.gate_pad = ops.Pad(fill_value=1)
        self.external_data = external_data
        self.iterator = iter(self.external_data)

    def define_graph(self):

        self.mel_inputs = self.mel_inputs()
        # mel_inputs = self.mel_inputs
        mel_inputs = self.pad(self.mel_inputs)

        self.mel_targets = self.mel_targets()
        # mel_targets = self.mel_targets
        mel_targets = self.pad(self.mel_targets)

        self.gate_padded = self.gate_padded()
        # gate_padded = self.gate_padded
        gate_padded = self.gate_pad(self.gate_padded)

        self.input_lengths = self.input_lengths()
        self.output_lengths = self.output_lengths()
        self.speaker_id = self.speaker_id()
        self.input_voice = self.input_voice()
        self.target_voice = self.target_voice()

        return (mel_inputs, self.input_lengths, mel_targets, self.output_lengths, gate_padded,
                self.input_voice, self.target_voice, self.speaker_id)

    def iter_setup(self):
        try:
            (mel_inputs, input_lengths, mel_targets, output_lengths, gate_padded, input_voice, target_voice, speaker_id) = self.iterator.next()

            self.feed_input(self.mel_inputs, mel_inputs)
            self.feed_input(self.input_lengths, input_lengths)
            self.feed_input(self.mel_targets, mel_targets)
            self.feed_input(self.output_lengths, output_lengths)
            self.feed_input(self.speaker_id, speaker_id)
            self.feed_input(self.gate_padded, gate_padded)
            self.feed_input(self.input_voice, input_voice)
            self.feed_input(self.target_voice, target_voice)

        except StopIteration:
            self.iterator = iter(self.external_data)
            raise StopIteration

if __name__ == '__main__':

    trainset_loader = ExternalInputIterator(batch_size=4, data_folder=hparams.data_folder, target_speaker=hparams.target_speaker)
    pipe = ExternalSourcePipeline(batch_size=4, num_threads=2, device_id=0, external_data=trainset_loader)
    dali_iter = DALIGenericIterator([pipe], ['mel_inputs', 'input_lengths', 'mel_targets', 'output_lengths', 'gate_padded', 'input_voice', 'target_voice', 'speaker_id'],
                                     size=trainset_loader.size, auto_reset=True, last_batch_padded = True)
    print('dataset size:{}, batch size:{}'.format(trainset_loader.size, 4))
    for e in range(10):
        for i, data in enumerate(dali_iter):
            if i % 10 == 0:
                print('epoch {}, iteration {}'.format(e, i))
                print('mel_inputs:', data[0]['mel_inputs'].shape)
                print('input_lengths:', list(data[0]['input_lengths']))
                print('mel_targets:', data[0]['mel_targets'].shape)
                print('output_lengths:', list(data[0]['output_lengths']))
                print('speaker_id:', list(data[0]['speaker_id']))
        dali_iter.reset()

enhancement question

Approximetal

All 17 comments

Hi,
ExternalSource accepts CPU input only: NumPy arrays, or anything that supports the array interface.
There is ongoing work to enable GPU input as well: https://github.com/NVIDIA/DALI/pull/1997.
As for the number of DALIGenericIterator outputs, it can be any number. The example shows two, but you can add more in the same way: just add more names to output_map.
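As a rough illustration of how output_map lines up with the pipeline outputs, here is a plain-Python stand-in with no DALI involved. It assumes the names are paired with the outputs by position, which matches how the result dictionaries are indexed in the code above; name_outputs is a hypothetical helper, not a DALI function.

```python
def name_outputs(output_map, outputs):
    """Pair each output with the name at the same position (hypothetical helper)."""
    if len(set(output_map)) != len(output_map):
        raise ValueError("output_map names must be distinct")
    if len(output_map) != len(outputs):
        raise ValueError("need exactly one name per pipeline output")
    return dict(zip(output_map, outputs))


# Placeholder strings stand in for the tensors a pipeline would return.
output_map = ["mel_inputs", "input_lengths", "mel_targets",
              "output_lengths", "gate_padded", "speaker_id"]
outputs = ["<mel in>", "<in len>", "<mel tgt>", "<out len>", "<gate>", "<spk>"]

batch = name_outputs(output_map, outputs)
print(batch["mel_targets"])  # prints "<mel tgt>"
```

The order of the names therefore has to follow the order of the tuple returned from define_graph.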

JanuszL on 8 Jun 2020

Hi @JanuszL, thank you for your reply. When I ran the code, I got this error:

Traceback (most recent call last):
  File "/media/zzy/D/Programs/pycharm-2019.2.5/helpers/pydev/pydevd.py", line 1415, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/media/zzy/D/Programs/pycharm-2019.2.5/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/media/zzy/E/DL/torch-code/parrotron/test.py", line 15, in <module>
    size=trainset_loader.size, auto_reset=True, fill_last_batch = True, last_batch_padded = False)
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 162, in __init__
    self._first_batch = self.next()
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 259, in next
    return self.__next__()
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 190, in __next__
    category_tensors[category] = out.as_tensor()
RuntimeError: [/opt/dali/dali/pipeline/data/tensor_list.h:435] Assert on "this->IsDenseTensor()" failed: All tensors in the input TensorList must have the same shape and be densely packed.

In a batch, the samples have shapes like (80, 218), (80, 156), (80, 131), (80, 109). It seems the iterator doesn't support variable-length data. I tried to use self.pad = ops.Pad(fill_value=0), but I got another error:

RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/generic/pad.cc:172] Unsupported data type: 10
...
Current pipeline object is no longer valid.

Approximetal on 8 Jun 2020

That is true: DALI doesn't support variable-length data in PyTorch. However, PaddlePaddle and MXNet (Gluon) do have such support. TensorFlow supports this only on the CPU.
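One possible workaround on the PyTorch side is to pad each batch to a common length in NumPy before it reaches DALI, so that every sample has the same shape. The sketch below is an assumption about how that could look, not part of the DALI API: pad_to_longest is a hypothetical helper, and it assumes each mel spectrogram is a float32 array of shape (n_mels, frames).

```python
import numpy as np


def pad_to_longest(mels, fill_value=0.0):
    """Right-pad (n_mels, frames) arrays along the time axis to a common length.

    Returns the dense (batch, n_mels, max_frames) array and the original lengths.
    """
    lengths = np.array([m.shape[1] for m in mels], dtype=np.int32)
    n_mels = mels[0].shape[0]
    batch = np.full((len(mels), n_mels, int(lengths.max())), fill_value,
                    dtype=np.float32)
    for i, m in enumerate(mels):
        batch[i, :, :m.shape[1]] = m
    return batch, lengths


# Four spectrograms with the frame counts mentioned in the thread.
mels = [np.ones((80, t), dtype=np.float32) for t in (218, 156, 131, 109)]
padded, lengths = pad_to_longest(mels)
print(padded.shape)      # (4, 80, 218)
print(lengths.tolist())  # [218, 156, 131, 109]
```

Returning the original lengths alongside the padded array keeps the information needed to mask the padding later in the model.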

JanuszL on 8 Jun 2020

Another question: the ops.Pad operator cannot be used to pad the variable-length data. As described above:

In a batch, the samples have shapes like (80, 218), (80, 156), (80, 131), (80, 109). I tried to use self.pad = ops.Pad(fill_value=0), but I got another error:

RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/generic/pad.cc:172] Unsupported data type: 10
...
Current pipeline object is no longer valid.

What's the reason for this error? @JanuszL

Approximetal on 9 Jun 2020

It seems that you are trying to pad a data type that the Pad operator does not support.
@jantonguirao any thoughts?

JanuszL on 9 Jun 2020

@Approximetal Most of our operators don't support float64. I'd recommend feeding your external source inputs with 32-bit floats (float) instead.
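A minimal sketch of that recommendation, assuming the spectrograms arrive as NumPy arrays: cast them to float32 before the data is handed to the external source. to_float32 is a hypothetical helper name.

```python
import numpy as np


def to_float32(arr):
    """Return arr as float32; no copy is made if it already has that dtype."""
    return np.asarray(arr, dtype=np.float32)


mel = np.zeros((80, 218))        # NumPy defaults to float64
mel32 = to_float32(mel)
print(mel.dtype, mel32.dtype)    # float64 float32
```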

jantonguirao on 9 Jun 2020

Hi @jantonguirao, I checked the data and it is float32. The format is like this: [np.array(shape=(80,216)), np.array(shape=(80,209)), np.array(shape=(80,193)), np.array(shape=(80,116))]
I use the pad operator self.pad = ops.Pad(fill_value=0, axes=(2,)) and it still returns the error above. Is there an issue in my code?

The documentation says the parameter is "data (TensorList) – Input to the operator", but the error shows TypeError: expected np.ndarray. Isn't that confusing?

Approximetal on 10 Jun 2020

@Approximetal The error message you shared with us before indicates that the input to the Pad operator was float64, which is also the default in NumPy:

>>> arr = np.zeros(shape=(2, 2))
>>> print(arr.dtype)
float64

>>> arr = np.zeros(shape=(2, 2), dtype=np.float32)
>>> print(arr.dtype)
float32

The other TypeError message is probably not coming from the Pad operator but from somewhere else.

If you are able to share a reproducible code sample (with some sample data), we could analyze it and find the problem.

jantonguirao on 10 Jun 2020

@Approximetal You wrote: "The documentation says the parameter is data (TensorList) – Input to the operator, but the error shows TypeError: expected np.ndarray."

Are you referring to this error?

RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/generic/pad.cc:172] Unsupported data type: 10
...
Current pipeline object is no longer valid.

JanuszL on 10 Jun 2020

Not that one. I didn't understand why the Pad operator failed, so I checked the documentation, which says the input should be a TensorList. So I used torch.from_numpy to convert the data and then got TypeError: expected np.ndarray. Maybe I just misunderstood.

Approximetal on 10 Jun 2020

I don't follow. torch.from_numpy creates a Torch tensor, which DALI cannot handle as an input. Maybe you want to use the .numpy method instead?

JanuszL on 10 Jun 2020

I just want to confirm: the description in the documentation means a list of NumPy arrays, right? If DALI can only handle NumPy arrays, that's okay. I just misunderstood; never mind, and thanks for the reply.

Approximetal on 10 Jun 2020

Sure, it can be either a list of batch_size NumPy arrays, where each array corresponds to one tensor, or one NumPy array whose outermost dimension corresponds to the batch size.
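The two layouts can be sketched with plain NumPy. This only shows the shapes involved and assumes nothing about DALI beyond the description above; the (80, 100) sample shape is made up for illustration.

```python
import numpy as np

batch_size = 4

# Layout 1: a list of batch_size arrays, one per sample.
as_list = [np.zeros((80, 100), dtype=np.float32) for _ in range(batch_size)]

# Layout 2: a single array whose outermost dimension is the batch.
as_array = np.stack(as_list)

print(len(as_list), as_list[0].shape)  # 4 (80, 100)
print(as_array.shape)                  # (4, 80, 100)
```

The single-array layout requires every sample to have the same shape, whereas a list can hold arrays of different shapes.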

JanuszL on 10 Jun 2020

@jantonguirao I found that the dtype of the ID data was int64. I changed it to float32, and it works. Thanks!
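To track down this kind of mismatch, it can help to print the dtype of every field in a batch before feeding it. The sketch below uses a hypothetical helper (find_64bit_fields is not part of DALI), and the field names and values are made up for illustration. It flags 64-bit arrays only because those were the culprits in this thread.

```python
import numpy as np


def find_64bit_fields(batch):
    """Return the names of fields whose dtype is float64 or int64."""
    suspects = (np.dtype(np.float64), np.dtype(np.int64))
    return [name for name, arr in batch.items() if arr.dtype in suspects]


batch = {
    "mel_inputs": np.zeros((80, 218), dtype=np.float32),
    "speaker_id": np.array([3], dtype=np.int64),  # explicit int64
}
for name, arr in batch.items():
    print(name, arr.dtype)

print(find_64bit_fields(batch))  # ['speaker_id']
```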

Approximetal on 10 Jun 2020

Hi, I hit a new error: RuntimeError: [/opt/dali/dali/python/backend_impl.cc:138] Assert on "info.strides[i] == info.itemsize*dim_prod" failed: Strided data not supported. Detected on dimension 1

The code is like:
input_voice = np.concatenate([frame for frame, _ in input_voice], axis=0)
where each frame is a NumPy array with shape [60, 80], and input_voice is the concatenated float32 NumPy array.

What does this error mean? How can I change the data format?

Approximetal on 10 Jun 2020

It looks like your NumPy array is strided, so its memory is not contiguous. You can try copying the array and checking whether the strides have changed; for more information, see for example this place.
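As a small NumPy-only illustration of that check (the transposed array here is just an assumed way to produce non-contiguous memory, not necessarily what happened in the code above):

```python
import numpy as np

frames = np.arange(12, dtype=np.float32).reshape(3, 4)
view = frames.T                      # a transpose is a strided view, no copy

print(view.flags["C_CONTIGUOUS"])    # False
print(view.strides)                  # (4, 16)

fixed = np.ascontiguousarray(view)   # copies into row-major, packed memory
print(fixed.flags["C_CONTIGUOUS"])   # True
print(fixed.strides)                 # (12, 4)
```

The values are unchanged by the copy; only the memory layout differs.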

JanuszL on 10 Jun 2020

I used np.ascontiguousarray() to solve the problem.

Approximetal on 10 Jun 2020
