I'm trying to create a standardized interface for using a DALIClassificationIterator with PyTorch, and have something like this:
from copy import deepcopy
from nvidia.dali.plugin.pytorch import DALIClassificationIterator

class DALIClassificationIteratorLikePytorch(DALIClassificationIterator):
    def __next__(self):
        """Override this to return things like pytorch."""
        if self._first_batch is None:
            print("first batch")
            return super(DALIClassificationIteratorLikePytorch, self).__next__()
        sample = super(DALIClassificationIteratorLikePytorch, self).__next__()
        if sample is not None and len(sample) > 0:
            images = deepcopy(sample[0]["data"])
            labels = deepcopy(sample[0]["label"])
            # images = sample[0]["data"]
            # labels = sample[0]["label"]
            print("returning!")
            return images, labels
I seem to be seeing an issue where the logic for self._first_batch gets triggered multiple times, causing issues downstream. Under what scenarios is DALIClassificationIterator._first_batch used for PyTorch? Looking at the code, it doesn't seem to serve any purpose.
Hi,
__init__ calls self.next() once to set self._first_batch. At that point self._first_batch is None, so self.next() computes a batch and __init__ assigns it to self._first_batch.
The next call to self.next() returns the value stored in self._first_batch and resets it to None.
The third call to self.next() computes the next batch and returns it.
I don't think you need a special code path for self._first_batch is None in your code. DALIClassificationIterator will handle it.
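To make the three calls above concrete, here is a minimal sketch of that caching pattern in plain Python. It is not the DALI source: the class name PrefetchingIterator and the toy batches are made up for illustration, and only the _first_batch handling mirrors what is described in this thread.

```python
class PrefetchingIterator:
    """Hypothetical stand-in for the caching described above (not DALI code)."""

    def __init__(self, batches):
        self._source = iter(batches)
        self._first_batch = None
        # Call 1: nothing is cached yet, so this computes a batch and caches it.
        self._first_batch = next(self)

    def __iter__(self):
        return self

    def __next__(self):
        if self._first_batch is not None:
            # Call 2: hand back the cached batch and clear the cache.
            batch, self._first_batch = self._first_batch, None
            return batch
        # Call 3 onwards: compute a fresh batch.
        return next(self._source)


# Toy batches shaped like a one-element list holding a dict.
it = PrefetchingIterator([[{"data": 0}], [{"data": 1}], [{"data": 2}]])
cached_after_init = it._first_batch is not None
collected = [batch[0]["data"] for batch in it]
```

In this sketch the caller never has to check _first_batch: the first batch is simply served from the cache and the rest are computed on demand.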
Thanks for the quick response, that was the first thing I tried, i.e.:
class DALIClassificationIteratorLikePytorch(DALIClassificationIterator):
    def __next__(self):
        """Override this to return things like pytorch."""
        sample = super(DALIClassificationIteratorLikePytorch, self).__next__()
        if sample is not None and len(sample) > 0:
            images = deepcopy(sample[0]["data"])
            labels = deepcopy(sample[0]["label"])
            print("returning!")
            return images, labels
But this triggers an error:
dali_imagefolder.py", line 209, in __next__
images = sample[0]["data"]
TypeError: new(): invalid data type 'str'
But I guess I didn't look closely enough at the alternation. It is now fixed! Working example below:
class DALIClassificationIteratorLikePytorch(DALIClassificationIterator):
    def __next__(self):
        """Override this to return things like pytorch."""
        sample = super(DALIClassificationIteratorLikePytorch, self).__next__()
        if sample is not None and len(sample) > 0:
            if isinstance(sample[0], dict):
                images = sample[0]["data"]
                labels = sample[0]["label"]
            else:
                images, labels = sample
            return images, labels
Thanks again.
But this triggers an error:
dali_imagefolder.py", line 209, in __next__ images = sample[0]["data"] TypeError: new(): invalid data type 'str'
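That happens because __init__ calls your overridden __next__, which strips the dicts, so self._first_batch stores a plain (images, labels) pair. On the following call that cached pair comes back, and sample[0]["data"] is no longer indexing a dict.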
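The failure can be reproduced without DALI. The sketch below uses a made-up base class (CachingBase) that caches its first batch in __init__, plus two hypothetical subclasses: one that always indexes the dict and one that checks the shape first, as in the working example earlier in the thread. The toy data are strings and ints, so the TypeError message differs from the torch one quoted above.

```python
class CachingBase:
    """Hypothetical stand-in for the base iterator (not the DALI implementation)."""

    def __init__(self, batches):
        self._source = iter(batches)
        self._first_batch = None
        # next(self) dispatches to the subclass __next__, so whatever the
        # subclass returns is what ends up cached here.
        self._first_batch = next(self)

    def __iter__(self):
        return self

    def __next__(self):
        if self._first_batch is not None:
            batch, self._first_batch = self._first_batch, None
            return batch
        return next(self._source)


class NaiveUnwrapping(CachingBase):
    def __next__(self):
        sample = super().__next__()
        # Breaks when sample is the cached, already unwrapped pair.
        return sample[0]["data"], sample[0]["label"]


class RobustUnwrapping(CachingBase):
    def __next__(self):
        sample = super().__next__()
        if isinstance(sample[0], dict):
            return sample[0]["data"], sample[0]["label"]
        return sample  # already an (images, labels) pair from the cache


def make_batches():
    return [[{"data": "img0", "label": 0}], [{"data": "img1", "label": 1}]]


try:
    next(NaiveUnwrapping(make_batches()))
    naive_failed = False
except TypeError:
    naive_failed = True

robust_result = list(RobustUnwrapping(make_batches()))
```

The isinstance check is one way to cope with the two shapes; it keeps the subclass independent of how the base class manages its cache.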
Yup makes sense, thanks! Missed that alternating of _first_batch and batch.