Our MNIST smoke tests are breaking in the latest build (mxnet-1.3.0b20180911) with an index-out-of-range error, as a result of this PR (#12285).
The failure occurs on both Linux and OSX. Build mxnet-1.3.0b20180909 is fine; 1.3.0b20180911 is faulty.
Can you provide more details? For example, what is the length of the NDArray, and what batch size are you using?
I'd like to understand the issue through an example. If this is a critical issue that will take time to fix, we can revert that commit until we find the root cause.
@sandeep-krishnamurthy @zhreshold
Here's a specific example from the Nightly Binary test which just failed:
[StraightDope: Python2 Single-GPU]
IndexError Traceback (most recent call last)
<ipython-input-8-d40071ee971d> in <module>()
20 train_data.reset()
21 iter = 0
---> 22 for batch in train_data:
23 ############################
24 # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))

/work/mxnet/python/mxnet/io/io.pyc in next(self)
678 raise StopIteration
679 data = self.getdata()
--> 680 label = self.getlabel()
681 # iter should stop when last batch is not complete
682 if data[0].shape[0] != self.batch_size:

/work/mxnet/python/mxnet/io/io.pyc in getlabel(self)
748 def getlabel(self):
749 """Get label."""
--> 750 return self._batchify(self.label)
751
752 def getpad(self):

/work/mxnet/python/mxnet/io/io.pyc in _batchify(self, data_source)
730 self.cursor + self.batch_size > self.num_data:
731 pad = self.batch_size - self.num_data + self.cursor
--> 732 first_data = self._getdata(data_source, start=self.cursor)
733 second_data = self._getdata(data_source, end=pad)
734 return self._concat(first_data, second_data)

/work/mxnet/python/mxnet/io/io.pyc in _getdata(self, data_source, start, end)
692 assert start is not None or end is not None, 'should at least specify start or end'
693 start = start if start is not None else 0
--> 694 end = end if end is not None else data_source[0][1].shape[0]
695 s = slice(start, end)
696 return [

IndexError: list index out of range
http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests_onBinaries/148/console
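The traceback hints at the cause. The message is "list index out of range" (not "tuple index"), so `data_source` itself appears to be an empty list when `data_source[0][1].shape[0]` runs. Assuming the notebooks build the iterator with data only, `self.label` would be that empty list. Judging from the condition shown at line 730, the pad branch only runs for the final incomplete batch, so a dataset whose size divides evenly by the batch size would likely not hit it. Below is a simplified pure-Python model of that code path. `FakeArray`, `get_slice`, `batchify_pad` and `get_slice_guarded` are hypothetical names, not MXNet's implementation, and the guarded variant is only one possible fix, not necessarily what the merged patch does.

```python
class FakeArray:
    """Hypothetical stand-in for an NDArray: only a shape and row slicing."""
    def __init__(self, rows):
        self.rows = list(rows)
        self.shape = (len(self.rows),)

    def __getitem__(self, s):
        return FakeArray(self.rows[s])


def get_slice(data_source, start=None, end=None):
    """Simplified model of the failing _getdata path (not MXNet's code).

    data_source is a list of (name, array) pairs.
    """
    assert start is not None or end is not None, 'should at least specify start or end'
    start = start if start is not None else 0
    # Same expression as in the traceback: it indexes the first pair,
    # so it raises IndexError when data_source is an empty list.
    end = end if end is not None else data_source[0][1].shape[0]
    s = slice(start, end)
    return [arr[s] for _, arr in data_source]


def get_slice_guarded(data_source, start=None, end=None):
    """Hypothetical guarded variant: an empty source yields an empty list."""
    if not data_source:
        return []
    return get_slice(data_source, start, end)


def batchify_pad(data_source, cursor, batch_size, num_data, slicer=get_slice):
    """Model of the 'pad' branch: take the tail of each source, then wrap
    around to the start to fill the batch."""
    pad = batch_size - num_data + cursor
    first = slicer(data_source, start=cursor)   # fails here for an empty source
    second = slicer(data_source, end=pad)       # never evaluates data_source[0]
    # Concatenate per source, mirroring _concat(first, second).
    return [FakeArray(a.rows + b.rows) for a, b in zip(first, second)]


# Usage: 10 samples, batch_size=4, so the last batch starts at cursor=8
# and needs pad = 4 - 10 + 8 = 2 wrapped-around samples.
data = [('data', FakeArray(range(10)))]
label = []  # iterator built without labels (assumption)
print(batchify_pad(data, 8, 4, 10)[0].rows)  # [8, 9, 0, 1]
# batchify_pad(label, 8, 4, 10) raises IndexError, as in the traceback.
```

Note that in this model the second slice (`end=pad`) would not fail on its own, which matches the traceback stopping at line 732 rather than 733.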
Two notebooks from The Straight Dope book reproduce the index-out-of-range error:
chapter14_generative-adversarial-networks/dcgan and
chapter14_generative-adversarial-networks/pixel2pixel, both available at
https://github.com/zackchase/mxnet-the-straight-dope
Vishaal
Thanks Vishaal.
On a side note, how did we miss the nightly master build failure? We need to revisit that.
Thanks @vishaalkapoor.
Working on it.
Thanks for submitting the issue @iamthebot
@mxnet-label-bot[NDArray, Bug]
@stu1130 Do we still need a repro? Sorry, I haven't gotten around to it.
@iamthebot I was able to find the root cause, so don't worry about it. Thanks!
@iamthebot
Could you share a repro showing the data shape and how you initialized and used the NDArrayIter?
We would like to make sure all the existing use cases work!
Thank you so much!
The patch is merged. @sandeep-krishnamurthy, please close this issue.