If I run with the PennFudanDataset, training works, but if I switch to a different dataset I get this error:
Epoch: [0] [ 0/155] eta: 0:01:05 lr: 0.000037 loss: 3.9229 (3.9229) loss_classifier: 0.9148 (0.9148) loss_box_reg: 0.1397 (0.1397) loss_mask: 2.8494 (2.8494) loss_objectness: 0.0084 (0.0084) loss_rpn_box_reg: 0.0107 (0.0107) time: 0.4194 data: 0.0865 max mem: 2032
Loss is nan, stopping training
{'loss_classifier': tensor(0.9080, device='cuda:1', grad_fn=
An exception has occurred, use %tb to see the full traceback.
SystemExit: 1
Can anyone help?
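The `Loss is nan, stopping training` message, followed by the loss dict and `SystemExit: 1`, looks like the non-finite-loss guard in the torchvision detection reference training loop (`train_one_epoch` in `engine.py`), which, as far as I know, exits as soon as the summed loss is not finite. The dict above is truncated, so it is not clear which term diverged. A small helper like this one (hypothetical, not part of torchvision) can name the offending components:

```python
import math
from typing import List, Mapping


def non_finite_losses(loss_dict: Mapping[str, object]) -> List[str]:
    """Return the names of loss terms whose value is NaN or +/-inf.

    Values can be Python numbers or 0-dim torch tensors; float() handles both.
    """
    return [name for name, value in loss_dict.items() if not math.isfinite(float(value))]
```

For example, calling `print(non_finite_losses(loss_dict))` right after `loss_dict = model(images, targets)` shows whether it is, say, `loss_box_reg` or `loss_mask` that blew up, which narrows down which part of the targets to inspect.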
Hi @vivekdeepquanty,
> Can anyone help?

From the console output alone, it is impossible to tell what is going wrong.

> If I run with the PennFudanDataset, training works, but if I switch to a different dataset I get this error...

That suggests something is wrong with the dataset you switched to. Have you verified that it behaves the way you intend?
Next steps:

1. Strip out everything in your code that is not needed to reproduce the error.
2. If you think the behavior is a bug in `torch` or `torchvision`, open an issue using the bug report template and follow the steps listed there.
3. If you don't think this is a bug, please post your minimal example with an accompanying question in our [discussion forum](https://discuss.pytorch.org/), which is our primary means of support.
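One way to check whether a custom dataset behaves as intended is to validate every target against the format torchvision's Mask R-CNN expects: `boxes` of shape [N, 4], `labels` of shape [N] and `masks` of shape [N, H, W]. The helper below is a hypothetical sketch, not a torchvision API; it only checks keys, shapes, finiteness, and samples with no objects (which are worth double-checking):

```python
import numpy as np


def check_target(target: dict, height: int, width: int) -> list:
    """Return a list of problems found in one Mask R-CNN style target.

    Assumes the keys "boxes" (N x 4), "labels" (N,) and "masks" (N x H x W);
    values may be NumPy arrays or CPU torch tensors (np.asarray handles both).
    """
    missing = [k for k in ("boxes", "labels", "masks") if k not in target]
    if missing:
        return [f"missing key(s): {missing}"]

    boxes = np.asarray(target["boxes"])
    labels = np.asarray(target["labels"])
    masks = np.asarray(target["masks"])
    n = len(boxes)

    problems = []
    if n == 0:
        problems.append("sample has no objects")
    if boxes.shape != (n, 4):
        problems.append(f"boxes shape {boxes.shape}, expected ({n}, 4)")
    if labels.shape != (n,):
        problems.append(f"labels shape {labels.shape}, expected ({n},)")
    if masks.shape != (n, height, width):
        problems.append(f"masks shape {masks.shape}, expected ({n}, {height}, {width})")
    if not np.isfinite(boxes).all():
        problems.append("boxes contain NaN or inf")
    return problems
```

With `transforms=None` the dataset returns PIL images, so a loop such as `img, target = dataset[idx]; width, height = img.size; print(idx, check_target(target, height, width))` over all indices prints any mismatches.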
I haven't modified anything in the code; even the number of classes is still 2. The only change is that I am using my custom dataset.
When I use PennFudanDataset, the code works fine.
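With `num_classes=2`, as in the tutorial setup, the model predicts background (class 0) plus one object class, so every object in a target should carry label 1, which is what the PennFudan dataset does by building its labels with `torch.ones`. Whether this is the cause here is only a guess, but it is cheap to rule out with a small helper (hypothetical name):

```python
def out_of_range_labels(labels, num_classes: int) -> list:
    """Return the distinct labels outside 1..num_classes-1.

    torchvision detection models reserve class 0 for the background, so with
    num_classes=2 every object should be labelled 1.
    """
    return sorted({int(label) for label in labels if not 1 <= int(label) < num_classes})
```

Running it on `target["labels"]` for each sample should return an empty list if the labels are consistent with the model configuration.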
> I haven't modified anything in the code; even the number of classes is still 2

What code are you using?

> The only change is that I am using my custom dataset.
> When I use PennFudanDataset, the code works fine.

This implies that your custom dataset is not working as intended. Please check that first.
@vivekdeepquanty I think you might have invalid boxes in your dataset (for example, boxes with negative size).
See my comment in https://github.com/pytorch/vision/issues/997#issuecomment-499429297
As I believe this is the same issue, I'm closing this one, but let us know if this isn't the case.
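On the invalid-boxes point: my understanding is that the box-regression targets involve the log of each box's width and height, so a box with zero or negative size can turn into an infinite or NaN loss. A minimal NumPy filter that marks the boxes worth keeping (hypothetical helper; `min_size=1.0` is an arbitrary threshold):

```python
import numpy as np


def keep_valid_boxes(boxes, min_size: float = 1.0) -> np.ndarray:
    """Return a boolean mask selecting boxes whose width and height are >= min_size.

    boxes is an (N, 4) array-like in (xmin, ymin, xmax, ymax) order.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    return (widths >= min_size) & (heights >= min_size)
```

The same mask should be applied to `boxes`, `labels` and `masks` so they stay aligned, e.g. via `keep = torch.as_tensor(keep_valid_boxes(target["boxes"]))`. For tensors, `torchvision.ops.remove_small_boxes(boxes, min_size)` performs a similar filter and returns the indices to keep.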
Is a txt file required for training? I am only using the mask and image folders.
Problem: the loss is NaN.
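On the txt question: the tutorial's `PennFudanDataset` reads only the image folder and the mask folder and derives the boxes from the masks, so no separate annotation file should be needed as long as your masks use the same encoding (a single-channel image where 0 is background and each object has its own pixel value). The sketch below reproduces that derivation with NumPy; the function name is hypothetical. If masks were saved as JPEG or RGB, the 2-D assumption breaks or `np.unique` may pick up stray values, which could create tiny spurious objects, and a single-pixel object yields a box with `xmin == xmax`.

```python
import numpy as np


def masks_and_boxes_from_instance_mask(mask):
    """Split an instance-id mask into per-object binary masks and boxes.

    Assumes a 2-D mask where 0 is background and every other pixel value is
    one object id, which is how the PennFudan masks are encoded.
    """
    mask = np.asarray(mask)
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]          # drop the background id
    masks = mask == obj_ids[:, None, None]   # (N, H, W) boolean, one per object
    boxes = []
    for m in masks:
        ys, xs = np.nonzero(m)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    return masks.astype(np.uint8), boxes
```

Printing `np.unique(np.asarray(Image.open(mask_path)))` for a few of your masks is a quick way to confirm they contain only the object ids you expect.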
I've never worked with detection datasets, but from what I understand, the following snippet should run without an `AssertionError`:
```python
# PennFudanDataset is the dataset class from the torchvision object detection
# finetuning tutorial; swap in your own dataset class to test it.
dataset = PennFudanDataset("/path/to/PennFudan/root", transforms=None)
for img, target in dataset:
    width, height = img.size  # PIL images report (width, height)
    for box in target["boxes"]:
        xmin, ymin, xmax, ymax = box.tolist()
        assert xmin >= 0
        assert xmax <= width
        assert xmin <= xmax
        assert ymin >= 0
        assert ymax <= height
        assert ymin <= ymax
```
Try this with your custom dataset and report back whether it runs through and whether you still get a NaN loss.
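The assertions stop at the first failing box. A variant that collects every suspicious box instead (a sketch; the function name and the `strict` flag are my own) also flags zero-width or zero-height boxes, which pass the `xmin <= xmax` check but are still degenerate:

```python
def find_bad_boxes(dataset, strict: bool = True) -> list:
    """Collect (sample index, box index, reason) for every suspicious box.

    Mirrors the assertions above but keeps going after a failure. With
    strict=True, zero-width or zero-height boxes are reported as well.
    Assumes img.size returns (width, height), as it does for PIL images.
    """
    problems = []
    for i, (img, target) in enumerate(dataset):
        width, height = img.size
        for j, box in enumerate(target["boxes"]):
            xmin, ymin, xmax, ymax = (float(v) for v in box)
            reasons = []
            if xmin < 0 or ymin < 0:
                reasons.append("negative coordinate")
            if xmax > width or ymax > height:
                reasons.append("outside image")
            if xmax < xmin or ymax < ymin:
                reasons.append("negative size")
            elif strict and (xmax == xmin or ymax == ymin):
                reasons.append("zero size")
            if reasons:
                problems.append((i, j, ", ".join(reasons)))
    return problems
```

Usage: `for i, j, reason in find_bad_boxes(dataset): print(i, j, reason)` lists every offending sample and box in one pass.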
I am still getting the same error.
At this point there is nothing we can do without seeing your code. Please strip out everything in your code that is not needed to reproduce the error and post the result here.