PyTorch: problem with torch.utils.tensorboard add_graph()

Created on 11 Aug 2019 · 61 Comments · Source: pytorch/pytorch

code

import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Sequential(     #input_size=(1*28*28)
            nn.Conv2d(1, 6, 5, 1, 2),
            nn.ReLU(),      #(6*28*28)
            nn.MaxPool2d(kernel_size=2, stride=2),  #output_size=(6*14*14)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),      #(16*10*10)
            nn.MaxPool2d(2, 2)  #output_size=(16*5*5)
        )
        self.fc1 = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU()
        )
        self.fc2 = nn.Sequential(
            nn.Linear(120, 84),
            nn.ReLU()
        )
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size()[0], -1)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

dummy_input = torch.rand(13, 1, 28, 28)
model = LeNet()
# Note: `comment` has no effect when `log_dir` is given explicitly.
with SummaryWriter(comment='Net', log_dir='/output') as w:
    w.add_graph(model, (dummy_input, ))

🐛 Bug

The log_dir is correct, but TensorBoard shows nothing!
Has anyone else encountered the same problem?

To Reproduce

Expected behavior

Environment

PyTorch Version (e.g., 1.0): 1.2.0
OS (e.g., Linux): ubuntu16.04
How you installed PyTorch (conda, pip, source): pip
Build command you used (if compiling from source): pip install torch -U
Python version: 3.6
CUDA/cuDNN version: CUDA 10.0.130
GPU models and configuration: GTX1080Ti
Any other relevant information:

Additional context

tensorboard triaged

Source

GinSoda

All 61 comments

You need to close the writer or flush it.

    w.flush()
    w.close()

I faced the same problem and posted it on Stack Overflow.
That will generate the log, but in my case TensorBoard is still unable to load it; the browser console shows this error when loading the graph:

Unhandled Promise Rejection: TypeError: null is not an object (evaluating 'Fa.node')

Graphs generated in TensorFlow work; the failure occurs only with PyTorch ones, even the one from the PyTorch TensorBoard tutorial (the one using torchvision). The log file does contain the graph, as I can see in its contents, and the script doesn't complain when saving it; the failure happens only when TensorBoard visualizes it. Let me know if you are able to visualize your graph.
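One way to confirm that the log really was written is to list the event files in the log directory and check that they are not empty. The following is a minimal sketch, not part of the original report: `list_event_files` and `write_graph` are hypothetical helper names, and it assumes the event files follow the usual `events.out.tfevents.*` naming.

```python
from pathlib import Path


def list_event_files(log_dir):
    """Return (path, size_in_bytes) for each TensorBoard event file under log_dir."""
    root = Path(log_dir)
    # Assumption: event files have names starting with "events.out.tfevents."
    files = sorted(p for p in root.rglob("events.out.tfevents.*") if p.is_file())
    return [(p, p.stat().st_size) for p in files]


def write_graph(model, example_input, log_dir):
    """Write the model graph and close the writer so the data reaches disk."""
    # Imported here so list_event_files() can be used without PyTorch installed.
    from torch.utils.tensorboard import SummaryWriter

    # Leaving the `with` block closes the writer.
    with SummaryWriter(log_dir=log_dir) as writer:
        writer.add_graph(model, example_input)
```

For example, after `write_graph(model, dummy_input, "runs/lenet")` (the path is only illustrative), `list_event_files("runs/lenet")` should return at least one file with a non-zero size. If it does and the graph tab is still empty, the problem is more likely in how the graph is rendered than in how it was written.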

I can confirm that the Unhandled Promise Rejection does not happen in 1.1: I downgraded to 1.1 and the graph now shows in TensorBoard. Oddly, the graph generated by 1.1 has only 124 elements, while the one generated by 1.2 has 507. This is shown when verbose is True; I am attaching the output generated by both as text files.
verbose_graph_1.1.txt
verbose_graph_1.2.txt

LittlePea13 on 11 Aug 2019

👎1

Still blank after refreshing

bintonto on 12 Aug 2019

import torch
import torchvision.models as models
from torch.utils.tensorboard import SummaryWriter
resnet18 = models.resnet18(pretrained=True)
x = torch.randn(1, 3, 224, 224)
writer = SummaryWriter()
writer.add_graph(resnet18, x)
writer.close()

Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 10.14.6
GCC version: Could not collect
CMake version: version 3.13.4

Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.15.4
[conda] blas 1.0 mkl
[conda] mkl 2019.4 233
[conda] mkl_fft 1.0.12 py37h5e564d8_0
[conda] mkl_random 1.0.2 py37h27c97d8_0
[conda] pytorch 1.2.0 py3.7_0 pytorch
[conda] pytorch-nightly 1.2.0.dev20190629 py3.7_0 pytorch
[conda] pytorch-transformers 1.0.0 pypi_0 pypi
[conda] torchaudio 0.3.0 py37 pytorch
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchvision 0.4.0 py37_cpu pytorch

bintonto on 12 Aug 2019

If you check your log file, you will see that it contains the graph. It is a different error from the first one mentioned by GinSoda.

Do you get a graph page in Tensorboard but the graph doesn't load? If you check your browser console does it say

Unhandled Promise Rejection: TypeError: null is not an object (evaluating 'Fa.node')

Because in that case it is the same problem I am facing. I had to downgrade to v1.1, but the graph there is much simpler and doesn't contain the whole model.

LittlePea13 on 12 Aug 2019

Hi all,
@LittlePea13 in my case I have the exact same problem as you (the tab shows up, but the model does not load and I get the same browser error)! I haven't downgraded to v1.1 yet, though.

maximiliense on 13 Aug 2019

I too am getting a graph page that is empty. I did flush and close the SummaryWriter.
Attaching screenshot including Chrome's console which shows an error that may be related.

Note:

I do see the textual graph being dumped to the command line console and it seems correct there.

Configuration:

PyTorch 1.2.0
TensorBoard 1.14.0
Python 3.5.2

rfejgin on 27 Aug 2019

👍14

Note that I too get the same error when copying the example given in the PyTorch documentation:
https://pytorch.org/docs/stable/tensorboard.html

Only difference is that I am not using TensorBoard nightly, but the released TensorBoard 1.14.0.

rfejgin on 27 Aug 2019

I'm having the same issue - any luck on tracing the issue?

StuvX on 27 Aug 2019

Me too.
My environment information:
windows 10
python3.6
pytorch 1.2 Is CUDA available: No
tensorboard 1.14 or tb-nightly 1.14.0a20190614 or tb-nightly 1.15.0a20190826
tensorflow 1.14
tensorboardX 1.8
numpy 1.17.0
blank

alqbib on 27 Aug 2019

Getting the same problem as @alqbib and @rfejgin. I'm running the tutorial code located at https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html

ErenBalatkan on 28 Aug 2019

Related issues from the forum:

ptrblck on 28 Aug 2019

👍1

@apaszke @orionr, @lanpa - any idea? Thanks!

rfejgin on 28 Aug 2019

My hint is that this is due to the TensorBoard compat (non-TensorFlow case) issue we saw where the log directory doesn't update correctly. It was fixed in https://github.com/tensorflow/tensorboard/pull/2342. Unfortunately the fix didn't make it into TensorBoard 1.14, so you have three options: (1) use TensorBoard nightly, which has the fix, (2) install TensorFlow to leverage that code path in TensorBoard, or (3) restart TensorBoard periodically so that it picks up the changes.

Please let us know if one of those options takes care of it.
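Because the outcome depends on which of TensorBoard, tb-nightly and TensorFlow is installed, it can help to report the installed versions when trying these options. This is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`). The helper name `installed_versions` and the package list are illustrative, not something the thread prescribes.

```python
from importlib import metadata  # standard library, Python 3.8+

# Distribution names mentioned in this thread; adjust the list as needed.
PACKAGES = ("torch", "tensorboard", "tb-nightly", "tensorflow", "tensorboardX", "future")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15} {version or 'not installed'}")
```

Pasting this output alongside a report makes it easier to tell the TensorFlow code path in TensorBoard apart from the non-TensorFlow one.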

orionr on 28 Aug 2019

Same issue with TB nightly. I don't think it's (3) because this happens even when I restart TB after the graph dump is complete. Will try (2).

rfejgin on 28 Aug 2019

👍2

Same issue with TensorFlow 1.14.0

rfejgin on 28 Aug 2019

Possibly related - see screenshot of error in the Chrome console above
https://github.com/pytorch/pytorch/issues/24157#issuecomment-525055887

rfejgin on 28 Aug 2019

Interesting - you're right that Chrome console output is unusual. I wonder if our graph proto is somehow wrong in this case. @lanpa can you confirm the tutorial code works for you? Thanks.

orionr on 28 Aug 2019

👍1

@lanpa: I wasn't sure what you meant by the thumbs-up - does the tutorial code work for you?
I've seen this problem (graph not displayed) both with the tutorial code and my own models. Given that others have observed the same, something seems broken in the graph functionality...

rfejgin on 30 Aug 2019

cc @sanekmelnikov @natalialunova

orionr on 30 Aug 2019

Can confirm the same issue with tutorial code as well as custom model.
Verbose output looks fine, graph does not load with console error in chromium:

(index):24242 Uncaught (in promise) TypeError: Cannot read property 'node' of null
    at (index):24242
    at arrayEach ((index):13920)
    at Function.forEach ((index):14082)
    at B.buildSubhierarchy ((index):24242)
    at new B ((index):24229)
    at HTMLElement.<anonymous> ((index):25062)
    at Object.d.time ((index):24285)
    at HTMLElement._buildRenderHierarchy ((index):25061)
    at HTMLElement._buildNewRenderHierarchy ((index):25061)
    at Object.runMethodEffect [as fn] ((index):3714)

tb-nightly (1.15.0a20190902)
pytorch (newest torch package via pip)

richard-vock on 3 Sep 2019

Stuck by the same issue, any news?

JianhuanZhuo on 6 Sep 2019

It seems like some graphs cause this issue. A potential fix is at https://github.com/pytorch/pytorch/pull/25599/ but we're still confirming. If you're willing to apply those changes locally and confirm it fixes your issue that would be great.

orionr on 6 Sep 2019

👍1

Hi @orionr, I can confirm that tensorboard does show the graph now! thanks!

maximiliense on 6 Sep 2019

👍1

Hi @orionr, I can confirm that tensorboard does show the graph now! thanks!

Me too! thanks!

ulisesbussi on 6 Sep 2019

❤1 👍1

In that case, landing the changes so they'll be in pytorch-nightly. We'll then add more robust testing around these cases. Thank you!

orionr on 6 Sep 2019

👍1

Works here too, thanks for the fix.

rfejgin on 6 Sep 2019

👍1

Fix landed. Please confirm fixed in pytorch-nightly after the build tonight, but closing.

orionr on 6 Sep 2019

👍2

Works on the last pytorch-nightly. Thanks!

LittlePea13 on 7 Sep 2019

@orionr Not working for me, I still have the empty rectangles after updating to the nightly versions. I have the following message in the web console (when trying to load the graph), same as https://github.com/pytorch/pytorch/issues/24157#issuecomment-525055887

:formatted:85668 Uncaught (in promise) TypeError: Cannot read property 'node' of null
    at :formatted:85668
    at arrayEach (:formatted:22625)
    at Function.forEach (:formatted:25823)
    at B.buildSubhierarchy (:formatted:85666)
    at new B (:formatted:85398)
    at HTMLElement.<anonymous> (:formatted:87876)
    at Object.d.time (:formatted:86482)
    at HTMLElement._buildRenderHierarchy (:formatted:87866)
    at HTMLElement._buildNewRenderHierarchy (:formatted:87858)
    at Object.runMethodEffect [as fn] (:formatted:11682)

tensorboard: 1.15.0 nightly
torch: 1.3.0 nightly
python: 3.7

clefourrier on 13 Sep 2019

I am going to give pytorch-nightly a try today - my issue was empty graph picture, with two empty rectangles.

nlhnt on 20 Sep 2019

Me too, a blank page with two empty rectangles!!
torch:1.2.0
tensorboard:2.0.0
python:3.5

zhyj3038 on 25 Sep 2019

Same, blank page with two empty rectangles, regardless of whether I use pip or Anaconda packages.
torch 1.2.0
tensorboard: 1.14.0
python: 3.7
ubuntu 16.04

dnovischi on 25 Sep 2019

@zhyj3038 and @dnovischi did you try pytorch-nightly? Please confirm that fixes your issue. If so, this will be fixed in the 1.3.0 release.

orionr on 25 Sep 2019

@zhyj3038 and @dnovischi did you try pytorch-nightly? Please confirm that fixes your issue. If so, this will be fixed in the 1.3.0 release.

It's OK now.
torch:1.3.0 nightly
tensorboard:2.0.0
python:3.5
ubuntu:16.04

zhyj3038 on 26 Sep 2019

👍1

Sorry, torch 1.3.0 nightly did not fix the issue on Python 3.7. I will downgrade to Python 3.5 later this week to see if it's still a problem.

dnovischi on 26 Sep 2019

@dnovischi, thanks for letting us know 1.3 doesn't work. Can you post a piece of sample code that shows the issue? cc @lanpa @sanekmelnikov

orionr on 26 Sep 2019

👍2

@orionr Here you go and thanks for the quick response.

example-torch-nightly.zip

Update:
Installing the future package solved the issue for the following setup:
torch 1.3.0.dev2019091
tensorboard 1.14.0
python 3.6
ubuntu 16.04

However, I now get a warning when launching the tensorboard server:
"FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; ..."
Of course, this is a tensorflow issue.

Also note that in the sample code above, I forgot to close the summary writer with tb.close().

dnovischi on 26 Sep 2019

👍1

Was having the same problem.

I think it's necessary to have tensorboard-2.0.0. I wasn't able to get it to work with tensorboard-1.14 and pytorch nightly build.

Edit: Does now work with Python 3.6, tensorboard-2.0.0, pytorch-1.3.0dev20190925, Mac OS 10.14.6.

akashb95 on 28 Sep 2019

👍2

@orionr Thank you for your guidance, I've just solved this problem.
Versions:
torch 1.3.0.dev20191002
tensorboard 1.14.0
Python 3.7

wy171205 on 3 Oct 2019

👍3

Also working with

pytorch 1.3.0.dev20190917
tensorboard from tf 2.0.0

willprice on 3 Oct 2019

👍1

I updated to

torch nightly 1.3.0.dev20191003
still using tensorflow 1.14.0
python 3.7.2

and it's still not working for me. (With the same web console log).

During the graph creation, I get the following trace

.../MEDeA/medea/models/transformer/cells.py:17: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  mask[i, :tensor.size(0)] = 1
.../MEDeA/medea/models/transformer/cells.py:131: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x = x + self.pe[:x.size(0), :, :x.size(-1)]
.../MEDeA/medea/models/transformer/cells.py:64: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert e == self.input_dim, f'Input dim ({e}) should match layer input dim ({self.input_dim})'
.../MEDeA/medea/models/transformer/cells.py:83: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(key_dim)  # matrix multi and scale
.../MEDeA/medea/models/transformer/decoder.py:79: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  decoder_outputs = [torch.tensor(first_item).float().view(self.batch_size, -1)]  # first item is not predicted
.../MEDeA/medea/models/transformer/decoder.py:108: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  prev_predictions = torch.tensor([target_lang_token] * self.batch_size).long().view(self.batch_size, -1)
.../MEDeA/medea/models/transformer/decoder.py:109: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  eow = torch.tensor([eow_token] * self.batch_size).long().view(self.batch_size)
.../MEDeA/medea/models/transformer/decoder.py:111: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  while not torch.all(torch.eq(prev_predictions[:, -1], eow)) and i < memory.shape[1]:
.../MEDeA/medea/models/transformer/cells.py:85: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  indices = torch.triu_indices(key_dim, key_dim, offset=1)
.../MEDeA/medea/models/transformer/cells.py:17: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  mask[i, :tensor.size(0)] = 1
.../MEDeA/medea/models/transformer/cells.py:86: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator index_put_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  scores[:, :, indices[0], indices[1]] = -1e-32
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op ConstantFill in domain `ai.onnx`.
  handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op ImageScaler in domain `ai.onnx`.
  handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of IsInf in domain `` with max_inclusive_version=9. Set to 1.
  handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of Mod in domain `` with max_inclusive_version=9. Set to 1.
  handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op Range in domain `ai.onnx`.
  handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of Resize in domain `` with max_inclusive_version=9. Set to 1.
  handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of ReverseSequence in domain `` with max_inclusive_version=9. Set to 1.
  handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op Round in domain `ai.onnx`.
  handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of ThresholdedRelu in domain `` with max_inclusive_version=9. Set to 1.
  handler.ONNX_OP, handler.DOMAIN, version))
W1003 20:30:03.961869 4491834816 deprecation.py:323] From .../builds/onnx-tensorflow/onnx_tf/handlers/backend/reshape.py:26: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1003 20:30:03.964736 4491834816 deprecation.py:323] From .../builds/onnx-tensorflow/onnx_tf/handlers/backend/reshape.py:31: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W1003 20:30:04.010443 4491834816 deprecation.py:323] From .../builds/onnx-tensorflow/onnx_tf/handlers/backend_handler.py:182: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.

clefourrier on 3 Oct 2019

@clefourrier can you try installing TensorBoard (not necessarily TensorFlow) v2.0 and see if that fixes things for you?

orionr on 4 Oct 2019

As mentioned by others, I think we still need py 3.6. This worked for me:

python 3.6.9
tensorboard 2.0.0
torch 1.3.0.dev20191003

py 3.5 works. py 3.7 doesn't work.
Didn't try tensorboard 1.14.0

ysono on 4 Oct 2019

👍1

@sanekmelnikov and @lanpa can we try py 3.7? Thanks.

orionr on 4 Oct 2019

@clefourrier can you try installing TensorBoard (not necessarily TensorFlow) v2.0 and see if that fixes things for you?

@orionr I should have mentioned that I'm running tensorboard nightly, sorry (tb-nightly - 2.0.0a20190915 )

clefourrier on 4 Oct 2019

Just tried with py3.7 and tb 2.0 locally with Mac and the example on https://pytorch.org/docs/stable/tensorboard.html worked for me. @clefourrier and @ysono can you try and isolate your respective errors? Maybe try the simple ResNet example above to see if that works for you. At this point it's unlikely we can get any fix in for the PyTorch 1.3 release coming soon, but happy to fix anything in the nightly once we've isolated things.

orionr on 4 Oct 2019

I used 3.7 too and didn't have issues. Perhaps there is a specific op that is causing the issue?

willprice on 4 Oct 2019

For me, Tensorboard is not the problem but PyTorch IS.

Tested with Pytorch 1.3.0.dev20190917 and it renders the graph in horizontal mode.
Screenshot_2019-10-04_23-32-16

Pytorch 1.1 renders the same architecture as,
Screenshot_2019-10-04_23-36-38

The older version is what we expect(?) and is easier to read. With 1.3, it was impossible to read the ResNet graph.

I get the same rendering for both summary files with two different Tensorboard versions

1.14.0
2.1.0a2019100

SuperShinyEyes on 4 Oct 2019

👀1

Thanks for the details. @lanpa, @J0Nreynolds and @sanekmelnikov are looking to improve this visualization with https://github.com/pytorch/pytorch/pull/26639 in 1.4

orionr on 1 Nov 2019

@orionr Thank you for your guidance, I've just solved this problem on Windows.
Versions:
torch 1.3.1
tensorboard 2.0.1
Python 3.7.4

shayan113 on 20 Nov 2019

@shayan113 Which problem have you solved?
Regarding the visualization problem with ResNet, I still get plots that are quite hard to read:

Ubuntu with
Torch: 1.3.1
Tensorboard: 2.0.2
Python: 3.7.4

jonas154 on 2 Dec 2019

👀2

Got it working. I had to install tensorboardX and import SummaryWriter from there. Also, I installed everything via conda.
allVersions
graph

shschong on 18 Dec 2019

👍2

Too bad that you needed to use tensorboardX instead of torch.utils.tensorboard, but happy you were able to unblock. @jonas154 did you try with 1.4? Thanks.

orionr on 18 Dec 2019

@shschong Thanks for sharing your solution!

@orionr 1.4 isn't released yet, is it? I wanted to wait for the release of the latest version. For now I'm using the HiddenLayer tool https://github.com/waleedka/hiddenlayer

jonas154 on 19 Dec 2019

Thank you all, I think everything is settled with PyTorch, including the visualization being top-down instead of left to right. These are my specs:

Ubuntu 18.04.3 LTS
torch 1.5.0.dev20200113
tensorboard 2.0.1
Python 3.7

AceEviliano on 14 Jan 2020

Updating to PyTorch 1.4 and TensorBoard 2.1.0 with Python 3.6 works well.

Make TensorBoard Work with PyTorch

BrambleXu on 18 Jan 2020

👍2

I am using
torch 1.4.0
tensorboard 2.0.2
Python 3.7.4
TensorBoard still displays two rectangles for the graph.

Hsulet on 23 Jan 2020

👍2

Hi @orionr @ptrblck
I just encountered another issue with add_graph.

My packages:

pytorch==1.4.0
python==3.7.6

When I use

...
train_data_sample, _, _ = next(iter(dataloader_train))
writer.add_graph(model, train_data_sample)
...

The error occurs:

*** RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient

The traceback clearly shows that the error happens when add_graph is called. I also tried changing input_to_model to a FloatTensor or LongTensor; neither works.

Just confirmed: it has something to do with DataParallel. If I call add_graph before moving the model to all 4 of my GPUs, there's no such bug. So, is there any way to avoid this issue while still using DataParallel?

sdsy888 on 31 Jan 2020

👀1

[Problem solved]

I found that it's DataParallel, when using multiple GPUs, that causes the problem. We need to pass the underlying model (model.module) to add_graph rather than the DataParallel wrapper.

So here's the method for those who encounter the same issue:

# set up the summary writer
train_data_sample, label_sample = next(iter(dataloader_train))
writer = SummaryWriter(args.summary_path, flush_secs=120)

with writer:
    # pass the underlying model (model.module), not the DataParallel wrapper
    writer.add_graph(model.module, train_data_sample.to(device))
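If the same script has to run both with and without DataParallel, the unwrapping step can be made conditional. This is a minimal sketch: `unwrap_model` is a hypothetical helper name, and it relies only on the fact that `torch.nn.DataParallel` exposes the wrapped network as `.module`.

```python
def unwrap_model(model):
    """Return the wrapped network if `model` looks like a DataParallel wrapper.

    Assumption: the wrapper exposes the original network as `.module`, as
    torch.nn.DataParallel does. A plain model that defines its own `module`
    attribute would also be unwrapped, so prefer an isinstance check against
    torch.nn.DataParallel if that could happen in your code.
    """
    return getattr(model, "module", model)
```

With this, `writer.add_graph(unwrap_model(model), train_data_sample.to(device))` should behave the same whether or not `model` has been wrapped.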

sdsy888 on 31 Jan 2020

👍1