import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Sequential(  # input_size=(1*28*28)
            nn.Conv2d(1, 6, 5, 1, 2),
            nn.ReLU(),  # (6*28*28)
            nn.MaxPool2d(kernel_size=2, stride=2),  # output_size=(6*14*14)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),  # (16*10*10)
            nn.MaxPool2d(2, 2)  # output_size=(16*5*5)
        )
        self.fc1 = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU()
        )
        self.fc2 = nn.Sequential(
            nn.Linear(120, 84),
            nn.ReLU()
        )
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size()[0], -1)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
dummy_input = torch.rand(13, 1, 28, 28)
model = LeNet()
with SummaryWriter(comment='Net', log_dir='/output') as w:
    w.add_graph(model, (dummy_input, ))
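The shape comments in LeNet follow from the usual output-size formula for convolution and pooling layers (dilation 1, floor mode): out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below uses hypothetical helper names (`conv_out`, `lenet_shapes`, not part of PyTorch) to reproduce the 28 -> 14 -> 10 -> 5 chain, which is where the 16 * 5 * 5 input size of fc1 comes from.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d/MaxPool2d layer (dilation=1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_shapes(size=28):
    """Trace the spatial size through the LeNet layers configured above."""
    s1 = conv_out(size, kernel=5, stride=1, padding=2)  # conv1: Conv2d(1, 6, 5, 1, 2)
    s2 = conv_out(s1, kernel=2, stride=2)              # conv1: MaxPool2d(2, 2)
    s3 = conv_out(s2, kernel=5)                        # conv2: Conv2d(6, 16, 5)
    s4 = conv_out(s3, kernel=2, stride=2)              # conv2: MaxPool2d(2, 2)
    return [s1, s2, s3, s4]


shapes = lenet_shapes(28)
print(shapes)                # [28, 14, 10, 5]
print(16 * shapes[-1] ** 2)  # 400 input features for fc1 (16 * 5 * 5)
```

If the input size or a kernel changes, the same arithmetic tells you what the first Linear layer must accept before tracing the model with add_graph.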
The log_dir is correct, but TensorBoard shows nothing! Has anyone encountered the same problem?
How you installed PyTorch (conda, pip, source): pip
You need to close the writer or flush it:
w.flush()
w.close()
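After flush() or close(), the log directory should contain at least one non-empty event file; TensorBoard event files are named with the prefix `events.out.tfevents`. A quick stdlib-only way to confirm the writer actually produced output is sketched below. `find_event_files` is a hypothetical helper, and "/output" is simply the log_dir from the original post.

```python
from pathlib import Path


def find_event_files(log_dir):
    """Return non-empty TensorBoard event files under log_dir (searched recursively)."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("events.out.tfevents*")
        if p.is_file() and p.stat().st_size > 0
    )


# Usage (assumes the log_dir used in the example above):
# files = find_event_files("/output")
# print(files or "no event files found: was the writer flushed/closed?")
```

An empty result usually means the writer was never flushed or closed, or TensorBoard is pointed at a different directory than the one the script wrote to.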
I faced the same problem and posted it on StackOverflow.
That will generate the log, but in my case TensorBoard is still unable to load it, giving an
Unhandled Promise Rejection: TypeError: null is not an object (evaluating 'Fa.node')
error in the browser console when loading the graph. I have tried graphs generated in TensorFlow and they worked; it is only PyTorch ones that fail, even the one from the PyTorch TensorBoard tutorial (the one using torchvision). The log file does contain the graph, as I can see in its contents, and the script doesn't complain when saving it; the failure happens only when visualising it in TensorBoard. Let me know if you are able to visualise your graph.
I can confirm that the Unhandled Promise Rejection issue does not happen in 1.1: I downgraded to 1.1 and it worked, and the graph now shows in TensorBoard. Oddly, the graph generated by 1.1 has only 124 elements, while the one generated by 1.2 has 507. This is shown when verbose is True; I am attaching the output generated by both versions as text files.
verbose_graph_1.1.txt
verbose_graph_1.2.txt
Still blank after refreshing
import torch
import torchvision.models as models
from torch.utils.tensorboard import SummaryWriter
resnet18 = models.resnet18(pretrained=True)
x = torch.randn(1, 3, 224, 224)
writer = SummaryWriter()
writer.add_graph(resnet18, x)
writer.close()
Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: None
OS: Mac OSX 10.14.6
GCC version: Could not collect
CMake version: version 3.13.4
Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Versions of relevant libraries:
[pip3] numpy==1.15.4
[conda] blas 1.0 mkl
[conda] mkl 2019.4 233
[conda] mkl_fft 1.0.12 py37h5e564d8_0
[conda] mkl_random 1.0.2 py37h27c97d8_0
[conda] pytorch 1.2.0 py3.7_0 pytorch
[conda] pytorch-nightly 1.2.0.dev20190629 py3.7_0 pytorch
[conda] pytorch-transformers 1.0.0 pypi_0 pypi
[conda] torchaudio 0.3.0 py37 pytorch
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchvision 0.4.0 py37_cpu pytorch
If you check your log file, you will see that it contains the graph; this is a different error from the first one mentioned by Ginsoda.
Do you get a graph page in TensorBoard, but the graph doesn't load? If you check your browser console, does it say
Unhandled Promise Rejection: TypeError: null is not an object (evaluating 'Fa.node')
If so, it is the same problem I am facing. I had to downgrade to v1.1, but the graph there is much simpler and doesn't contain the whole model.
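One way to check that a log file really contains the graph, without TensorBoard, is to walk the event file's records directly. Event files use TFRecord framing: an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload. The sketch below uses hypothetical helper names, skips CRC verification, and does not decode the protobuf payload; it only counts records and searches the raw bytes for a substring you expect, such as part of a module name (b"conv1" is an assumption, not a guaranteed node name).

```python
import struct


def iter_records(path):
    """Yield raw payloads from a TFRecord-framed file (CRCs are skipped, not verified)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                return  # end of file (or a truncated header)
            (length,) = struct.unpack("<Q", header)
            f.read(4)  # masked CRC32C of the length (ignored)
            payload = f.read(length)
            if len(payload) < length:
                return  # truncated record, e.g. the writer was not flushed
            f.read(4)  # masked CRC32C of the payload (ignored)
            yield payload


def summarize_event_file(path, needle=b"conv1"):
    """Return (number of complete records, whether any payload contains `needle`)."""
    count, found = 0, False
    for payload in iter_records(path):
        count += 1
        found = found or needle in payload
    return count, found


# Usage (placeholder path):
# print(summarize_event_file("path/to/events.out.tfevents.XXXX"))
```

If the needle is found but the graph tab still fails to render, the problem is more likely in how TensorBoard interprets the graph than in the writer itself, which matches the distinction drawn in the comment above.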
Hi all,
@LittlePea13 In my case I have exactly the same problem as you (the tab shows up, but the model does not load, and I get the same browser error)! I haven't downgraded to v1.1 yet, though.
I too am getting a graph page that is empty. I did flush and close the SummaryWriter.
Attaching screenshot including Chrome's console which shows an error that may be related.
Note that I too get the same error when copying the example given in the PyTorch documentation:
https://pytorch.org/docs/stable/tensorboard.html
The only difference is that I am using the released TensorBoard 1.14.0 rather than TensorBoard nightly.
I'm having the same issue. Any luck tracing it?
Me too.
My environment information:
windows 10
python3.6
pytorch 1.2 Is CUDA available: No
tensorboard 1.14 or tb-nightly 1.14.0a20190614 or tb-nightly 1.15.0a20190826
tensorflow 1.14
tensorboardX 1.8
numpy 1.17.0
Getting the same problem as @alqbib and @rfejgin; I'm running the tutorial code at https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html
Related issues from the forum:
@apaszke @orionr, @lanpa - any idea? Thanks!
My guess is that this is due to the TensorBoard compat (non-TensorFlow) issue we saw, where the log directory doesn't update correctly. It was fixed in https://github.com/tensorflow/tensorboard/pull/2342. Unfortunately, the fix didn't make it into TensorBoard 1.14, so you have three options: (1) use TensorBoard nightly, which has the fix, (2) install TensorFlow so TensorBoard uses that code path, or (3) restart TensorBoard periodically so it picks up the changes.
Please let us know if one of those options takes care of it.
Same issue with TB nightly. I don't think it's (3) because this happens even when I restart TB after the graph dump is complete. Will try (2).
Same issue with TensorFlow 1.14.0
Possibly related: see the screenshot of the error in the Chrome console above.
https://github.com/pytorch/pytorch/issues/24157#issuecomment-525055887
Interesting; you're right that the Chrome console output is unusual. I wonder if our graph proto is somehow wrong in this case. @lanpa, can you confirm the tutorial code works for you? Thanks.
@lanpa: I wasn't sure what you meant by the thumbs-up - does the tutorial code work for you?
I've seen this problem (graph not displayed) both with the tutorial code and my own models. Given that others have observed the same, something seems broken in the graph functionality...
cc @sanekmelnikov @natalialunova
Can confirm the same issue with tutorial code as well as custom model.
The verbose output looks fine, but the graph does not load, and Chromium's console shows this error:
(index):24242 Uncaught (in promise) TypeError: Cannot read property 'node' of null
at (index):24242
at arrayEach ((index):13920)
at Function.forEach ((index):14082)
at B.buildSubhierarchy ((index):24242)
at new B ((index):24229)
at HTMLElement.<anonymous> ((index):25062)
at Object.d.time ((index):24285)
at HTMLElement._buildRenderHierarchy ((index):25061)
at HTMLElement._buildNewRenderHierarchy ((index):25061)
at Object.runMethodEffect [as fn] ((index):3714)
tb-nightly (1.15.0a20190902)
pytorch (newest torch package via pip)
Stuck on the same issue; any news?
It seems like some graphs cause this issue. A potential fix is at https://github.com/pytorch/pytorch/pull/25599/ but we're still confirming. If you're willing to apply those changes locally and confirm it fixes your issue that would be great.
Hi @orionr, I can confirm that tensorboard does show the graph now! thanks!
Me too! thanks!
In that case, I'm landing the changes so they'll be in pytorch-nightly. We'll then add more robust testing around these cases. Thank you!
Works here too, thanks for the fix.
Fix landed. Please confirm fixed in pytorch-nightly after the build tonight, but closing.
Works on the last pytorch-nightly. Thanks!
@orionr Not working for me, I still have the empty rectangles after updating to the nightly versions. I have the following message in the web console (when trying to load the graph), same as https://github.com/pytorch/pytorch/issues/24157#issuecomment-525055887
:formatted:85668 Uncaught (in promise) TypeError: Cannot read property 'node' of null
at :formatted:85668
at arrayEach (:formatted:22625)
at Function.forEach (:formatted:25823)
at B.buildSubhierarchy (:formatted:85666)
at new B (:formatted:85398)
at HTMLElement.<anonymous> (:formatted:87876)
at Object.d.time (:formatted:86482)
at HTMLElement._buildRenderHierarchy (:formatted:87866)
at HTMLElement._buildNewRenderHierarchy (:formatted:87858)
at Object.runMethodEffect [as fn] (:formatted:11682)
I am going to give pytorch-nightly a try today - my issue was empty graph picture, with two empty rectangles.
Me too, a blank page with two empty rectangles!!
torch:1.2.0
tensorboard:2.0.0
python:3.5
Same here: blank page with two empty rectangles, regardless of whether I use the pip or Anaconda packages.
torch 1.2.0
tensorboard: 1.14.0
python: 3.7
ubuntu 16.04
@zhyj3038 and @dnovischi did you try pytorch-nightly? Please confirm that fixes your issue. If so, this will be fixed in the 1.3.0 release.
It's OK now.
torch:1.3.0 nightly
tensorboard:2.0.0
python:3.5
ubuntu:16.04
Sorry, torch 1.3.0 nightly did not fix the issue on Python 3.7. I will downgrade to Python 3.5 later this week to see if it's still a problem.
@dnovischi, thanks for letting us know 1.3 doesn't work. Can you post a piece of sample code that shows the issue? cc @lanpa @sanekmelnikov
@orionr Here you go and thanks for the quick response.
Update:
Installing the future package solved the issue for the following setup:
torch 1.3.0.dev2019091
tensorboard 1.14.0
python 3.6
ubuntu 16.04
However, I now get a warning when launching the tensorboard server:
"FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; ..."
Of course, this is a tensorflow issue.
Also note that in the sample code above, I forgot to close the SummaryWriter with tb.close().
Was having the same problem.
I think it's necessary to have tensorboard-2.0.0. I wasn't able to get it to work with tensorboard-1.14 and pytorch nightly build.
Edit: Does now work with Python 3.6, tensorboard-2.0.0, pytorch-1.3.0dev20190925, Mac OS 10.14.6.
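Several reports in this thread hinge on exact package versions (for example TensorBoard 2.0.0 versus 1.14, or whether the future package is installed). The stdlib sketch below prints what is actually installed; it assumes Python 3.8+ for importlib.metadata, and `installed_version` and `version_tuple` are hypothetical helper names.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Leading numeric release components, e.g. '1.15.0a20190826' -> (1, 15, 0)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break  # e.g. 'dev20190917'
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # stop at a pre-release suffix such as 'a20190826'
    return tuple(parts)


for name in ("torch", "tensorboard", "tb-nightly", "future"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

For example, `version_tuple(installed_version("tensorboard") or "0") >= (2, 0)` checks whether TensorBoard 2.0 or newer is present. The parser deliberately ignores pre-release tags, so it is only suitable for coarse comparisons like this one.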
@orionr Thank you for your guidance; I've just solved this problem.
version:
torch 1.3.0.dev20191002
tensorboard 1.14.0
Python 3.7
Also working with
1.3.0.dev20190917
2.0.0
I updated to
and it's still not working for me. (With the same web console log).
During graph creation, I get the following trace:
.../MEDeA/medea/models/transformer/cells.py:17: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
mask[i, :tensor.size(0)] = 1
.../MEDeA/medea/models/transformer/cells.py:131: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
x = x + self.pe[:x.size(0), :, :x.size(-1)]
.../MEDeA/medea/models/transformer/cells.py:64: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert e == self.input_dim, f'Input dim ({e}) should match layer input dim ({self.input_dim})'
.../MEDeA/medea/models/transformer/cells.py:83: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(key_dim) # matrix multi and scale
.../MEDeA/medea/models/transformer/decoder.py:79: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
decoder_outputs = [torch.tensor(first_item).float().view(self.batch_size, -1)] # first item is not predicted
.../MEDeA/medea/models/transformer/decoder.py:108: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
prev_predictions = torch.tensor([target_lang_token] * self.batch_size).long().view(self.batch_size, -1)
.../MEDeA/medea/models/transformer/decoder.py:109: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
eow = torch.tensor([eow_token] * self.batch_size).long().view(self.batch_size)
.../MEDeA/medea/models/transformer/decoder.py:111: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
while not torch.all(torch.eq(prev_predictions[:, -1], eow)) and i < memory.shape[1]:
.../MEDeA/medea/models/transformer/cells.py:85: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
indices = torch.triu_indices(key_dim, key_dim, offset=1)
.../MEDeA/medea/models/transformer/cells.py:17: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
mask[i, :tensor.size(0)] = 1
.../MEDeA/medea/models/transformer/cells.py:86: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator index_put_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
scores[:, :, indices[0], indices[1]] = -1e-32
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op ConstantFill in domain `ai.onnx`.
handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op ImageScaler in domain `ai.onnx`.
handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of IsInf in domain `` with max_inclusive_version=9. Set to 1.
handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of Mod in domain `` with max_inclusive_version=9. Set to 1.
handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op Range in domain `ai.onnx`.
handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of Resize in domain `` with max_inclusive_version=9. Set to 1.
handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of ReverseSequence in domain `` with max_inclusive_version=9. Set to 1.
handler.ONNX_OP, handler.DOMAIN, version))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:37: UserWarning: Unknown op Round in domain `ai.onnx`.
handler.ONNX_OP, handler.DOMAIN or "ai.onnx"))
.../builds/onnx-tensorflow/onnx_tf/common/handler_helper.py:34: UserWarning: Fail to get since_version of ThresholdedRelu in domain `` with max_inclusive_version=9. Set to 1.
handler.ONNX_OP, handler.DOMAIN, version))
W1003 20:30:03.961869 4491834816 deprecation.py:323] From .../builds/onnx-tensorflow/onnx_tf/handlers/backend/reshape.py:26: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1003 20:30:03.964736 4491834816 deprecation.py:323] From .../builds/onnx-tensorflow/onnx_tf/handlers/backend/reshape.py:31: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W1003 20:30:04.010443 4491834816 deprecation.py:323] From .../builds/onnx-tensorflow/onnx_tf/handlers/backend_handler.py:182: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
@clefourrier can you try installing TensorBoard (not necessarily TensorFlow) v2.0 and see if that fixes things for you?
As mentioned by others, I think we still need py 3.6. This worked for me:
py 3.5 works. py 3.7 doesn't work.
Didn't try tensorboard 1.14.0
@sanekmelnikov and @lanpa can we try py 3.7? Thanks.
@orionr I should have mentioned that I'm running tensorboard nightly, sorry (tb-nightly - 2.0.0a20190915 )
Just tried with py3.7 and tb 2.0 locally with Mac and the example on https://pytorch.org/docs/stable/tensorboard.html worked for me. @clefourrier and @ysono can you try and isolate your respective errors? Maybe try the simple ResNet example above to see if that works for you. At this point it's unlikely we can get any fix in for the PyTorch 1.3 release coming soon, but happy to fix anything in the nightly once we've isolated things.
I used 3.7 too and didn't have issues. Perhaps there is a specific op that is causing the issue?
For me, TensorBoard is not the problem; PyTorch is.
Tested with PyTorch 1.3.0.dev20190917, and it renders the graph in horizontal mode.
PyTorch 1.1 renders the same architecture as,
The older version is what we expect(?) and is easier to read. With 1.3, the ResNet graph was impossible to read.
I get the same rendering for both summary files with two different Tensorboard versions
Thanks for the details. @lanpa, @J0Nreynolds and @sanekmelnikov are looking to improve this visualization with https://github.com/pytorch/pytorch/pull/26639 in 1.4
@orionr Thank you for your guidance; I've just solved this problem.
on Windows
version:
torch 1.3.1
tensorboard 2.0.1
Python 3.7.4
@shayan113 Which problem have you solved?
Regarding the visualization problem with ResNet, I still get plots that are quite hard to read:
Ubuntu with
Torch: 1.3.1
Tensorboard: 2.0.2
Python: 3.7.4
Got it working. I had to install tensorboardX and import SummaryWriter from there. Also, I installed everything via conda.
Too bad that you needed to use tensorboardX instead of torch.utils.tensorboard, but happy you were able to unblock. @jonas154 did you try with 1.4? Thanks.
@shschong Thanks for sharing your solution!
@orionr 1.4 isn't released yet, is it? I wanted to wait for the release of the latest version. For now I'm using the Hiddenlayer tool https://github.com/waleedka/hiddenlayer
Thank you all. I think everything is settled with PyTorch, including the visualization being top-to-bottom instead of left-to-right. These are my specs:
Ubuntu 18.04.3 LTS
torch 1.5.0.dev20200113
tensorboard 2.0.1
Python 3.7
Updating to PyTorch 1.4 and TensorBoard 2.1.0 with Python 3.6 works well.
I am using
torch 1.4.0
tensorboard 2.0.2
Python 3.7.4
TensorBoard still displays two rectangles for the graph.
Hi @orionr @ptrblck
I just encountered another issue with add_graph.
My packages:
pytorch==1.4.0
python==3.7.6
When I use
...
train_data_sample, _, _ = next(iter(dataloader_train))
writer.add_graph(model, train_data_sample)
...
the following error occurs:
*** RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient
The traceback clearly shows that the error happens in the call to add_graph. I also tried changing input_to_model to a FloatTensor or a LongTensor; neither works.
Just confirmed: it has something to do with DataParallel. If I call add_graph before moving the model to all 4 of my GPUs, the bug doesn't appear. So is there any way to avoid this issue while still using DataParallel?
[Problem solved]
I found that DataParallel, used with multiple GPUs, is what causes the problem. We need to pass add_graph the model as it was before being wrapped in DataParallel, which is available as model.module.
So here's the method for those who encounter the same issue:
# set up the summary writer
train_data_sample, label_sample = next(iter(dataloader_train))
writer = SummaryWriter(args.summary_path, flush_secs=120)
with writer:
    writer.add_graph(model.module, train_data_sample.to(device))  # model graph, with input
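A slightly more general form of the same fix is to unwrap the model only when it is actually wrapped. nn.DataParallel (and nn.parallel.DistributedDataParallel) keep the original network in their `.module` attribute, so a small helper can pick it up when present and otherwise return the model unchanged. The name `unwrap_model` is hypothetical; the sketch relies only on that attribute.

```python
def unwrap_model(model):
    """Return the network inside a DataParallel-style wrapper, or the model itself.

    nn.DataParallel and nn.parallel.DistributedDataParallel store the wrapped
    network as `.module`. Caveat: a plain model that happens to define its own
    submodule named `module` would also be "unwrapped" by this check.
    """
    return getattr(model, "module", model)


# Usage (assumes `writer`, `model`, `train_data_sample` and `device` from above):
# writer.add_graph(unwrap_model(model), train_data_sample.to(device))
```

Duck typing on `.module` keeps the same logging code working whether or not the model is wrapped; an explicit `isinstance(model, nn.DataParallel)` check would be stricter if the caveat above matters for your model.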
I can confirm - after an update to pytorch 1.4 everything works for me.
Looks like we are at a good spot with PyTorch 1.4, so closing. Please open a new issue if you continue to have problems and thanks.