Machine: Amazon EC2 Deep Learning AMI (pytorch_p36 environment)
I have a training script that tries to create a TensorboardLogger, but it fails with the stack trace below. I have pip-installed the latest version of ignite.
File "train.py", line 336, in train
tb_logger = TensorboardLogger(log_dir=None)
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/contrib/handlers/tensorboard_logger.py", line 355, in __init__
self.writer = SummaryWriter(log_dir=log_dir)
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/tensorboardX/writer.py", line 254, in __init__
self._get_file_writer()
File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/tensorboardX/writer.py", line 310, in _get_file_writer
self.file_writer = FileWriter(logdir=self.logdir, **self.kwargs)
TypeError: __init__() got an unexpected keyword argument 'log_dir'
Could you please explain why this is happening and what I can do to fix it? Thanks!
@g-karthik this is due to the newest tensorboardX release 1.7, https://github.com/pytorch/ignite/issues/530
Either install tensorboardX < 1.7 or install ignite from master (pip install git+https://github.com/pytorch/ignite.git); we fixed that recently.
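If you want to check whether an environment falls into the affected range before choosing a workaround, here is a minimal sketch. The helper name is hypothetical (it is not part of ignite or tensorboardX), and it assumes you pass in the version string yourself, e.g. the one reported by pip show tensorboardX.

```python
import re


def tensorboardx_is_1_7_or_newer(version):
    """Return True if the given tensorboardX version string is 1.7 or newer.

    Hypothetical helper for illustration; only the leading major.minor
    numbers are compared, anything after them is ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    # Tuple comparison orders versions numerically, so 1.10 sorts after 1.7.
    return (major, minor) >= (1, 7)
```

If it returns True with the PyPI release of ignite, one of the two workarounds above applies.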
HTH
Awesome, thanks @vfdev-5!
For others blocked on the same issue right now and unwilling to wait until the fix is in PyPI, do:
pip uninstall pytorch-ignite
pip install --upgrade git+https://github.com/pytorch/ignite.git
pip uninstall tensorboardX
pip install --upgrade git+https://github.com/lanpa/tensorboardX.git
Able to successfully create a TensorboardLogger now.
checkpoint_handler = ModelCheckpoint(tb_logger.writer.log_dir, 'checkpoint', save_interval=1, n_saved=3)
AttributeError: 'SummaryWriter' object has no attribute 'log_dir'
It looks like the SummaryWriter changes aren't in master yet?
Okay never mind, I see that I have to install tensorboardX from master too. Works now.
Hi @g-karthik, I am running into the problem you got here:
checkpoint_handler = ModelCheckpoint(tb_logger.writer.log_dir, 'checkpoint', save_interval=1, n_saved=3)
AttributeError: 'SummaryWriter' object has no attribute 'log_dir'
It looks like the SummaryWriter changes aren't in master yet?
I have already uninstalled tensorboardX and installed it again using the commands you gave above. What do you mean by installing tensorboardX from master?
Thanks
Probably, he meant the following:
pip install --upgrade git+https://github.com/lanpa/tensorboardX.git
I did that, but I am still encountering the problem. I reverted tensorboardX to 1.6, and I think that fixes the problem.
@UltraSpecialException did you install ignite from master too ?
pip uninstall pytorch-ignite
pip install --upgrade git+https://github.com/pytorch/ignite.git
Yeah, I ran:
pip3 uninstall pytorch-ignite
pip3 install --upgrade git+https://github.com/pytorch/ignite.git
pip3 uninstall tensorboardX
pip3 install --upgrade git+https://github.com/lanpa/tensorboardX.git
I'm not sure whether your modules are compatible with Python 3, but I assume they are. In case you're wondering, I'm trying to run the script found here, which uses Python 3, since I believe Python 2 doesn't allow calling .copy() on a list (line 77 of the script).
edit: I just tested the script on Python 2, replacing the .copy() with [:], and I get the same error:
checkpoint_handler = ModelCheckpoint(tb_logger.writer.log_dir, 'checkpoint', save_interval=1, n_saved=3)
AttributeError: 'SummaryWriter' object has no attribute 'log_dir'
@UltraSpecialException I think the code you are trying to use:
https://github.com/huggingface/transfer-learning-conv-ai/blob/b7f295f840f719056287504554083ec3f2688651/train.py#L234
works with tensorboardX < 1.7, because SummaryWriter in tensorboardX master has the attribute logdir instead of log_dir:
https://github.com/lanpa/tensorboardX/blob/3e35c9b5f85e8ceb0294532d9eb772341a04c097/tensorboardX/writer.py#L244
Btw, yes, ignite is compatible with Python 2.7 and 3.x.
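If pinning tensorboardX is not an option, the script could read the directory under either attribute name. This is only a sketch: get_log_dir is a hypothetical helper, not something ignite or tensorboardX provides, and it assumes the writer exposes the directory as either logdir or log_dir, as described above.

```python
def get_log_dir(writer):
    """Return the writer's log directory under whichever attribute it uses.

    Hypothetical helper: tries 'logdir' first, then 'log_dir'.
    """
    for name in ("logdir", "log_dir"):
        value = getattr(writer, name, None)
        if value is not None:
            return value
    raise AttributeError("writer has neither 'logdir' nor 'log_dir'")
```

The failing line would then become ModelCheckpoint(get_log_dir(tb_logger.writer), 'checkpoint', save_interval=1, n_saved=3).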
Ah I see, so I suppose installing an older version of ignite was the approach. Thank you!