Comparing the results of LBFGS + PyTorch Lightning with native PyTorch + LBFGS, PyTorch Lightning is not able to update the weights and the model does not converge. There are some issues to point out:
LBFGS + PyTorch Lightning has trouble converging and the weights are not being updated, in contrast to Adam + PyTorch Lightning, which works as expected.
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torchvision import transforms,datasets
from torch.utils.data import DataLoader,random_split
import pytorch_lightning as pl
from IPython.display import clear_output
class LightningMNISTClassifier(pl.LightningModule):
    def __init__(self):
        super(LightningMNISTClassifier, self).__init__()
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)

    def forward(self, x):
        batch_size, channels, width, height = x.size()
        x = x.view(batch_size, -1)
        # layer 1
        x = self.layer_1(x)
        x = torch.relu(x)
        # layer 2
        x = self.layer_2(x)
        x = torch.relu(x)
        # layer 3
        x = self.layer_3(x)
        # probability distribution over labels
        x = torch.log_softmax(x, dim=1)
        return x

    def prepare_data(self):
        # prepare transforms standard to MNIST
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        mnist_train = MNIST(os.getcwd(), train=True, download=True, transform=transform)
        mnist_test = MNIST(os.getcwd(), train=False, download=True, transform=transform)
        self.mnist_train, self.mnist_val = random_split(mnist_train, [55000, 5000])

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=1024)

    # def val_dataloader(self):
    #     return DataLoader(self.mnist_val, batch_size=1024)

    # def test_dataloader(self):
    #     return DataLoader(self.mnist_test, batch_size=1024)

    def configure_optimizers(self):
        # optimizer = optim.Adam(self.parameters(), lr=1e-3)
        optimizer = optim.LBFGS(self.parameters(), lr=1e-2)
        return optimizer

    # def backward(self, trainer, loss, optimizer):
    #     loss.backward(retain_graph=True)

    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx,
                       second_order_closure, on_tpu=False, using_native_amp=False,
                       using_lbfgs=False):
        # update params
        optimizer.step(second_order_closure)

    def cross_entropy_loss(self, logits, labels):
        return F.nll_loss(logits, labels)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        logits = self.forward(x)
        loss = self.cross_entropy_loss(logits, y)
        return {'loss': loss}

    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        print('epoch={}, avg_Train_loss={:.2f}'.format(self.current_epoch, avg_loss.item()))
        # return {'avg_train_loss': avg_loss}

    # def validation_step(self, val_batch, batch_idx):
    #     x, y = val_batch
    #     logits = self.forward(x)
    #     loss = self.cross_entropy_loss(logits, y)
    #     return {'val_loss': loss}

    # def validation_epoch_end(self, outputs):
    #     avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    #     print('epoch={}, avg_Test_loss={:.2f}'.format(self.current_epoch, avg_loss.item()))
    #     return {'avg_val_loss': avg_loss}

model = LightningMNISTClassifier()
# from pytorch_lightning.callbacks import EarlyStopping
trainer = pl.Trainer(max_epochs=400, gpus=1,
                     # check_val_every_n_epoch=2,
                     # accumulate_grad_batches=5,
                     # early_stop_callback=early_stop,
                     # limit_train_batches=50,
                     # val_check_interval=0.25,
                     progress_bar_refresh_rate=0,
                     # num_sanity_val_steps=0,
                     weights_summary=None)
clear_output(wait=True)
trainer.fit(model)
Environment:
- Colab and PyCharm
- PyTorch version: 1.6.0 (CPU and GPU)
- pytorch-lightning==1.0.0rc3
Do you have the code for native PyTorch + LBFGS for the same?
This is the code, including MNIST and LBFGS, that works fine with native PyTorch:
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torchvision import transforms, datasets
from torch.utils.data import DataLoader, random_split

class PytorchMNISTClassifier(nn.Module):
    def __init__(self):
        super(PytorchMNISTClassifier, self).__init__()
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)

    def forward(self, x):
        batch_size, channels, width, height = x.size()
        x = x.view(batch_size, -1)
        # layer 1
        x = self.layer_1(x)
        x = torch.relu(x)
        # layer 2
        x = self.layer_2(x)
        x = torch.relu(x)
        # layer 3
        x = self.layer_3(x)
        # probability distribution over labels
        x = torch.log_softmax(x, dim=1)
        return x

def cross_entropy_loss(logits, labels):
    return F.nll_loss(logits, labels)

if __name__ == '__main__':
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu')

    model = PytorchMNISTClassifier()
    model = model.to(device)
    # optimizer = optim.Adam(model.parameters(), lr=1e-3)
    optimizer = optim.LBFGS(model.parameters(), lr=0.01)

    # prepare transforms standard to MNIST
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    mnist_train = MNIST(os.getcwd(), train=True, download=True, transform=transform)
    mnist_test = MNIST(os.getcwd(), train=False, download=True, transform=transform)
    mnist_train, mnist_val = random_split(mnist_train, [55000, 5000])
    data = DataLoader(mnist_train, batch_size=1024)

    for epoch in range(10):
        loss_total = 0.
        for i, (x, y) in enumerate(data):
            x = x.to(device)
            y = y.to(device)

            def closure():
                logits = model(x)
                optimizer.zero_grad()
                loss = cross_entropy_loss(logits, y)
                loss.backward(retain_graph=True)
                return loss

            loss_out = optimizer.step(closure)
            loss_total += loss_out.item()
        print('total_loss--->', loss_total)
You don't need to override optimizer_step... you're only doing it to pass in the second_order_closure, but that's exactly what the default implementation already does:
if on_tpu:
    xm.optimizer_step(optimizer)
elif using_native_amp:
    self.trainer.scaler.step(optimizer)
elif using_lbfgs:
    optimizer.step(second_order_closure)
else:
    optimizer.step()
@williamFalcon it should still converge, right? Even if the overridden method is doing the same update. Maybe there's a bug here if it's not converging in PL; will check this.
@williamFalcon we modified the code by removing optimizer_step; however, it does not help solve the issue.
OK, found something. Not sure if it's correct or not since I haven't used LBFGS before.
I checked that optim.LBFGS calls the closure 20 times for each step; this example doesn't call .step() or .backward() explicitly in the training loop but relies on optimizer.step(closure) to do that. Also, across those 20 closure calls the underlying loss is different.
But PL calls an explicit training_step in addition to the closure, which means it gets called 21 times, plus an explicit loss.backward() is always called.
These are my observations. Anyone with prior experience with the LBFGS optimizer can confirm the right way to do this.
How many times does it get called with PyTorch?
LBFGS is a quasi-Newton method, which means it does not compute the Hessian directly but instead approximates it.
I assume PyTorch calls the closure multiple times within each step to do this approximation?
How many times does it get called with PyTorch?
The given example calls it 20 times. I think it always calls it 20 times; I checked a few examples.
The default value for the number of iterations is 20, based on the PyTorch docs:
torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None)
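For reference, here is a minimal sketch (my own toy example on random data, not from the issue) that counts how often optim.LBFGS invokes the closure during a single optimizer.step(closure) call; with the default max_iter=20 it is called up to 20 times, fewer if the tolerance criteria stop it early:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)

optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)  # max_iter defaults to 20
loss_fn = nn.MSELoss()
calls = 0

def closure():
    global calls
    calls += 1
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

optimizer.step(closure)            # one "outer" optimizer step
print('closure calls:', calls)     # up to max_iter (20); fewer if tolerances are hit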
@williamFalcon We are in the process of developing code that requires the LBFGS optimizer, and I'd like to use the pytorch-lightning platform for it. Do you think the LBFGS issue can be resolved any time soon in a later version?
@rohitgr7 unfortunately it does not seem to be fixed by #4190, even though the number of backward calls is now correct (there is a test for that). The loss is still not decreasing, though (haven't investigated further).
ok will check this if I get some time :)
@williamFalcon it seems that the LBFGS optimizer in the latest version of pytorch-lightning carries the same issue as previous versions. Is there a way to work around this temporarily until the bug gets fixed?
@Borda, @edenlightning, the LBFGS issue does not seem to be fixed in the latest version of PyTorch Lightning. Can we hope that it will be fixed in the near future? We started a project using PyTorch Lightning and got stuck because we were not able to use the LBFGS optimizer. If it is not fixed yet, would it be possible to expedite resolving this issue?
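Until this is fixed upstream, one possible workaround is to take over the optimization loop yourself and drive LBFGS with your own closure, exactly as in the native PyTorch script above. This is a sketch only, assuming a Lightning version that supports manual optimization; the names self.automatic_optimization, self.optimizers(), and self.manual_backward come from newer pytorch-lightning releases and may not exist in 1.0.0rc3:

# Workaround sketch (unverified): manual optimization, so Lightning does not
# build its own closure or call backward() on our behalf.
class ManualLBFGSMNISTClassifier(LightningMNISTClassifier):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # assumes a release with manual optimization

    def training_step(self, batch, batch_idx):
        x, y = batch
        opt = self.optimizers()
        losses = []

        def closure():
            opt.zero_grad()
            loss = self.cross_entropy_loss(self(x), y)
            self.manual_backward(loss)
            losses.append(loss.detach())
            return loss

        # LBFGS re-evaluates the closure internally, just like in native PyTorch
        opt.step(closure=closure)
        # first evaluation = loss before this LBFGS step, kept for epoch-end logging
        return {'loss': losses[0]}

If this pattern works in your setup it sidesteps the extra closure/backward bookkeeping discussed above, but it is not a substitute for the upstream fix.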