Allennlp: Why tensors are moved to CPU when calculating metrics?

Created on 28 Jun 2019 · 7Comments · Source: allenai/allennlp

System:

OS: Ubuntu 18.04.2
Python version: 3.6.7
AllenNLP version: v0.8.4
PyTorch version: 1.1.0

Question

When training a simple language model on GPU I get a big computational performance hit when calculating accuracy metrics.
I've tracked it down to this method:
https://github.com/allenai/allennlp/blob/70fa4aac3762c2de48e1123938ead2c50c0bb99c/allennlp/training/metrics/metric.py#L42-L49

I understand, that detach() is required to avoid connection to computational graph and thus memory leaks. But why tensors are transferred to CPU?

When I remove .cpu(), metrics are calculated much faster and I don't see any significant increase in GPU memory consumption.

Contributions welcome

Source

shtratos

Most helpful comment

I think some of our metrics (including some of our earliest ones, when this functionality was originally written) did some loops and stuff that required being on the CPU, and so this got added to the generic method without realizing that it wasn't necessary everywhere. You're right that it should be removed, but we need to be careful to add it back in whatever place it's actually necessary.

This method also made more sense back when pytorch had Variables, and now it's largely unnecessary, or at least misnamed - detach_tensors would be a better name, or a simple call to .detach() inline where this is called.

PR welcome!

matt-gardner on 28 Jun 2019

👍2

All 7 comments

PR welcome!

matt-gardner on 28 Jun 2019

👍2

If I make the change, will tests in CI build pickup any problems where tensors must be on CPU? Or is more complex analysis required?

shtratos on 28 Jun 2019

Our CI only runs on CPU, so this probably won't find all potential problems, unfortunately. I'm not sure what the best way to find this is.

matt-gardner on 28 Jun 2019

This definitely won't work for all of the metrics, because some of them rely on converting tensors in to string labels, which requires the tensors to be on the CPU, e,g:
https://github.com/allenai/allennlp/blob/master/allennlp/training/metrics/span_based_f1_measure.py

The easiest, least overhead way to do this is to make converting the tensors to CPU be optional and have it true by default, with a way to make a metric know that it can run on the GPU.

DeNeutoy on 28 Jun 2019

And when we talk about default options for such a simple function, it definitely makes me think the function should just go.

matt-gardner on 28 Jun 2019

👍1

@matt-gardner, @shtratos we actually do have some limited GPU testing (albeit not on PRs due to having a limited number of GPUs on our TC box). It runs after PRs are merged to master with the intention being to catch issues before we release.

@matt-gardner, you can see the GPU unit tests here.

@shtratos, you can parameterize a test to run only when a GPU is available like this: https://github.com/allenai/allennlp/blob/master/allennlp/tests/training/trainer_test.py#L106 Though you still need to explicitly move your model/tensors to the GPU explicitly.

brendan-ai2 on 28 Jun 2019

@DeNeutoy what if unwrap_to_tensors doesn't send to CPU (or to any device; still detach), but instead the callers that need the tensors in CPU call it? And later consider making those callers work on GPU.

A quick way to test is to change unwrap_to_tensors to send to cuda() instead, and it'll test everything on GPU instead.