Transformers: Trainer makes RAM go out of memory after a while

Created on 29 Oct 2020 · 6 comments · Source: huggingface/transformers

Environment info

  • transformers version: 3.4.0
  • Platform: Linux-4.14.193-113.317.amzn1.x86_64-x86_64-with-glibc2.9
  • Python version: 3.6.10
  • PyTorch version (GPU?): 1.6.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: False

Who can help

@sgugger @patrickvonplaten

Information

Model I am using: T5

The problem arises when using my own modified scripts:
I load my dataset this way:

# tokenizer, text_column, generated_column, dataset_file and SEED are defined earlier in my script
from datasets import load_dataset

def tokenize(batch):
    tokenized_input = tokenizer(batch[text_column], padding=True, truncation=True, max_length=153)
    tokenized_label = tokenizer(batch[generated_column], padding=True, truncation=True, max_length=274)

    tokenized_input['labels'] = tokenized_label['input_ids']

    return tokenized_input

dataset = load_dataset('csv', data_files=dataset_file, split='train')
dataset = dataset.train_test_split(test_size=0.05, seed=SEED)
train_dataset = dataset['train']
val_dataset = dataset['test']

train_dataset = train_dataset.map(tokenize, batched=True, batch_size=len(train_dataset))
val_dataset = val_dataset.map(tokenize, batched=True, batch_size=len(val_dataset))
train_dataset.set_format('numpy', columns=['input_ids', 'attention_mask', 'labels'])
val_dataset.set_format('numpy', columns=['input_ids', 'attention_mask', 'labels'])

And then I use Trainer to train my T5 model like this:

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1,
    learning_rate=0.001,
    evaluation_strategy='steps',
    save_steps=1000000,
    save_total_limit=1,
    remove_unused_columns=True,
    run_name=now,
    logging_steps=100,
    eval_steps=100,
    logging_first_step=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)

trainer.train()

The task I am working on is my own task/dataset:
I am using a custom machine translation dataset that is about 12 MB and contains 18,000 examples. The maximum sequence lengths are 153 tokens for the input and 274 for the output. I have also added 68 special tokens, since the dataset contains many symbols.
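For context, adding those extra tokens typically looks like the sketch below; the symbol strings are hypothetical placeholders rather than the actual 68 symbols, and the checkpoint name is just an example:

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# Register the extra symbols and grow the embedding matrix to match the new vocabulary size.
special_symbols = ['<sym_1>', '<sym_2>']  # placeholders for the 68 actual symbols
tokenizer.add_tokens(special_symbols)
model.resize_token_embeddings(len(tokenizer))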

To reproduce

Steps to reproduce the behavior:

  1. Load a dataset the way I did.
  2. Start training with Trainer.
  3. During every evaluation, RAM usage grows and is not freed, so each subsequent evaluation step accumulates even more RAM until the maximum is reached and training stops with: RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 281882432 bytes. Error code 12 (Cannot allocate memory). (The machine I am using has 60 GB of RAM.)

Expected behavior

The RAM used during evaluation should be freed after every step. It looks like something accumulates during training and the RAM is never freed. I get the same behavior if I run only evaluation without training: after many evaluation steps the RAM blows up.

All 6 comments

Additional info:
As a workaround, I am now using a smaller validation set, but it is not ideal. If the memory issue can't be solved, a better option could be to evaluate on a random subset of the validation set during training.
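For illustration only (this is not an existing Trainer option), such a subset can be built with the datasets library on top of the earlier snippet; the subset size of 1000 and the variable name small_val_dataset are arbitrary placeholders:

# Evaluate on a random slice of the validation set to keep evaluation memory bounded.
small_val_dataset = val_dataset.shuffle(seed=SEED).select(range(1000))
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=small_val_dataset)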

If the problem is just that the RAM is not freed after evaluation, we can try to work around that (though Python garbage collector can be tricky to trigger).

If the validation set produces predictions that do not fit in RAM, we can't do much in the generic Trainer directly. You can subclass Trainer and its evaluate function to use the datasets library Metric objects, which store the predictions with Arrow and therefore use less RAM.
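As a minimal sketch of that idea (not the actual Trainer override), a datasets Metric buffers the batches it receives in Arrow-backed storage rather than in Python memory; sacrebleu below is just an example metric:

from datasets import load_metric

# add_batch writes intermediate results to Arrow-backed storage,
# so accumulated predictions do not pile up in CPU RAM.
metric = load_metric('sacrebleu')
metric.add_batch(
    predictions=['a translated sentence'],
    references=[['a reference sentence']],
)
print(metric.compute())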

If the problem is just that the RAM is not freed after evaluation, we can try to work around that (though Python garbage collector can be tricky to trigger).

I think the problem is not this one. The RAM is freed after evaluation (after a few seconds), but it is not freed between one evaluation step and the next. Correct me if I am wrong, but after a step the only thing that needs to stay in RAM should be the loss, so it can be averaged at the end of evaluation; RAM usage should therefore not increase as the steps go on, which is instead what happens.

During evaluation, we need to store the predictions and labels too, for the metric computation. If you only want to store the loss, then pass the flag prediction_loss_only=True to your training arguments, which will use much less RAM (and you can then probably remove eval_accumulation_steps=1 to speed up evaluation).
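Applied to the arguments from the original snippet, that suggestion would look roughly like this, assuming the prediction_loss_only field of TrainingArguments available in this version and with eval_accumulation_steps dropped as suggested:

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    prediction_loss_only=True,   # keep only the eval loss instead of all predictions
    learning_rate=0.001,
    evaluation_strategy='steps',
    save_steps=1000000,
    save_total_limit=1,
    remove_unused_columns=True,
    run_name=now,
    logging_steps=100,
    eval_steps=100,
    logging_first_step=True
)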

I didn't know that; it solved my problem, thank you!

Should even be automatic now as I just merged a PR on master where the Trainer does not bother saving the predictions when there is no compute_metrics (which is your case here).
