Is there an example demonstrating how to generate using Megatron LM that was trained using model parallelism? The Megatron LM page shows how to run evaluation but there's no information on running generation.
I tried running the below command but got an error.
Command:
fairseq-generate \
$DATA_PATH \
--path $MODEL_PATH \
--task language_modeling \
--gen-subset test \
--max-sentences 8 \
--criterion cross_entropy \
--beam 1 \
--sampling \
--sampling-topp 0.9 \
--temperature 0.01 \
--prefix-size 200 \
--distributed-world-size 8 \
--results-path $RESULTS_PATH \
--model-parallel-size 8;
Error:
/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [3,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
After some debugging, I found that this line in the code caused the above error, but I'm unsure of the cause. It's possible there are some setup issues (data etc.). An example of how to set up and run generation with a model-parallel Megatron LM would be great. Thank you.
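For what it's worth, that device-side assert is `gather` complaining about an out-of-range index, which usually points to a mismatch between the data/dictionary and the model (e.g. a token id beyond the embedding size of one model part). A minimal CPU reproduction of the same invariant, with made-up shapes for illustration:

```python
import torch

# gather requires every index to lie in [0, src.size(dim));
# on CPU the violation raises an error instead of the CUDA
# device-side assert seen in the log above.
src = torch.arange(6.0).view(2, 3)
idx = torch.tensor([[0, 2, 3]])  # 3 is out of range for dim 1 (size 3)
try:
    torch.gather(src, 1, idx)
except (RuntimeError, IndexError):
    print("index out of range")
```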
I think this is on me; I never got around to fixing the generate script with Megatron MP. I can look into it, but no promises on the timeline yet.
For a reasonably sized model, you can just stitch the model parts back into a single model and do generation regularly.
Maybe give that a try?
Sure, I can try that. I'd appreciate it if you could give an example that shows how to stitch the model parts into one.
Can anyone confirm that Megatron 11b treats all contiguous spaces as a single space? With some hacky code I have it successfully generating on 2 GPUs (after merging and re-splitting the partitions) but it doesn't seem to understand line breaks. That's a little disappointing since it seems smarter than GPT-2 in a lot of other ways.
Perhaps this code was used during training?
https://github.com/pytorch/fairseq/blob/4c55744ec4cb26749cf2cf8dac89942f26ce4bd2/fairseq/tokenizer.py#L8-L14
There doesn't seem to be an easy solution. Still, I appreciate Facebook making the biggest public model release.
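For reference, the linked tokenizer lines amount to the following: a single regex collapses every run of whitespace, newlines included, into one space before splitting, so line breaks never reach the model as distinct tokens.

```python
import re

# Reproduction of the linked fairseq/tokenizer.py snippet: all runs of
# whitespace (spaces, tabs, newlines) collapse to a single space.
SPACE_NORMALIZER = re.compile(r"\s+")

def tokenize_line(line):
    line = SPACE_NORMALIZER.sub(" ", line)
    line = line.strip()
    return line.split()

print(tokenize_line("para one\n\npara two"))  # ['para', 'one', 'para', 'two']
```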
Does anyone have an example of stitching the model parts together? Did the approach work to generate text with megatron_11b?
To join the model chunks maybe one could try like this:
https://github.com/facebookresearch/ParlAI/blob/abfb771ac4ed2966d6f3ea22c7a38e4ebc9cc0f0/parlai/agents/bart/convert_fairseq_to_parlai.py#L258-L307
P.S. Make sure to copy the 'version' entries as well; you might lose the normalization layers otherwise.
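A minimal sketch of the gluing step, operating on already-loaded state dicts. Which keys are column- vs. row-parallel is an assumption (it depends on the architecture), and `merge_state_dicts` is a hypothetical helper, not a fairseq API:

```python
import torch

def merge_state_dicts(parts, column_parallel_suffixes, row_parallel_suffixes):
    """Glue model-parallel shards into one state dict (sketch).

    Column-parallel weights are concatenated along dim 0, row-parallel
    weights along dim 1; everything else (including the 'version'
    entries) is replicated across shards, so the first copy is kept.
    """
    merged = {}
    for key in parts[0]:
        tensors = [p[key] for p in parts]
        if any(key.endswith(s) for s in column_parallel_suffixes):
            merged[key] = torch.cat(tensors, dim=0)
        elif any(key.endswith(s) for s in row_parallel_suffixes):
            merged[key] = torch.cat(tensors, dim=1)
        else:
            merged[key] = tensors[0]
    return merged
```

With real checkpoints you would `torch.load` each part, pass the state dicts through something like this, and save a single checkpoint that can then be loaded without `--model-parallel-size`.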
1) Will push the scripts to glue and split partitions to master soon.
2) Yeah, the line-break issue might need some thinking; will take a look at that as well.
Hello @ngoyal2707, did you get a chance to push the scripts to manage the partitions somewhere?