Fairseq: ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'

Created on 14 May 2020  ·  2 Comments  ·  Source: pytorch/fairseq

Hi, I was trying to run my code directly against the source repo. Everything seems fine until execution reaches the batch_by_size function at line 220 of fairseq/data/data_utils.py:

from fairseq.data.data_utils_fast import batch_by_size_fast

The error occurs in this block:

import types

import numpy as np


def batch_by_size(
    indices, num_tokens_fn, max_tokens=None, max_sentences=None,
    required_batch_size_multiple=1,
):
    """
    Yield mini-batches of indices bucketed by size. Batches may contain
    sequences of different lengths.

    Args:
        indices (List[int]): ordered list of dataset indices
        num_tokens_fn (callable): function that returns the number of tokens at
            a given index
        max_tokens (int, optional): max number of tokens in each batch
            (default: None).
        max_sentences (int, optional): max number of sentences in each
            batch (default: None).
        required_batch_size_multiple (int, optional): require batch size to
            be a multiple of N (default: 1).
    """
    try:
        from fairseq.data.data_utils_fast import batch_by_size_fast
    except ImportError:
        raise ImportError(
            'Please build Cython components with: `pip install --editable .` '
            'or `python setup.py build_ext --inplace`'
        )

    max_tokens = max_tokens if max_tokens is not None else -1
    max_sentences = max_sentences if max_sentences is not None else -1
    bsz_mult = required_batch_size_multiple

    if isinstance(indices, types.GeneratorType):
        indices = np.fromiter(indices, dtype=np.int64, count=-1)

    return batch_by_size_fast(indices, num_tokens_fn, max_tokens, max_sentences, bsz_mult)
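For intuition about what the missing Cython module does, here is a rough pure-Python sketch of the bucketing that batch_by_size_fast performs (this is my own simplified reconstruction, not the fairseq implementation — in particular it omits the required_batch_size_multiple rounding):

```python
def batch_by_size_slow(indices, num_tokens_fn, max_tokens=-1, max_sentences=-1):
    """Group `indices` into batches, capping tokens and/or sentences per batch.

    Token cost of a batch is counted as longest-sequence-length * batch-size,
    since sequences are padded to the longest one in the batch.
    """
    batches, batch, batch_max_tokens = [], [], 0
    for idx in indices:
        n = num_tokens_fn(idx)
        next_max = max(batch_max_tokens, n)
        too_many_tokens = max_tokens > 0 and next_max * (len(batch) + 1) > max_tokens
        too_many_sents = max_sentences > 0 and len(batch) + 1 > max_sentences
        if batch and (too_many_tokens or too_many_sents):
            batches.append(batch)          # flush the current batch
            batch, batch_max_tokens = [], 0
        batch.append(idx)
        batch_max_tokens = max(batch_max_tokens, n)
    if batch:
        batches.append(batch)
    return batches

# toy example: five sequences whose token counts are 1..5
lengths = [1, 2, 3, 4, 5]
print(batch_by_size_slow(range(5), lambda i: lengths[i], max_tokens=6))
# → [[0, 1], [2], [3], [4]]
```

The real version is compiled to C via Cython because this loop runs over every example in the dataset at the start of each epoch — which is exactly why the extension must be built before training starts.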


The reported error is as follows:

2020-05-15 01:02:23 | INFO | fairseq_cli.train | model default-captioning-arch, criterion LabelSmoothedCrossEntropyCriterion
2020-05-15 01:02:23 | INFO | fairseq_cli.train | num. model params: 45776896 (num. trained: 45776896)
2020-05-15 01:02:24 | INFO | fairseq_cli.train | training on 4 GPUs
2020-05-15 01:02:24 | INFO | fairseq_cli.train | max tokens per GPU = 4096 and max sentences per GPU = None
2020-05-15 01:02:24 | INFO | fairseq.trainer | no existing checkpoint found .checkpoints/checkpoint_last.pt
2020-05-15 01:02:24 | INFO | fairseq.trainer | loading train data for epoch 1
2020-05-15 01:02:24 | INFO | fairseq.data.data_utils | loaded 566747 examples from: output/train-captions.en
<!-- before this, everything's fine -->
Traceback (most recent call last):
  File "/home/c/Cpt/fair/main.py", line 37, in <module>
    train()
  File "/home/c/Cpt/fair/main.py", line 33, in train
    cli_main()
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 355, in cli_main
    nprocs=args.distributed_world_size,
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 220, in batch_by_size
    from fairseq.data.data_utils_fast import batch_by_size_fast
ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 324, in distributed_main
    main(args, init_distributed=True)
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 104, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/home/c/Cpt/fair/fairseq/checkpoint_utils.py", line 157, in load_checkpoint
    epoch=1, load_dataset=True, **passthrough_args
  File "/home/c/Cpt/fair/fairseq/trainer.py", line 296, in get_train_iterator
    epoch=epoch
  File "/home/c/Cpt/fair/fairseq/tasks/fairseq_task.py", line 181, in get_batch_iterator
    required_batch_size_multiple=required_batch_size_multiple,
  File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 223, in batch_by_size
    'Please build Cython components with: `pip install --editable .` '
ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`

What have I tried?

  1. Followed the tips to install Cython from all kinds of sources (building from source, pip, conda) -> did not work.
  2. Found that data/data_utils_fast.pyx is a Cython file, so tried:
import pyximport
pyximport.install()

This also failed :(

Similar to Issue 1376

My environment

  • fairseq Version: master
  • PyTorch Version: 1.4
  • OS: Ubuntu 16.04 LTS
  • How you installed fairseq: from source, did not install.
  • Build command you used (if compiling from source): none, did not compile.
  • Python version: 3.7
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: fine, since everything runs correctly with the installed fairseq library.
  • Any other relevant information: I just want to run against the source code without manually building it.
question

Most helpful comment

You cloned fairseq master, right? Then you should also run `python setup.py build_ext --inplace` from the root fairseq directory to build the Cython components.

All 2 comments

You cloned fairseq master, right? Then you should also run `python setup.py build_ext --inplace` from the root fairseq directory to build the Cython components.
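As a quick sanity check after running the build command, you can test whether Python can locate the compiled extension (the helper name below is my own, not part of fairseq):

```python
import importlib.util

def cython_ext_built(modname="fairseq.data.data_utils_fast"):
    """Return True if `modname` resolves to an importable module on sys.path."""
    try:
        return importlib.util.find_spec(modname) is not None
    except ModuleNotFoundError:
        # the parent package itself is not importable
        return False

print(cython_ext_built())
```

If this prints False after building, the compiled .so/.pyd file was not produced under fairseq/data/, and the ImportError above will recur.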


Hi Myle, yes, it works perfectly now! I appreciate your response!
