Fairseq: ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'

Created on 14 May 2020 · 2 Comments · Source: pytorch/fairseq

Hi, I was trying to run my code directly from a clone of the source repo. Everything seems fine until it reaches the batch_by_size function, where the import at line 220 of fairseq/data/data_utils.py fails:

from fairseq.data.data_utils_fast import batch_by_size_fast

The error occurs in this block:

# Module-level imports this excerpt relies on
import types

import numpy as np


def batch_by_size(
    indices, num_tokens_fn, max_tokens=None, max_sentences=None,
    required_batch_size_multiple=1,
):
    """
    Yield mini-batches of indices bucketed by size. Batches may contain
    sequences of different lengths.

    Args:
        indices (List[int]): ordered list of dataset indices
        num_tokens_fn (callable): function that returns the number of tokens at
            a given index
        max_tokens (int, optional): max number of tokens in each batch
            (default: None).
        max_sentences (int, optional): max number of sentences in each
            batch (default: None).
        required_batch_size_multiple (int, optional): require batch size to
            be a multiple of N (default: 1).
    """
    try:
        from fairseq.data.data_utils_fast import batch_by_size_fast
    except ImportError:
        raise ImportError(
            'Please build Cython components with: `pip install --editable .` '
            'or `python setup.py build_ext --inplace`'
        )

    max_tokens = max_tokens if max_tokens is not None else -1
    max_sentences = max_sentences if max_sentences is not None else -1
    bsz_mult = required_batch_size_multiple

    if isinstance(indices, types.GeneratorType):
        indices = np.fromiter(indices, dtype=np.int64, count=-1)

    return batch_by_size_fast(indices, num_tokens_fn, max_tokens, max_sentences, bsz_mult)
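
To make the docstring above more concrete, here is a minimal pure-Python sketch of batching under a token and sentence budget. It is an illustration only and is not fairseq's Cython implementation: the name batch_by_size_sketch is hypothetical, the padded-cost rule (batch size times the longest sequence) is an assumption, and required_batch_size_multiple is ignored.

```python
from typing import Callable, Iterable, List, Optional


def batch_by_size_sketch(
    indices: Iterable[int],
    num_tokens_fn: Callable[[int], int],
    max_tokens: Optional[int] = None,
    max_sentences: Optional[int] = None,
) -> List[List[int]]:
    """Greedy batching: close the current batch when adding one more
    index would exceed the token or sentence budget (hypothetical helper)."""
    batches: List[List[int]] = []
    batch: List[int] = []
    longest = 0  # longest sequence in the current batch
    for idx in indices:
        n = num_tokens_fn(idx)
        new_longest = max(longest, n)
        # Assumed cost model: every sequence is padded to the longest one.
        too_many_tokens = (
            max_tokens is not None
            and (len(batch) + 1) * new_longest > max_tokens
        )
        too_many_sentences = (
            max_sentences is not None and len(batch) + 1 > max_sentences
        )
        if batch and (too_many_tokens or too_many_sentences):
            batches.append(batch)
            batch, new_longest = [], n
        batch.append(idx)
        longest = new_longest
    if batch:
        batches.append(batch)
    return batches


lengths = [3, 4, 5, 6, 10]
example = batch_by_size_sketch(range(len(lengths)), lengths.__getitem__, max_tokens=12)
print(example)
```

A sketch like this is far slower than a compiled version on large datasets, which is presumably why the real function lives in a Cython module that has to be built first.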

Code

The reported error is as follows:

2020-05-15 01:02:23 | INFO | fairseq_cli.train | model default-captioning-arch, criterion LabelSmoothedCrossEntropyCriterion
2020-05-15 01:02:23 | INFO | fairseq_cli.train | num. model params: 45776896 (num. trained: 45776896)
2020-05-15 01:02:24 | INFO | fairseq_cli.train | training on 4 GPUs
2020-05-15 01:02:24 | INFO | fairseq_cli.train | max tokens per GPU = 4096 and max sentences per GPU = None
2020-05-15 01:02:24 | INFO | fairseq.trainer | no existing checkpoint found .checkpoints/checkpoint_last.pt
2020-05-15 01:02:24 | INFO | fairseq.trainer | loading train data for epoch 1
2020-05-15 01:02:24 | INFO | fairseq.data.data_utils | loaded 566747 examples from: output/train-captions.en
<!-- everything is fine up to this point -->
Traceback (most recent call last):
  File "/home/c/Cpt/fair/main.py", line 37, in <module>
    train()
  File "/home/c/Cpt/fair/main.py", line 33, in train
    cli_main()
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 355, in cli_main
    nprocs=args.distributed_world_size,
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 220, in batch_by_size
    from fairseq.data.data_utils_fast import batch_by_size_fast
ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 324, in distributed_main
    main(args, init_distributed=True)
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 104, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/home/c/Cpt/fair/fairseq/checkpoint_utils.py", line 157, in load_checkpoint
    epoch=1, load_dataset=True, **passthrough_args
  File "/home/c/Cpt/fair/fairseq/trainer.py", line 296, in get_train_iterator
    epoch=epoch
  File "/home/c/Cpt/fair/fairseq/tasks/fairseq_task.py", line 181, in get_batch_iterator
    required_batch_size_multiple=required_batch_size_multiple,
  File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 223, in batch_by_size
    'Please build Cython components with: `pip install --editable .` '
ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`
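
The two tracebacks joined by "During handling of the above exception, another exception occurred" come from Python's implicit exception chaining: the ImportError is raised inside the except block that caught the ModuleNotFoundError. A small sketch of that behaviour, using a hypothetical module name rather than fairseq itself:

```python
def load_fast_helper():
    """Mimic the try/except shape used in batch_by_size."""
    try:
        # Hypothetical module name, assumed not to exist anywhere.
        import hypothetical_unbuilt_extension_xyz  # noqa: F401
    except ImportError:
        # Raising inside the handler keeps the original error as __context__.
        raise ImportError("Please build Cython components first")


try:
    load_fast_helper()
except ImportError as err:
    original = err.__context__  # the ModuleNotFoundError from the import
    summary = (type(original).__name__, type(err).__name__)

print(summary)
```

So the first traceback is the root cause (the compiled module is missing), and the second is only the friendlier message wrapped around it.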

What have I tried?

Followed the tips to install Cython from several sources (built from source, pip, conda) -> did not work.
Found that data/data_utils_fast.pyx is a Cython file, so I tried:

import pyximport
pyximport.install()

Also failed :(

Similar to Issue 1376

My environment?

fairseq Version: master
PyTorch Version: 1.4
OS: Ubuntu 16.04 LTS
How you installed fairseq: source, did not install.
Build command you used (if compiling from source): no compile.
Python version: 3.7
CUDA/cuDNN version: 10.1
GPU models and configuration: Everything's fine, since it runs correctly with the installed fairseq library.
Any other relevant information: I just want to run directly from the source code without building it manually.

cyk1337

All 2 comments

You cloned fairseq master, right? Then you should also run python setup.py build_ext --inplace from the root fairseq directory to build the Cython components.

myleott on 15 May 2020
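
To confirm that the build produced an importable module, a small check along these lines may help. This is only a sketch: extension_is_built is a hypothetical helper name, and it relies solely on the standard-library importlib.util.find_spec.

```python
import importlib.util


def extension_is_built(module_name: str) -> bool:
    """Return True if module_name can be located by the import system.

    Hypothetical helper. For a dotted name, find_spec imports the parent
    package first, so a missing parent raises ImportError, which is
    treated here as "not built".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    # Run from the root of the fairseq checkout after building.
    name = "fairseq.data.data_utils_fast"
    print(name, "found" if extension_is_built(name) else "missing")
```

If it reports the module as missing after a build, the build most likely ran in a different directory or Python environment than the one used for training.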

You cloned fairseq master, right? Then you should also run python setup.py build_ext --inplace from the root fairseq directory to build the Cython components.

Hi Myle, yes, it works perfectly now! I appreciate your response!

cyk1337 on 15 May 2020
