Hi, I was trying to run my code against the fairseq source repo. Everything seems fine until it reaches the batch_by_size function, at line 220 of fairseq/data/data_utils.py:
from fairseq.data.data_utils_fast import batch_by_size_fast
The error occurs in this block:
def batch_by_size(
    indices, num_tokens_fn, max_tokens=None, max_sentences=None,
    required_batch_size_multiple=1,
):
    """
    Yield mini-batches of indices bucketed by size. Batches may contain
    sequences of different lengths.

    Args:
        indices (List[int]): ordered list of dataset indices
        num_tokens_fn (callable): function that returns the number of tokens at
            a given index
        max_tokens (int, optional): max number of tokens in each batch
            (default: None).
        max_sentences (int, optional): max number of sentences in each
            batch (default: None).
        required_batch_size_multiple (int, optional): require batch size to
            be a multiple of N (default: 1).
    """
    try:
        from fairseq.data.data_utils_fast import batch_by_size_fast
    except ImportError:
        raise ImportError(
            'Please build Cython components with: `pip install --editable .` '
            'or `python setup.py build_ext --inplace`'
        )

    max_tokens = max_tokens if max_tokens is not None else -1
    max_sentences = max_sentences if max_sentences is not None else -1
    bsz_mult = required_batch_size_multiple

    if isinstance(indices, types.GeneratorType):
        indices = np.fromiter(indices, dtype=np.int64, count=-1)

    return batch_by_size_fast(indices, num_tokens_fn, max_tokens, max_sentences, bsz_mult)
The reported error is as follows:
2020-05-15 01:02:23 | INFO | fairseq_cli.train | model default-captioning-arch, criterion LabelSmoothedCrossEntropyCriterion
2020-05-15 01:02:23 | INFO | fairseq_cli.train | num. model params: 45776896 (num. trained: 45776896)
2020-05-15 01:02:24 | INFO | fairseq_cli.train | training on 4 GPUs
2020-05-15 01:02:24 | INFO | fairseq_cli.train | max tokens per GPU = 4096 and max sentences per GPU = None
2020-05-15 01:02:24 | INFO | fairseq.trainer | no existing checkpoint found .checkpoints/checkpoint_last.pt
2020-05-15 01:02:24 | INFO | fairseq.trainer | loading train data for epoch 1
2020-05-15 01:02:24 | INFO | fairseq.data.data_utils | loaded 566747 examples from: output/train-captions.en
<!-- everything is fine up to this point -->
Traceback (most recent call last):
File "/home/c/Cpt/fair/main.py", line 37, in <module>
train()
File "/home/c/Cpt/fair/main.py", line 33, in train
cli_main()
File "/home/c/Cpt/fair/fairseq_cli/train.py", line 355, in cli_main
nprocs=args.distributed_world_size,
File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 220, in batch_by_size
from fairseq.data.data_utils_fast import batch_by_size_fast
ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/c/Cpt/fair/fairseq_cli/train.py", line 324, in distributed_main
main(args, init_distributed=True)
File "/home/c/Cpt/fair/fairseq_cli/train.py", line 104, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
File "/home/c/Cpt/fair/fairseq/checkpoint_utils.py", line 157, in load_checkpoint
epoch=1, load_dataset=True, **passthrough_args
File "/home/c/Cpt/fair/fairseq/trainer.py", line 296, in get_train_iterator
epoch=epoch
File "/home/c/Cpt/fair/fairseq/tasks/fairseq_task.py", line 181, in get_batch_iterator
required_batch_size_multiple=required_batch_size_multiple,
File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 223, in batch_by_size
'Please build Cython components with: `pip install --editable .` '
ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`
I installed Cython from every source I could think of (built from source, pip, conda), and none of them worked. Since data/data_utils_fast.pyx is a Cython file, I also tried:
import pyximport
pyximport.install()
That also failed :(
Similar to Issue 1376
You cloned fairseq master, right? Then you should also run python setup.py build_ext --inplace from the root fairseq directory to build the Cython components.
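One way to confirm the build worked before launching a multi-GPU run is to check whether the compiled module can be located by the import system. This is a minimal sketch using only the standard library; the helper name extension_is_built is hypothetical and not part of fairseq.

```python
import importlib.util


def extension_is_built(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`.

    Hypothetical helper for illustration; it is not part of fairseq.
    """
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `fairseq`) is missing or broken.
        return False


if __name__ == "__main__":
    name = "fairseq.data.data_utils_fast"
    if extension_is_built(name):
        print(f"{name} is importable")
    else:
        print(f"{name} not found; try `python setup.py build_ext --inplace` "
              "from the root fairseq directory")
```

Run it with the same Python environment you train with, from the root fairseq directory, so that the check sees the same package the training script does.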
Hi Myle, yes, it works perfectly now! I appreciate your response!