I followed the seq2seq README and wanted to try the sshleifer/distilbart-cnn-12-6 model for abstractive text summarization.
I got the error above; it seems like lightning_base was part of this project before it was moved/removed.
Model I am using: sshleifer/distilbart-cnn-12-6
Language I am using the model on: English
The problem arises when using: the official example scripts (examples/seq2seq/finetune.sh)
The task I am working on is: abstractive summarization
Steps to reproduce the behavior:
./finetune.sh \
--data_dir $CNN_DIR \
--train_batch_size=1 \
--eval_batch_size=1 \
--output_dir=xsum_results \
--num_train_epochs 1 \
--model_name_or_path facebook/bart-large
I would expect the model to start inference.
transformers version: 2.11.0
@sshleifer, you asked to be tagged on issues in the README.
Also, I am a bit worried that my RTX 2070 with 8 GB will be too small for training, since 13 GB were recommended for a batch size of 1 with fp16. I appreciate any hints on what I could do to make it run. Thank you.
Also I followed the basic installation process from here:
https://github.com/huggingface/transformers/blob/master/examples/README.md#important-note
but I still get the same error
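For reference, the steps from that note boil down to installing transformers from source plus the example requirements, roughly the following (from memory, so double-check against the linked README):
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt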
lightning_base is still there. Do you have a traceback?
Yep, that's what I get so far:
Traceback (most recent call last):
  File "finetune.py", line 15, in <module>
    from lightning_base import BaseTransformer, add_generic_args, generic_train
ModuleNotFoundError: No module named 'lightning_base'
try export PYTHONPATH="../":"${PYTHONPATH}"
More info in examples/seq2seq/README.md
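A minimal sketch of the whole sequence, assuming you run it from inside examples/seq2seq and that $CNN_DIR points at your data (flags copied from your command above, with the distilbart checkpoint you mentioned):
cd transformers/examples/seq2seq
export PYTHONPATH="../":"${PYTHONPATH}"   # makes lightning_base importable
./finetune.sh \
--data_dir $CNN_DIR \
--train_batch_size=1 \
--eval_batch_size=1 \
--output_dir=xsum_results \
--num_train_epochs 1 \
--model_name_or_path sshleifer/distilbart-cnn-12-6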
I had the same issue as you described, @MichaelJanz, also on Windows 10 with Python 3.7.7.
To clarify my setup, I followed the instructions under "Important Note" on the transformers/examples page and got an error that faiss could not be installed (faiss only supports Linux and macOS currently, I think). I removed faiss from the list of requirements in examples/requirements.txt and ran the example finetune.sh command. Like Michael, I got an error that lightning_base was not found. Since "export" doesn't work on the Windows command line, I inserted two lines above the lightning_base import in finetune.py:
import sys
sys.path.insert(0, r'C:\Users\chris\transformers\examples')
This solved the issue that lightning_base wasn't found, but I encountered a new error:
File "finetune.py", line 17, in <module>
from lightning_base import BaseTransformer, add_generic_args, generic_train
...
File "C:\Users\chris\transformers\env\lib\site-packages\tokenizers\__init__.py", line 17, in <module>
from .tokenizers import Tokenizer, Encoding, AddedToken
ModuleNotFoundError: No module named 'tokenizers.tokenizers'
Looking at the tokenizers package installed, I didn't see an additional folder labeled "tokenizers". The tokenizers version I have within my virtual environment is tokenizers==0.8.0rc4. @sshleifer , could you let me know what version of tokenizers you have in your environment? Let me know if you have any other suggestions about what might be happening (I worry that the problem lies with using Windows).
Edit: for context, I tried running the finetuning script within a Linux environment and had no problems, with the same tokenizers==0.8.0rc4 version. I'm guessing that this whole issue is a Windows problem.
Yeah, I have the same version of tokenizers; this seems like a Windows problem.
To fix that for me, I decided to execute _finetune.sh_ directly. I had to insert
export PYTHONPATH="$PATH:(absolute path to the examples folder)"
and pass --data_dir (absolute path to the example folder).
That's an ugly workaround, but it works.
Then I ran into an encoding error, which required changing line 39 of utils.py to
lns = lmap(str.strip, data_path.open(encoding="UTF-8").readlines())
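(As an aside, I suspect enabling Python's UTF-8 mode before launching would avoid patching utils.py, since on Python 3.7+ it makes open() default to UTF-8, but I have not verified this with the script:)
export PYTHONUTF8=1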
Then the finetuning process starts. It just runs horribly slowly, with no GPU usage and this warning:
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
So the GPU is not used. I already opened an issue on NVIDIA/apex about building apex for Windows with CUDA extensions, but any hint here is appreciated.
I am thinking about switching to Ubuntu, since it seems like a lot of these errors have their origin in Windows. Is that recommended?
Before you start running these commands: it will probably end up _not_ working due to mecab issues (see the bottom of this comment).
Instead of wget, you can use Invoke-WebRequest in PowerShell (after having cd'd into examples/seq2seq): Invoke-WebRequest https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz -OutFile xsum.tar.gz
$env:Path += ";" + (Join-Path -Path (Get-Item .).FullName -ChildPath "xsum")
Rather than using the bash file (finetune.sh), I suggest that you open it and copy-paste the Python command that is in there, including the options that are already present, and add your own options after it (things like data dir, model name).
Before running the command, add ../ to PYTHONPATH:
$env:PythonPath += ";../"
After all that, you will probably still run into a problem involving mecab. It is used for Japanese tokenisation and is not easy to disable (it is also a dependency of sacrebleu). Mecab has a new v1.0 release that works on Windows, but upgrading to it introduces breaking changes in transformers as well as sacrebleu. This is unfortunate, because such a small thing means that many functionalities and examples cannot be used on Windows, _even if you do not use Japanese_, simply because of how the imports are structured. I'd rather have these libraries imported only when they are actually needed, to maximise cross-platform usage.
@MichaelJanz How did you get past the mecab issue? And does it run correctly without the AMP flag?
Running Ubuntu is one way to go. However, my personal recommendation is waiting a bit until WSL has full GPU support (which is currently in beta, you can try it out!). That way, you can still enjoy your good ol' Windows experience, and only open up an Ubuntu terminal when running your experiments.
@BramVanroy It runs well so far under Windows, but I don't know what the AMP flag is for. It's just that GPU support is missing during training; the GPU is used during testing (at least I see some GPU usage and high clock speeds there).
About mecab, I did not have any issues with it at all. In theory, training is working and possible; it just takes way too long.
Thanks for the hint about WSL GPU support, I am working on getting that to run now.
It is very odd that you did not have any issues with mecab, because the pinned versions are not supported on Windows...
I meant the --fp16 flag. If used, it will try to use AMP. But since you seem to have issues with AMP, you can try to remove --fp16 from the command in finetune.sh.
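For reference, the python command inside finetune.sh looks roughly like this (I may be off on the exact flags for your version); the line to delete is --fp16:
python finetune.py \
--learning_rate=3e-5 \
--fp16 \
--gpus 1 \
--do_train \
--do_predict \
--n_val 1000 \
--val_check_interval 0.1 \
--sortish_sampler \
"$@"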
I removed the --fp16 flag and the missing amp_C message is gone. But the GPU is still not used. Could it be that it is too small for this model, so it just uses the CPU? I don't know how PyTorch handles memory issues.
That's my screen so far. As you can see, the GPU is not utilized.

Are you sure? Can you let the code run for a couple of steps and then monitor the GPU? It is possible that the code first does some data preprocessing, which is CPU-intensive, without the GPU. Only when training really starts (and you see the steps moving) should the GPU be used.
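You could also quickly check whether your PyTorch install sees the GPU at all; a one-liner sketch (if it prints False, the problem is CUDA support rather than this script):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"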
@MichaelJanz export PYTHONPATH="$PATH:(absolute path to the examples folder)"
do you mean something like this? export PYTHONPATH="$PATH:/content/transformers/examples"
I'm using Google Colab and the path to examples is /content/transformers/examples.
Sorry, I'm a complete noob when it comes to Python.
Edit:
I think I fixed it by giving the full path to finetune.py in finetune.sh:
python /content/transformers/examples/seq2seq/finetune.py \
--learning_rate=3e-5 \
--fp16 \
--gpus 1 \
--do_train \
--do_predict \
--n_val 1000 \
--val_check_interval 0.1 \
--sortish_sampler \
$@
However, I got a new error:
File "/content/transformers/examples/seq2seq/finetune.sh", line 7
    --gpus 1 \
             ^
Is it normal?
@Hildweig make sure your lightning example is up to date with examples/requirements.txt. Closing this. Pls make a new issue if you are still struggling to get things working :)