I followed the seq2seq README and wanted to try the sshleifer/distilbart-cnn-12-6 model for abstractive text summarization.
I got the error above; it seems like lightning_base was part of this project before it was moved/removed.
Model I am using: sshleifer/distilbart-cnn-12-6
Language I am using the model on: English
The problem arises when using: the official example scripts (examples/seq2seq/finetune.sh)
The task I am working on is: abstractive summarization
Steps to reproduce the behavior:
./finetune.sh \
--data_dir $CNN_DIR \
--train_batch_size=1 \
--eval_batch_size=1 \
--output_dir=xsum_results \
--num_train_epochs 1 \
--model_name_or_path facebook/bart-large
I would expect the model to start inference.
transformers version: 2.11.0
@sshleifer, you asked to be tagged on issues in the README.
Also, I am a bit worried that my RTX 2070 with 8 GB will be too small for training, since 13 GB were recommended for a batch size of 1 with fp16. I appreciate any hints on what I could do to make it run. Thank you.
Also I followed the basic installation process from here:
https://github.com/huggingface/transformers/blob/master/examples/README.md#important-note
but I still get the same error
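For reference, the steps from that note boil down to installing transformers from source plus the example requirements, roughly the following (from memory, so double-check against the linked README):
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt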
lightning_base is still there. Do you have a traceback?
Yep, that's what I get so far:
Traceback (most recent call last):
  File "finetune.py", line 15, in <module>
    from lightning_base import BaseTransformer, add_generic_args, generic_train
ModuleNotFoundError: No module named 'lightning_base'
try export PYTHONPATH="../":"${PYTHONPATH}"
More info in examples/seq2seq/README.md
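A minimal sketch of the whole sequence, assuming you run it from inside examples/seq2seq and that $CNN_DIR points at your data (flags copied from your command above, with the distilbart checkpoint you mentioned):
cd transformers/examples/seq2seq
export PYTHONPATH="../":"${PYTHONPATH}"   # makes lightning_base importable
./finetune.sh \
--data_dir $CNN_DIR \
--train_batch_size=1 \
--eval_batch_size=1 \
--output_dir=xsum_results \
--num_train_epochs 1 \
--model_name_or_path sshleifer/distilbart-cnn-12-6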
I had the same issue as you described, @MichaelJanz, also on Windows 10 with Python 3.7.7.
To clarify my setup, I followed the instructions under "Important Note" on the transformers/examples page and got an error that faiss could not be installed (faiss only supports Linux and macOS currently, I think). I removed faiss from the list of requirements in examples/requirements.txt and ran the example finetune.sh command. Like Michael, I got an error that lightning_base was not found. Since "export" doesn't work on the Windows command line, I inserted two lines above the lightning_base import in finetune.py:
import sys
sys.path.insert(0, r'C:\Users\chris\transformers\examples')
This solved the issue that lightning_base wasn't found, but I encountered a new error:
File "finetune.py", line 17, in <module>
from lightning_base import BaseTransformer, add_generic_args, generic_train
...
File "C:\Users\chris\transformers\env\lib\site-packages\tokenizers\__init__.py", line 17, in <module>
from .tokenizers import Tokenizer, Encoding, AddedToken
ModuleNotFoundError: No module named 'tokenizers.tokenizers'
Looking at the tokenizers package installed, I didn't see an additional folder labeled "tokenizers". The tokenizers version I have within my virtual environment is tokenizers==0.8.0rc4. @sshleifer , could you let me know what version of tokenizers you have in your environment? Let me know if you have any other suggestions about what might be happening (I worry that the problem lies with using Windows).
Edit: for context, I tried running the finetuning script within a Linux environment and had no problems, with the same tokenizers==0.8.0rc4 version. I'm guessing that this whole issue is a Windows problem.
Yeah, I have the same version of tokenizers; this seems like a Windows problem.
To fix that for me, I decided to execute _finetune.sh_ directly. I had to insert
export PYTHONPATH="$PATH:(absolute path to the examples folder)"
and pass --data_dir (absolute path to the example folder).
That's an ugly workaround, but it works.
Then I ran into an encoding error, which required changing line 39 of utils.py to
lns = lmap(str.strip, data_path.open(encoding="UTF-8").readlines())
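(As an aside, I suspect enabling Python's UTF-8 mode before launching would avoid patching utils.py, since on Python 3.7+ it makes open() default to UTF-8, but I have not verified this with the script:)
export PYTHONUTF8=1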
Then the finetuning process starts. It just runs horribly slowly, with no GPU usage and this warning:
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
So the GPU is not used. I already opened an issue on NVIDIA/apex about building apex for Windows with CUDA extensions, but any hint here is appreciated.
I am thinking about switching to Ubuntu, since it seems like a lot of these errors have their origin in Windows. Is that recommended?
Before you start running these commands: it will probably end up _not_ working due to mecab issues (see the bottom of this comment).
Instead of wget, you can use Invoke-WebRequest in PowerShell (after having cd'd into examples/seq2seq): Invoke-WebRequest https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz -OutFile xsum.tar.gz
$env:Path += ";" + (Join-Path -Path (Get-Item .).FullName -ChildPath "xsum")
Rather than using the bash file (finetune.sh), I suggest that you open it and copy-paste the Python command that is in there, including the options that are already present, and add your own options after it (things like data dir, model name).
Before running the command, add ../ to PYTHONPATH:
$env:PythonPath += ";../"
After all that, you will probably still run into a problem involving mecab. It is used for Japanese tokenisation and is not easy to disable (it is also a dependency of sacrebleu). Mecab has a new v1.0 release that works on Windows, but upgrading to it introduces breaking changes in transformers as well as sacrebleu. This is unfortunate, because such a small thing means that many functionalities and examples cannot be used on Windows, _even if you do not use Japanese_, simply because of how the imports are structured. I'd rather have these libraries imported only when they are actually needed, to maximise cross-platform usage.
@MichaelJanz How did you get past the mecab issue? And does it run correctly without the AMP flag?
Running Ubuntu is one way to go. However, my personal recommendation is waiting a bit until WSL has full GPU support (which is currently in beta, you can try it out!). That way, you can still enjoy your good ol' Windows experience, and only open up an Ubuntu terminal when running your experiments.
@BramVanroy It runs well so far under Windows, but I don't know what the AMP flag is for. It's just that GPU support is missing during training; the GPU is used during testing (at least I see some GPU usage and high clock speeds there).
About mecab, I did not have any issues with it at all. In theory, training is working and possible; it just takes way too long.
Thanks for the hint about WSL GPU support, I am working on getting that to run now.
It is very odd that you did not have any issues with mecab, because the pinned versions are not supported on Windows...
I meant the --fp16 flag. If used, it will try to use AMP. But since you seem to have issues with AMP, you can try to remove --fp16 from the command in finetune.sh.
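For reference, the python command inside finetune.sh looks roughly like this (I may be off on the exact flags for your version); the line to delete is --fp16:
python finetune.py \
--learning_rate=3e-5 \
--fp16 \
--gpus 1 \
--do_train \
--do_predict \
--n_val 1000 \
--val_check_interval 0.1 \
--sortish_sampler \
"$@"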
I removed the --fp16 flag and the missing amp_C message is gone. But the GPU is still not used. Could it be that it is too small for this model, so it just uses the CPU? I don't know how PyTorch handles memory issues.
That's my screen so far. As you can see, the GPU is not utilized.

Are you sure? Can you let the code run for a couple of steps and then monitor the GPU? It is possible that the code first does some data preprocessing, which is CPU-intensive, without the GPU. Only when training really starts (and you see the steps moving) should the GPU be used.
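You could also quickly check whether your PyTorch install sees the GPU at all; a one-liner sketch (if it prints False, the problem is CUDA support rather than this script):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"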
@MichaelJanz export PYTHONPATH="$PATH:(absolute path to the examples folder)"
do you mean something like this? export PYTHONPATH="$PATH:/content/transformers/examples"
I'm using Google Colab and the path to examples is /content/transformers/examples.
Sorry, I'm a complete noob when it comes to Python.
Edit:
I think I fixed it by giving the full path to finetune.py in finetune.sh:
python /content/transformers/examples/seq2seq/finetune.py \
--learning_rate=3e-5 \
--fp16 \
--gpus 1 \
--do_train \
--do_predict \
--n_val 1000 \
--val_check_interval 0.1 \
--sortish_sampler \
$@
However, I got a new error:
File "/content/transformers/examples/seq2seq/finetune.sh", line 7
    --gpus 1 \
             ^
Is it normal?
@Hildweig make sure your lightning example is up to date with examples/requirements.txt. Closing this. Pls make a new issue if you are still struggling to get things working :)