I am able to compile the stable (0.1.0) version of the code on a powerpc64le (IBM Minsky) without any errors/warnings. However, when I run it on any dataset (e.g. stackexchange cooking) using just the defaults, ./fasttext supervised -input ... -output ..., the program just hangs after displaying Reading ... words. I tried make debug as well, with the same result. Details: make 4.1, Ubuntu 16.04.3 LTS. Any ideas?
@ironv Which compiler are you using?
I have tried both with c++ (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 and g++ (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609. Same result.
I imagine to debug this, much more information would be helpful:
fasttext on this ppc64le arch. We have it running _as advertised_ on x86_64 boxes.
v0.1.0.zip has make debug.
@ThinkOpenly to your second point (unfamiliarity with fasttext)... this takes a couple of mins:
wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
unzip v0.1.0.zip
cd fastText-0.1.0
make
wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz && tar xvzf cooking.stackexchange.tar.gz
head -n 12404 cooking.stackexchange.txt > cooking.train
tail -n 3000 cooking.stackexchange.txt > cooking.valid
./fasttext supervised -input cooking.train -output model_cooking
Output
Read 0M words
Number of words: 14598
Number of labels: 734
Progress: 100.0% words/sec/thread: 75109 lr: 0.000000 loss: 5.708354 eta: 0h0m
I get NO output at all. It just hangs.
The README indicates there is a "verbose" option, possibly with an (optional?) verbosity level.
The following arguments are optional:
-verbose verbosity level [2]
...could be worth a try at least. :-)
The default is 2 (as shown above), which produces the most output. Setting -verbose 0 suppresses all output to the screen. So, when I don't specify it (like in the example above), I should get all output to the screen (the model and word vectors are always saved to a file).
Oh, I thought that meant the default was 2 _if_ no level was specified, as in "-verbose". So, it hangs quite early it seems, before any output at all. GDB tracebacks will likely be helpful, then.
gdb has not been installed yet (I don't have sudo on the box). Using good ole' std::cout, I have narrowed it to the while loop between lines 223 and 232 in dictionary.cc.
while (readWord(in, word)) {
  add(word);
  if (ntokens_ % 1000000 == 0 && args_->verbose > 1) {
    std::cerr << "\rRead " << ntokens_ / 1000000 << "M words" << std::flush;
  }
  if (size_ > 0.75 * MAX_VOCAB_SIZE) {
    minThreshold++;
    threshold(minThreshold, minThreshold);
  }
}
It is not coming out of this loop.
That's good information. Is it stuck in "readWord"? One difference between x86 and POWER that comes to mind is that plain "char" is signed by default on x86 and unsigned on POWER. I see the "char" type used in readWord. If EOF is -1, then you might need a "signed char" there instead. (I'm not sure what "sbumpc()" returns.) Anyway, hopefully that area is a good place to look deeper.
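To make the signedness point concrete, here is a minimal sketch of what happens when an EOF value is narrowed into each character type and then compared against EOF again. The function names are hypothetical and are not part of fastText. `unsigned char` stands in for plain `char` on POWER and `signed char` for plain `char` on x86. The sketch assumes EOF is -1, as it is on common platforms (the standard only requires a negative int).

```cpp
#include <cstdio>  // EOF

// Hypothetical helper: stores a stream value in an unsigned char, as plain
// `char` does on POWER, then compares it with EOF. EOF (-1) narrows to 255,
// which promotes back to the int 255, so the comparison can never succeed.
bool eofSeenThroughUnsignedChar(int streamValue) {
  unsigned char c = static_cast<unsigned char>(streamValue);
  int promoted = c;  // the promotion the comparison would perform implicitly
  return promoted == EOF;
}

// Hypothetical helper: same round trip through signed char, as plain `char`
// behaves on x86. EOF survives, but so does the data byte 0xFF, which also
// narrows to -1 on common compilers and is then mistaken for end-of-file.
bool eofSeenThroughSignedChar(int streamValue) {
  signed char c = static_cast<signed char>(streamValue);
  int promoted = c;
  return promoted == EOF;
}

// Hypothetical helper: keeps the value in an int, so EOF and every byte
// value 0..255 stay distinct.
bool eofSeenThroughInt(int streamValue) {
  return streamValue == EOF;
}
```

With these helpers, `eofSeenThroughUnsignedChar(EOF)` is false, which would match a read loop that never sees end-of-file, while `eofSeenThroughSignedChar(EOF)` and `eofSeenThroughInt(EOF)` are both true.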
That was it!!! Changed line 195 in dictionary.cc from char c to signed char c. Have tested it on a few different files now and it works. Thank you @ThinkOpenly
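For reference, a more portable variant of this kind of fix keeps the value returned by `sbumpc()` in the stream's integer type rather than in any `char` type, so the end-of-file marker cannot collide with a data byte. The sketch below is a simplified, hypothetical token reader (`readToken` and `readAllTokens` are illustrative names), not fastText's actual `readWord`, and it only splits on a few whitespace characters.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified reader: extracts one whitespace-delimited token.
// Returns false once the stream is exhausted and no token was collected.
bool readToken(std::istream& in, std::string& token) {
  using traits = std::char_traits<char>;
  std::streambuf& sb = *in.rdbuf();
  token.clear();
  traits::int_type c;  // int: wide enough for every byte value plus eof()
  while (!traits::eq_int_type(c = sb.sbumpc(), traits::eof())) {
    const char ch = traits::to_char_type(c);
    if (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r') {
      if (!token.empty()) {
        return true;  // delimiter ends the current token
      }
      continue;  // skip leading or repeated whitespace
    }
    token.push_back(ch);
  }
  return !token.empty();  // last token may end at end-of-file
}

// Hypothetical convenience wrapper: tokenizes an in-memory string.
std::vector<std::string> readAllTokens(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> tokens;
  std::string token;
  while (readToken(in, token)) {
    tokens.push_back(token);
  }
  return tokens;
}
```

Because the comparison happens on the `int_type` value, the loop ends at end-of-file whether plain `char` is signed or unsigned, and an input byte of 0xFF is treated as data rather than as end-of-file.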
@ironv I'd leave this open so the facebook devs can change that one line, which is a platform agnostic fix, and hopefully no other POWER (or ARM) users will hit this error.
re-opening as suggested by @grooverdan
Hi @ironv, @ThinkOpenly, @grooverdan,
Thank you for reporting and solving this issue. This should be fixed now.
Best,
Edouard.
thanks @EdouardGrave.