Spacy: Keras Entailment Example

Created on 23 Jan 2017 · 18 comments · Source: explosion/spaCy

I had a few doubts/questions about the Keras Entailment example

  1. Why do we have to pass model_dir? If we don't pass it, shouldn't there be a default path? Also, it doesn't appear to be used in the train method at all. The other methods only load from the directory passed in, but I don't think the train method (or any other method) stores anything there. Am I missing something?

  2. The readme asks for a directory for training/evaluation, but what the script actually needs is the path to the json file. Unless I've misunderstood something, it would be better to clarify this in the readme (explicitly mentioning that it needs a path to the json file).

  3. If you run either demo or evaluate, which need fewer parameters, it still throws the error: too few arguments message.

  4. This could be because I've again missed something somewhere, but when I run either evaluate or demo, I get this: NameError: global name 'SimilarityModel' is not defined, from the spacy_hook file. What exactly is supposed to be happening here? The training works fine, and so does the pytest.

[btw- the pytest sometimes fails if you're using a virtualenv, where you'd have to re-install pytest]

I wouldn't mind opening a PR to make the documentation changes (if you think they would be helpful/necessary).

I'm in the process of making a Jupyter notebook which walks through all of this, because I think the information in the readme isn't detailed enough about how exactly to get this running. I also wanted to demonstrate it (with some results).

Your Environment

  • Operating System: OS X El Capitan
  • Python Version Used: 2.7.12
  • spaCy Version Used: 1.6.0
  • Environment Information:
examples

All 18 comments

Hey,

Thanks for your attention on this. I agree that the example is a bit messy at the moment, and would really appreciate the PR for the docs changes, and any general tidying you want/need to do when making the notebook.

I think a notebook for this will be really great, because I've done a bit of hacking on different model options, and it's hard to explain them in the current format. A notebook is really a better solution.

Matt

Yup, I'm on this now.

Still need some help with questions 1 & 4 though - what's the purpose of passing the model directory? I don't see the model_dir parameter being used in the train method at all. Where is the keras model being tied to the pipeline?

Also when I run evaluate, it stops at the create_similarity_pipeline method because SimilarityModel isn't defined. Or am I supposed to make the model, the way it's described in the docs? I was unsure because I thought the example was ready to run, with the way the code is right now.

Hey,

I've actually just been updating this --- let me push some state I have in my working directory.

Updated. Btw, be sure to use Theano with this — for some reason I can't get it to work on Tensorflow...

Also, just a note: it sure looks to me like the normalization done here is incorrect, because it's computed without reference to the mask. You'll get probability mass in the attention going to elements that are actually masked, so you won't have a proper distribution.
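To illustrate that point: if the softmax is computed without reference to the mask, padded positions receive nonzero attention weight, so the weights over the real tokens no longer sum to one. A minimal NumPy sketch of a mask-aware softmax (the names and shapes here are illustrative, not taken from the example code):

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over the last axis that assigns zero probability
    to positions where mask == 0 (i.e. padding)."""
    # Subtract the max for numerical stability, then zero out masked terms
    # before normalizing, so padding gets no probability mass.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True)) * mask
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([1.0, 2.0, 3.0, 0.5])
mask = np.array([1.0, 1.0, 0.0, 0.0])  # last two positions are padding
probs = masked_softmax(scores, mask)
```

Without the `* mask` term, the two padded positions would absorb most of the probability mass here, which is exactly the bug being described.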

@honnibal , have you got either demo or evaluate to work on your machines? While the training is fine when I try to use it (theano backend), I end up getting the following error:

  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/spacy/language.py", line 301, in __init__
    self.pipeline = overrides['create_pipeline'](self)
  File "keras_parikh_entailment/spacy_hook.py", line 88, in create_similarity_pipeline
    KerasSimilarityShim.load(nlp.path / 'similarity', nlp, max_length=10)
  File "keras_parikh_entailment/spacy_hook.py", line 19, in load
    model = model_from_json(file_.read())
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/models.py", line 213, in model_from_json
    return layer_from_config(config, custom_objects=custom_objects)
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/utils/layer_utils.py", line 41, in layer_from_config
    custom_objects=custom_objects)
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/engine/topology.py", line 2582, in from_config
    process_layer(layer_data)
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/engine/topology.py", line 2560, in process_layer
    custom_objects=custom_objects)
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/utils/layer_utils.py", line 41, in layer_from_config
    custom_objects=custom_objects)
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/layers/core.py", line 681, in from_config
    function = func_load(config['function'], globs=globs)
  File "/Users/bhargavvader/Open_Source/spacy-notebooks/venv/lib/python2.7/site-packages/Keras-1.2.1-py2.7.egg/keras/utils/generic_utils.py", line 100, in func_load
    closure=closure)
TypeError: arg 5 (closure) must be None or tuple

This is similar to this error raised with keras.

@matt-gardner : I've struggled with this type of issue a lot actually. Do you have a good answer? So far I haven't found a better way to implement this in Keras.

I think the sequence handling in Keras is really broken, in general. The masking is very buggy and inconsistent, and even when you convince Keras to pass the mask layer forward for you, it's still very difficult to make the model correct.

@bhargavvader Can confirm that this is broken for me now too :(. I'm not sure whether Keras changed, or whether it's due to code changes I introduced.

I encountered this same problem and fixed it by selectively restoring part of an older version of keras. This commit:

https://github.com/fchollet/keras/commit/6417d90d5c1f70844d8d346312f1b40f449545a5#diff-56dc3cc42e1732fdb3a3c2c3c8efa32a

and this one:

https://github.com/fchollet/keras/commit/570fdf31c5cb9a580496d1d93320bc7ab1b9ad46#diff-56dc3cc42e1732fdb3a3c2c3c8efa32a

introduce some code into keras/utils/generic_utils.py to allow for saving and re-loading of closures, but then this one:

https://github.com/fchollet/keras/commit/edae1785327dd7a418ac06c2fe85a8c1f6ea05b7#diff-56dc3cc42e1732fdb3a3c2c3c8efa32a

removes the function that restores the closures. The comments on that commit claim this was "broken" code, but it worked fine for me.

I just made sure that the functions related to closures were as in https://github.com/fchollet/keras/commit/570fdf31c5cb9a580496d1d93320bc7ab1b9ad46#diff-56dc3cc42e1732fdb3a3c2c3c8efa32a and this part worked.

Since this is obviously a bit unstable in keras right now, it would be better to see if this can be redone without closures.

(Also note that by default the example trains a model with max_length = 100 but uses max_length = 10 when running demo or evaluate. You'll have to change one of these to make it work).

@jfoster17 Thanks!

Okay, I think it's best to avoid the json serialisation. This makes sense to me, really, especially since we have our own attributes that we're trying to pass around (sorry that the max_length=10 hack made it into master! I was hacking at this...)

So, we should write out a config.json that gives us the necessary hyper-params to make another call to build_model when loading the data. I think this is the way the sentiment analysis example in deep_learning_keras.py does this.
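A minimal sketch of that approach, assuming a `build_model(settings)` function like the one in the example (the helper names and the `settings` keys here are illustrative, not the example's actual API):

```python
import json
from pathlib import Path

def save_model(model, settings, path):
    """Persist hyper-parameters and weights separately, instead of
    relying on Keras's JSON architecture serialisation (which is what
    trips over the closures)."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    with (path / 'config.json').open('w') as file_:
        json.dump(settings, file_)
    model.save_weights(str(path / 'model.weights'))

def load_model(path, build_model):
    """Rebuild the architecture in code from config.json, then restore
    the trained weights into it."""
    path = Path(path)
    with (path / 'config.json').open() as file_:
        settings = json.load(file_)
    model = build_model(settings)
    model.load_weights(str(path / 'model.weights'))
    return model
```

Because the architecture is reconstructed by calling `build_model` rather than deserialized from JSON, the lambda/closure layers never need to round-trip through `model_from_json` at all.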

@honnibal we're building a library that tries to make NLP easier with Keras; you can see what we did for attention layers here. It took a lot of work and a lot of tests to make sure masking is done properly throughout, but we're reasonably confident that it does the right thing now. The library is close to ready for public consumption, but not quite there yet, so you'll probably notice some inconsistencies and issues still if you poke around the code.

@matt-gardner Interesting!

I think it's valuable to work within Keras this way, especially now that it's been appointed the official Tensorflow front-end. But I have to say, I think the whole masking idea is just bad, tbh.

I think it's much better to maintain an array of the sequence lengths. For this type of model, you're then able to concatenate all the inputs into a single matrix without any padding. For LSTM models, you can sort the batch by length, and then drop the short rows as they're completed.

I don't have the LSTMs implemented yet, but you can see the pooling at work in this example: https://github.com/explosion/thinc/blob/master/examples/quora_similarity.py

The code is quite different from Keras though, because it's not based on Tensorflow etc...It's just based on numpy/cupy. The flatten_with_lengths operation is here: https://github.com/explosion/thinc/blob/master/thinc/api.py#L44
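The idea can be sketched in plain NumPy: concatenate every sequence into one matrix with no padding, keep the lengths in a separate array, and pool each slice. This is an illustration of the approach, not thinc's actual implementation:

```python
import numpy as np

def mean_pool_flat(flat, lengths):
    """Mean-pool each sequence out of a single concatenated matrix.

    flat:    (sum(lengths), width) array -- all tokens, no padding
    lengths: list of ints -- how many rows belong to each sequence
    """
    out = np.zeros((len(lengths), flat.shape[1]))
    start = 0
    for i, n in enumerate(lengths):
        # Pool only the rows that belong to sequence i -- no mask needed,
        # because padding was never introduced in the first place.
        out[i] = flat[start:start + n].mean(axis=0)
        start += n
    return out

# Two sequences of lengths 3 and 2, packed into one (5, 4) matrix.
flat = np.random.rand(5, 4)
pooled = mean_pool_flat(flat, [3, 2])
```

Since the lengths fully determine which rows belong to which sequence, no probability mass or pooled value can ever leak from padding, which is the correctness problem masking keeps reintroducing.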

@honnibal I got the example running on TensorFlow and applied @jfoster17's fix to resolve the error with closures. For TensorFlow I made the following changes to keras_decomposable_attention.py and __main__.py:

  1. Imported the Keras backend:

    import keras.backend as K

  2. Added a flag to import TensorFlow:

    USE_TF = True
    if USE_TF:
        import tensorflow as tf

  3. To resolve precondition errors due to uninitialized variables, I created a TensorFlow session, initialized the model variables using the session, and assigned the Keras session to it. In particular, the following block was added to test_fit_model of keras_decomposable_attention.py and train of __main__.py, directly before the call to model.fit:

    if USE_TF:
        sess = tf.Session()
        init = tf.global_variables_initializer()
        sess.run(init)
        K.set_session(sess)

  4. Lastly, I added the following as the last block of code in the functions above:

    if USE_TF:
        sess.close()

Hope this helps! Oh I got this running using the following environment:

  • Operating System: Windows Server 2012 (16 cores no GPU)
  • Python: Anaconda python 3.5

On another note, @honnibal and @matt-gardner, what effect does an improper probability distribution have on model performance?

@enigmoization Thanks!! Are you able to make a pull request with the fixes?

The messed up attention weights might have a big impact if the length cap is relaxed, which does seem to improve accuracy.

@honnibal you're welcome and thanks for the explanation about the attention weights. Regarding a pull request, I haven't tried but can look into making one this weekend.

I made all the changes which enigmoization suggested and ran the following command:

$python keras_parikh_entailment/ demo snli_1.0/snli_1.0_train.jsonl snli_1.0/snli_1.0_dev.jsonl

But it gives the following error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_parikh_entailment/__main__.py", line 155, in <module>
    plac.call(main)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_parikh_entailment/__main__.py", line 152, in main
    demo()
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_parikh_entailment/__main__.py", line 89, in demo
    create_pipeline=create_similarity_pipeline)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/spacy/__init__.py", line 42, in load
    return cls(**overrides)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/spacy/en/__init__.py", line 34, in __init__
    Language.__init__(self, **overrides)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/spacy/language.py", line 297, in __init__
    self.pipeline = overrides['create_pipeline'](self)
  File "keras_parikh_entailment/spacy_hook.py", line 88, in create_similarity_pipeline
    KerasSimilarityShim.load(nlp.path / 'similarity', nlp, max_length)
  File "keras_parikh_entailment/spacy_hook.py", line 19, in load
    model = model_from_json(file_.read())
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/models.py", line 345, in model_from_json
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/engine/topology.py", line 2487, in from_config
    process_layer(layer_data)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/engine/topology.py", line 2473, in process_layer
    custom_objects=custom_objects)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/layers/core.py", line 697, in from_config
    function = func_load(config['function'], globs=globs)
  File "/Users/saurabh/Desktop/ai/tools/keras/keras_vir/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 206, in func_load
    closure=closure)
TypeError: arg 5 (closure) must be None or tuple

I'm not able to run the demo function. Can someone please help?

I got it working by applying this. Why is this fix not included in the latest Keras code?

Merging this with #1445!

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
