Keras: Spring 2017 roadmap: Keras 2, PR freeze, TF integration

Created on 7 Feb 2017 · 46 comments · Source: keras-team/keras

Hi all,

Some news.

PR freeze

We are preparing the release of Keras 2, as well as the integration of the Keras API directly into the TensorFlow repository. Consequently, we are declaring a PR freeze on Keras, to be lifted after the release of Keras 2. This means that no further PR to Keras 1 will be merged (or even reviewed). However, PRs to the Keras 2 branch (when it becomes available) are welcome.

Keras 2

We plan on making available a Keras 2 branch in the next few days, with a final release in the next few weeks.

Keras 2 will consist of some refactoring, a lot of API changes, and few functionality changes. There are many places where the Keras 1 API was not optimal, differed from industry standards such as those set by TensorFlow or NumPy, or could otherwise be improved. We are bundling the API changes into a single release, so that users will only have to update their code once and for all.

  • API changes between Keras 1 and Keras 2 will be made backwards compatible as much as possible, i.e. your Keras 1 code should still run with Keras 2. The Keras 1 API will be deprecated, and Keras 1 code running with Keras 2 will output deprecation warnings that will instruct users on how to update their code, line by line. Note that backwards compatibility will not be total, and advanced users (e.g. people who write their own layers) may see their code break.
  • We will release complete notes covering all changes made and how to update a Keras 1 codebase to Keras 2.
  • API changes after Keras 2 will be rare and limited in impact (the goal is to have almost none). Keras 2 is a "long-term support" API, the first for Keras. Codebases written in Keras 2 next month should still run many years from now, on up-to-date software.
  • In the medium term, we will write down the Keras API as the "Keras spec", and we will set up a "Keras committee" to oversee changes to the Keras spec. Indeed, Keras is no longer a library, but rather a spec with different available implementations. Changes to this spec need to be centralized (before being replicated across all implementations) and entrusted to an authority that will carefully review all proposed changes. This also ensures that there will be few changes and that all changes will have a strong rationale.
  • New, bleeding-edge functionality should preferably go to Keras contrib.

TF integration

The Keras 2 API will become part of the TensorFlow repository, to serve as a high-level API for TensorFlow. Concretely:

  • We are bringing a TF-only, independent implementation of the Keras spec into TF, first in tf.contrib, later in tf.keras.
  • This implementation will increasingly be based on core TF primitives (e.g. TF core layers and Keras layers will be the same objects), making code built using tf.keras deeply compatible with other TF functionality. You will be able to mix and match core TF and tf.keras functionality seamlessly (in effect, tf.keras is just a TF API, not a separate library). Likewise, you should be able to use Keras models with e.g. TF Experiments, allowing you to easily train a Keras model in a distributed setting or on CloudML, or do distributed hyperparameter search. By using tf.keras, you will benefit from the full power of TensorFlow. A rough sketch of this interop follows the list below.
  • This integration does not affect the repository fchollet/keras. It continues to be the "home" of Keras, and Theano support will continue indefinitely. We are not replacing what is already there, rather, we are simply adopting the Keras spec as a built-in high-level API for TF.
  • Additionally, Microsoft is building a CNTK backend for Keras. In general, you should expect support for more backends in the future, not less. The goal is to have the Keras spec serve as a cross-platform front-end layer for deep learning, allowing compatibility of codebases and saved models across different backend engines. The more implementations the merrier.
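
A rough sketch of the interop described in the second bullet above (this assumes TF >= 1.4, where tf.keras ships with TensorFlow; earlier previews lived under tf.contrib.keras):

import tensorflow as tf

# Plain TF tensors...
inputs = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))

# ...fed through Keras layers, which return plain TF tensors in turn.
hidden = tf.keras.layers.Dense(64, activation='relu')(inputs)
logits = tf.keras.layers.Dense(10)(hidden)

# The outputs mix seamlessly with core TF ops and optimizers.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)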

All 46 comments

Exciting! Do you need any help with the porting?

New, bleeding-edge functionality should preferably go to Keras contrib.

It would be nice to have rough criteria and perhaps a few examples of what should go to contrib and what should go to Keras proper.

Will this PR freeze affect docstring improvements?

Also, with the release of Keras 2, would it be a good idea to greatly reduce the number of tickets and implement a system/process that prevents or redirects general debugging questions to the Gitter channel, Slack, or Stack Overflow? From what I've seen, most of the issues on this repo are implementation clarifications, general deep learning questions, and debugging help.

As for the Keras spec, when it is released will there be a list of TODOs where the community can contribute? I'm very excited!

Any chance that masking can get first-class support in the Keras 2.0 spec? Building complex NLP models with Keras is difficult and bug-prone, because masking is not supported very well. We've had to write a lot of custom layers and override default Keras layers in order to get them to handle masking correctly.

@matt-gardner That would be amazing. It's such a pain to build e.g. hierarchical models with masking right now.

@matt-gardner Did you make any pull requests to fix layers you've had issues with (and subsequently fixed)?

Yes, we've submitted some:

https://github.com/fchollet/keras/pull/3218
https://github.com/fchollet/keras/pull/4253
https://github.com/fchollet/keras/pull/4258

But getting no response after trying to submit improvements is pretty demoralizing for submitting future PRs, so we started just overriding Keras layers in our code (e.g., here, a really trivial fix to Highway that makes it work with masking, that wasn't included because masking is an afterthought in the current Keras API).

Good news!

If there are going to be two GitHub repositories, how will you sync pull requests between tf.keras and this repository? Will there be someone applying changes from one repository to the other?

If there are going to be two GitHub repositories, how will you sync pull requests between tf.keras and this repository? Will there be someone applying changes from one repository to the other?

The codebases will be different, so there will be no need to replicate pull requests. For API changes, you would send a PR to the API spec itself, and changes to the API spec would be replicated across all codebases.

@matt-gardner it looks like your issue is simply that some Keras layers do not support masking yet. Is that right?

@fchollet it's just a plea to please take masking very seriously when thinking about the Keras 2.0 spec. It's crucial for complex NLP, and some pretty basic building blocks of NLP models in Keras don't support masking correctly (e.g., the Embedding layer, and the TimeDistributed layer, as pointed out in PRs I've already linked to). Additionally, almost none of the backend operations deal with masks. This is fine in some cases, but if you want to compute a softmax with a mask, for instance, you have to write your own code. This makes doing attentions over padded word sequences hard, and probably most implementations of attention in Keras are wrong because of this - if you apply the mask after a softmax, as done in this re-implementation of a popular paper, it's wrong, because your distribution wasn't normalized correctly, and it's not obvious that it's wrong from looking at the code.

There's also very little documentation about masking. It's in the background and easy to forget about. But you can't forget about it when doing NLP, or you're doing it wrong. It really needs to be treated as a fundamental component of any static computation graph applied to NLP tasks. This difficulty is why people choose DyNet over Keras for NLP. There's a whole lot to like about Keras - it'd be nice if it were also really good for NLP.
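
To make the softmax point concrete, here is a minimal sketch of a mask-aware softmax using the Keras 1 backend API (the helper name masked_softmax is hypothetical, not an existing Keras function):

from keras import backend as K

def masked_softmax(scores, mask):
    # The mask must be applied *before* normalization: zeroing out
    # probabilities after K.softmax leaves a "distribution" that no longer
    # sums to 1 over the unmasked positions.
    mask = K.cast(mask, K.floatx())
    scores = scores + (1.0 - mask) * -1e9  # push masked logits towards -inf
    return K.softmax(scores)

Applying the mask after K.softmax, as in the linked re-implementation, skips the renormalization entirely, which is exactly the bug described above.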

https://github.com/farizrahman4u/keras-contrib is the Keras Contrib repository described above

Will you be creating a Keras organization on GitHub?

Make graph visualization in TensorBoard great again, please! This is a feature request I honestly don't know how to solve myself. I find that Keras makes the graph tab in TensorBoard hard to read.

Exciting! This is really big news. I hope I can make contributions to Keras 2!
BTW, here are some possible adjustments for Keras 2.

  • merge and Merge: having both confuses users.
  • metrics: as I said in another issue, metrics do not need to be part of the computation graph. Writing metrics with "tensors" is not an easy job.
  • validation_split and sample shuffling: the separation of training and validation data should happen after the dataset gets shuffled.
  • I also hope more details of the training process become accessible in callbacks. That would be very powerful.

Looking forward to the age of Keras 2!

I have personally felt that Keras leans more towards image stuff than NLP. I can't pinpoint why exactly I "feel" so; limited support for masking is definitely one of the factors.

@matt-gardner - I have most of the layers implemented with masking on a personal branch of keras and some custom layers like a LambdaMask layer and a couple variants of the TimeDistributed (which I call "Distribute" because I felt it fit the narrative better).

But I never had the time to write the proper tests to get things back into the main branch. And because previous projects were already done in Theano, I'm just now diving into TensorFlow (thanks, Fold), so I couldn't contribute anything towards that backend.

@fchollet some things I've been wanting to push in:

  • Passing callbacks should allow you to pass a custom progress bar; fit currently adds one silently in the training step. It would be best if you could construct the callbacks object outside of the training function.
  • Masking in more layers, and the LambdaMask I've linked to above
  • An ability to pass in a names dictionary to the load_weights_by_name, so you can specify the subgraph(s) being loaded
  • **kwargs on the loss functions and other places, so we can pass extra information into them
  • Masks being passed into metrics and objective functions

Also, it would be nice if there was a way that you could attach meta information about masks into the graph to have it check for consistency. For example, I know I want certain shaped masks coming out and into certain layers. Right now, I have to go to a jupyter notebook, pull out all the shape tensors through keras' graph, and then evaluate on some data against expected sizes.

If you are looking into rewriting or fleshing out NLP data manipulation tools (vocabulary management classes, building word embedding matrices from a dataset and a word2vec/GloVe tensor, a data server tuned to Keras's fit_generator function), I have a suite of things I'm starting to push out of projects and into more full-blown sharing.

Also, thanks for all of your (and everyone else's) awesome work =). I am really excited about this next iteration.

I would also like to draw attention to the fact that building custom RNN architectures is next to impossible in Keras without nasty hacks. Please have a look at this discussion for details. Because of this, people are forced to create repositories like RecurrentShop. It would be nice to have some official attention on making life easier for RNN researchers.

In addition to the comments of @ParthaEth, the same is true for reinforcement learning problems and loading images via TensorFlow tensors (#5356), and semantic segmentation seems to be second-class. One example is keras-rl.

I don't expect Keras to handle every possible design and problem, but I think it is important to point out areas of weakness before LTS API scoping decisions are settled so the appropriate choices can be made explicitly.

Thanks for the feedback; a lot of the points raised are in fact planned for Keras 2. Some are not, like support for timestep-wise control of RNNs. That can be added later.

With regard to masking, do you have a specific, point-by-point list of feature requests and use cases?

validation_split and sample shuffling: the separation of training and validation data should happen after the dataset gets shuffled.

No; in many cases this would lead to user errors, e.g. any case where the data is generated by a sequential process. I've seen many cases where this feature of Keras saved a user from validating on past data when they should have been using only future data.

Passing callbacks should allow you to pass a custom progress bar. It currently silently adds this in the training step. It would be best if you could construct the callbacks object outside of the training function.

Nothing prevents you from setting verbose to 0 and passing your own progbar as a callback.
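
For instance, a minimal sketch of such a callback (the class name SimpleProgbar is hypothetical; it mimics only the bare bones of the built-in bar):

from __future__ import print_function
from keras.callbacks import Callback

class SimpleProgbar(Callback):
    # A bare-bones stand-in for the built-in progress bar.
    def on_epoch_begin(self, epoch, logs=None):
        self.seen = 0
        print('Epoch %d' % (epoch + 1))

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.seen += logs.get('size', 0)
        print('\r%d samples - loss: %.4f'
              % (self.seen, logs.get('loss', 0.0)), end='')

    def on_epoch_end(self, epoch, logs=None):
        print()

# usage: model.fit(x, y, verbose=0, callbacks=[SimpleProgbar()])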

Thanks for the responses @fchollet! Super appreciative of the work you're putting in!

Nothing prevents you from setting verbose to 0 and passing your own progbar as a callback.

That's true. Isn't there a trade off in this design decision, though? I'm not aware of any documentation stating that setting verbose to 0 has that singular effect (and if it does, then the name verbose doesn't seem precise enough, because it only does 1 thing) and tracing through the code as a sanity check doesn't sound all that fun. It would seem you could add flexibility without loss of ease of use by allowing a CallbackList to be passed into the fit functions and adding a None check here.

(As a side note and not a complete suggestion, it's easy to imagine Metrics and Objectives in this way, too. Allow for an object to be passed in if desired, default behavior if not. Messy logic for the respective bits could be folded in and hidden a bit more behind an easier to read API)

Thanks again for your work!

I might be missing something, but in all of the docs for functions that accept verbose, the following description is present:

verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.

Also, both fit and fit_generator already take a parameter called callbacks:

callbacks: list of keras.callbacks.Callback instances. List of callbacks to apply during training. See callbacks.

verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.

Woops. Missed that part.

Also, both fit and fit_generator already take a parameter called callbacks:

It adds 2 callbacks without your control. Check my link.

That's true. What do you propose fit/fit generator return, then, if there's no guarantee of a History object?


Will Keras2 support PyTorch as backend, in the future?

Will Keras2 support PyTorch as backend, in the future?

No, there are no plans to support PyTorch. There is nothing to be gained in supporting every novelty framework that crops up every quarter. Our goal is to make deep learning accessible and useful to as many people as possible, and that goal is completely opposite to building up deep learning hipster cred.

re: @patyork

That's true. What do you propose fit/fit generator return, then, if there's no guarantee of a History object?

If it were to go this way, I'd imagine something like:

def fit_whichever_version(foo, bar, baz, callbacks=None, verbose=1, **kwargs):
    # (cbks.CallbackManager below is an imagined API, per the text above)
    # stuff

    if callbacks is None:
        # standard history + whichever verbose flag
        callbacks = cbks.CallbackManager.default(verbose=verbose)
    elif not isinstance(callbacks, cbks.CallbackManager):
        # handle list of strings, list of Callback instances, etc.
        callbacks = cbks.CallbackManager.from_kwargs(callbacks, verbose=verbose)
    else:
        assert isinstance(callbacks, cbks.CallbackManager)  # verbose is redundant in this case

    # more stuff

    return callbacks

So, if history were in callbacks, it'd be returned. Otherwise, the thing returned is exactly what the user expects: the thing they passed in, with observations of training info.

But in the end, it doesn't matter all that much. Anyone who cares (I like it) will just hack it in there, and people who don't will never notice.

I don't understand the impetus for this at all, and I don't like the fact that it instantly changes the behaviour for anyone using custom callbacks (e.g. model.fit(....., verbose=1, callbacks=[ModelCheckpointer()]) now returns nothing, and has no progress bar). If the CPU cycles it takes for the BaseLogger and curating the History object are too much overhead for you, you probably shouldn't be using Keras with all of its other overhead anyways.

I don't like the fact that it instantly changes the behaviour for anyone using custom callbacks (e.g. model.fit(....., verbose=1, callbacks=[ModelCheckpointer()])

That was never really implied. There's nothing against having the CallbackManager, by default, enact a history and logger. In fact, I explicitly passed in the verbose in my example.

If the CPU cycles it takes for the BaseLogger and curating the History object is too much overhead for you, you probably shouldn't be using Keras with all of its other overhead anyways.

No need to get snide. It has nothing to do with CPU cycles.

Nothing snide about it - just confused as to the impetus of removing the BaseLogger and History callbacks as defaults/always available. The only thing I could think of was a possible performance hit.

Sorry --- it came off a bit harsh. Probably a tone of voice thing.

I've written some progress bars that use tqdm, for example. I've never overridden the History callback. My example was a possible solution to "find a way to let the user be able to completely specify their callback env, but also make it transparently the same for anyone who doesn't care".

Yeah, fair enough - several people have written better progress bars, esp for use with Jupyter notebooks.

My point here is that there isn't really a backwards compatible way to do the above - if we get rid of the 2 default callbacks (3 if you consider the verbose flag), anyone using a custom callbacks list will have their code change behavior (because callbacks will not be None, but it will also not include BaseLogger or a History callback). If we don't get rid of the two defaults, well, we've really made no change.

Kind of a moot point. We'll see where it stands once v2 is at a code-complete state, and we start thinking about backwards compatibility.

My point here is that there isn't really a backwards compatible way to do the above - if we get rid of the 2 default callbacks (3 if you consider the verbose flag), anyone using a custom callbacks list will have their code change behavior (because callbacks will not be None, but it will also not include BaseLogger or a History callback). If we don't get rid of the two defaults, well, we've really made no change.

(bolded for my emphasis.) I've said this already, but that was never really implied. There's nothing against having the CallbackManager, by default, enact a history and logger. The only time anything will break is if people were expecting history to be returned.

Kind of a moot point.

Agreed. That's why I backed off three posts ago.

@fchollet here is a list of the masking requests I can think of right now. I might add more later:

  • The Embedding layer should work for higher-order inputs. Imagine a sentence represented as characters, for instance, and you want to embed each of the characters and then run a character-level encoder over each of the words. Your input would be (batch_size, num_words, num_characters_per_word). Embedding doesn't currently work correctly with this input. There are lots of similar situations where you have higher-order word or character input, and none of them work correctly without modifying Embedding.
  • TimeDistributed needs to pass the mask through to the layer that it wraps (see the sketch after this comment). Additionally, there should be several subclasses available for handling compute_mask in different ways. For example, imagine the sentence representation from above. If I want to TimeDistribute a CNN encoder, applying it to each of the words, the mask I want to compute is basically K.any on the mask for each timestep, so that the output mask tells me which whole words were masked. If I then want to take those word representations and pass them through a Highway layer, I need to TimeDistribute the Highway layer over the number of words, because the tensor is (batch_size, num_words, encoding_dim). In this case, I want TimeDistributed to just pass through the mask. In still other cases, I might want to pass the computation of compute_mask to the wrapped layer, and join them afterwards. It's possible that you could capture all three of these use cases with just the last one, but it would probably take some complex logic to do so, in addition to modifying the behavior of compute_mask in wrapped layers (e.g., LSTM doesn't currently return a mask at all in the return_sequences=False case, and it would need to return either a 0 or a 1 for this to work).
  • An equivalent to K.softmax that takes a mask as input is needed. Any time you want to compute a softmax over something that's padded, you need this. The most obvious use case is attentions over word sequences, but there are others, too. You could solve this by adding another backend function, or just by adding a Softmax layer that handles masking (which will in the end also need another backend function, or just its own code that uses backend functions).
  • The Lambda layer should support masking, as @braingineer said above.
  • Backend functions need to handle masks. For example, computing an attention is often done with something like K.batch_dot. If you want to implement bidirectional attention flow, you need to compute a similarity matrix that then gets passed through a couple of different softmaxes. As I already said above, the softmax needs to treat a mask correctly, so the operation that you did to compute the similarity matrix needs to propagate a correct mask (or you have to create one huge function, which prohibits re-using the similarity matrix in several downstream layers). So, we need a K.batch_dot that propagates a mask. Similar to what I said for K.softmax, you could either do this with another backend function, or you just add a BatchedDot layer that handles the mask correctly. In general, it seems useful to have layers associated with most backend functions that do the correct masking computation (this may not be necessary for all of them, especially if the Lambda layer supports masking and passes through the mask by default).
  • All layers should document their masking behavior (expected input shape and output shape, etc.), just like they document their input/output behavior.
  • Some high-level documentation about masking would be really nice (e.g., an "About masking" page), specifying how masking works in Keras, what a mask's dtype should be, how to get masks into your Model, and in what situations you might want to use a mask.
  • EDITED TO ADD: It'd be nice if you could consistently call K.int_shape() on masks. This is not the case in the theano backend.

We have solutions for a lot of these problems in our codebase that we can contribute, though it's all based on Keras 1.*, and I'm not sure how much will change in Keras 2. Either way, I'm happy to help contribute to fixing these issues. I would really like to see Keras succeed in being great for NLP.
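
For the TimeDistributed bullet, a rough sketch of the "pass vs. reduce" idea, assuming the Keras 1 wrapper API (the class name DistributeWithMask and its mode argument are hypothetical, not existing Keras features):

from keras import backend as K
from keras.layers.wrappers import TimeDistributed

class DistributeWithMask(TimeDistributed):
    # mode='pass'  : hand the incoming mask through unchanged
    #                (e.g. a Highway layer over per-word encodings).
    # mode='reduce': collapse the innermost mask axis with K.any
    #                (e.g. a CNN encoder consuming per-character masks).
    def __init__(self, layer, mode='pass', **kwargs):
        super(DistributeWithMask, self).__init__(layer, **kwargs)
        self.mode = mode
        self.supports_masking = True

    def compute_mask(self, input, input_mask=None):
        if input_mask is None:
            return None
        if self.mode == 'reduce':
            return K.any(input_mask, axis=-1)
        return input_mask

Whether "pass", "reduce", and deferring to the wrapped layer should be one class or several is exactly the design question raised in that bullet.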

I am mainly looking forward to the fit_distributed() promised by @fchollet months ago, which could automatically use multiple GPUs :)

An equivalent to K.softmax that takes a mask as input is needed

Feel free to add a Softmax layer to core. Also add a warning when people are using an Activation layer set to softmax with a mask being passed.

The Embedding layer should work for higher-order inputs.

That's planned.

TimeDistributed needs to pass the mask through to the layer that it wraps.

That's planned.

The Lambda layer should support masking

That's planned.

It'd be nice if you could consistently call K.int_shape() on masks. This is not the case in the theano backend.

That will never be the case with Theano. Having full offline shape inference capability with Theano would be a pretty big amount of work, and out of scope for Keras (since it should actually be a Theano feature).

Backend functions need to handle masks.

Masking is Keras-level, not backend-level. That won't change.

Some high-level documentation about masking would be really nice

Sure. Maybe a FAQ entry. If you have any initial proposal, feel free to submit a PR.

All layers should document their masking behavior (expected input shape and output shape, etc.), just like they document their input/output behavior.

Easy to add. After the documentation is updated for Keras 2, feel free to submit a PR.

fit_distributed()

Won't do. Distributed training will be handled via tf.Experiment, which you will be able to instantiate from a Keras model.
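
A sketch of how that is expected to look (assuming TF >= 1.4, where tf.keras.estimator.model_to_estimator converts a compiled Keras model into an Estimator that Experiment-style distributed runners consume):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

# The resulting Estimator plugs into TF's distributed training machinery.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)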

@fchollet Thank you for your feedback and comments. Since lots of people in Keras are discussing multi-GPU utilization, could you please say a few more words about tf.Experiment?

My only refactor for Keras2 would be around the dim_ordering stuff. Some ideas:

  • I'd like the orderings to be renamed. Perhaps NHWC and NCHW as TensorFlow refers to them, or 'channel_first'/'channel_last' (see the sketch after this comment).

    • 'tf' and 'th' just lead to confusion. Unbeknownst to people who don't read the code, the backends simply adjust the inputs to their corresponding ordering via a dimshuffle. People tend to think that changing the dim_ordering can greatly affect speed - this is especially true for TF users who've read the above link and think that setting their ordering to 'th' will boost their speed on NVIDIA GPUs with cuDNN.

    • It's true that performance will differ between backend+dim_ordering pairings (due to dimshuffling), but...

  • If possible, I'd also like to see the transformation between orderings happen at a higher level. At each conv layer (when the ordering doesn't match the backend), the input is shuffled, acted upon, and then shuffled back to the original order.

    • If this could happen once at the Model Input and then once at the Model Output, the internal code would be so much cleaner.

  • Speaking of TensorFlow, I think we should default to 'th' ordering for TensorFlow as well. This is actually what TF recommends for speed on NVIDIA GPUs (as that ordering is faster in cuDNN). 'tf' ordering is faster on the CPU, which is why it is the default ordering, but they recommend what we call 'th' ordering at the above link.

    • Alternatively, perhaps a config item or smart-detection that can determine if TF is configured to run on an NVIDIA GPU, and at the very least recommending 'tf' ordering.

Edited to add bullet 3 and add some more thoughts. Apologies if it's rambling - hopefully you get the gist.
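
For reference, a minimal example of the naming Keras 2 eventually adopted (Conv2D's data_format argument, with 'channels_first'/'channels_last' replacing dim_ordering):

from keras.layers import Conv2D

# NHWC, the old 'tf' ordering: (batch, height, width, channels)
conv_last = Conv2D(32, (3, 3), data_format='channels_last')

# NCHW, the old 'th' ordering: (batch, channels, height, width)
conv_first = Conv2D(32, (3, 3), data_format='channels_first')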

Hi @fchollet, I've just written a prototype for Keras using a Deeplearning4j backend. After completing this experiment, I've learned a lot about the design of Keras and pluggability of the framework.

Since a rewrite is already on the table, I am wondering if there are plans to make the backend more modular? In other words, do you have plans for a backend to handle more of the actual execution and give more granular control?

For example, Deeplearning4j runs in the JVM and bridges with the ND4J binary. In some cases, it is more advantageous and performant for DL4J to directly handle most of what happens for a fit() or evaluate() operation. This is partly to avoid creating dual references in Python and the JVM (using py4j to bridge the two environments).

The idea is that Keras is a more advanced "string argument" generator that creates a standard for model config and instructing the backend on what to execute. The DL4J experiment has already done this at a core level, and I believe there are some performance gains to be made.

FYI if you want to check out the experiment: https://github.com/crockpotveggies/dl4j-examples/tree/keras-examples/dl4j-keras-examples

Thanks to @pkoperek for the ideas to organize and hijack the Keras methods to simply bridge it to the JVM.

Hi there.
Couple of questions:

  1. @fchollet is there any guide describing the API changes, namely "what's changed", "what will be deprecated", "what will stay unchanged".. and so on?
    If not, is there any plan to do so?
    I would be happy to help/contribute to the documentation about this - imho it would really help the transition.

  2. Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one?
    Is there any plan to integrate it as a Keras Branch/module once Keras 2.0 will be released?
    I'm asking this also because I probably spotted a couple of cases in which the Keras 1.X API has been used...

Cheers

Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one?

Yes, it will be moved to the Keras organization in the future. If any of the code breaks when Keras 2 is launched, it will be fixed by the maintainers. Otherwise, each of the source files will be converted to the latest API passively.

@farizrahman4u Is Keras 2 ready now?

I have another general plea. If Keras 2 is to become part of TF, can we please have the TF layers replicated as Keras 2 layers?

For instance, it took months before any attention was given to #4457 (Deconvolution3D/Conv3DTranspose), despite it having been part of the layers supported by TF for a while (and used by anyone doing 3D networks). Judging by the documentation, it is effectively the only layer whose 1D and 2D counterparts exist but whose 3D version is lacking.

Concretely, if I want to use the TensorFlow backend, is it better to import keras ("Keras is no longer a library, but rather a spec"), import tensorflow.contrib.keras (marked as deprecated but still in the docs), or import tensorflow.keras (not documented)? Confused 😕

Unless you have a specific reason to use tf.keras (e.g. a workflow that is mostly TF and where you do not wish to involve third-party libraries), use Keras. Otherwise, use tf.keras (available as of TF 1.4).
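
In code, the two options are simply:

import keras  # standalone Keras: this repo, any backend

# or, inside a mostly-TF workflow (TF >= 1.4):
import tensorflow as tf
model = tf.keras.models.Sequential()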

