Keras: Contrib Repo

Created on 6 Jan 2017 · 73Comments · Source: keras-team/keras

@fchollet Since we're not adding much to the repo at this stage (in terms of layers, loss functions, callbacks, etc.), we've talked quite a bit about an external repo for user/additional contributions, that, based on utility and usage may eventually make it into the core.

I'm curious about how you would want/like to go about this. An external repo is not a big deal, but I'm wondering:

How do we keep the chain of commits and contributions/contributors intact if or when we start pushing items into the core Keras from the external repo?
How would we go about determining if something is used/useful enough to move to Keras?

Just looking for thoughts on this topic. I've seen a repo or two that are not forks (completely external) and overwrite certain Keras files then do a setup.py install, but that seems like a poor way to do it. A fork would keep the chain of contributions in tact and would be easy to merge, but having any reference to a fork would just be confusing in my opinion.

stale

Source

patyork

Most helpful comment

I've discussed with some of the people responsible for tf.contrib, and they tend to agree that having a separate repo is much cleaner, and that having contrib part of the main repo is a source of issues. So that's what we should do.

So here are the specs for Keras contrib:

Repo name should be something like keras-contrib or keras-extras, nothing extravagant
Maintainers should be @patyork, @lukedeo, @the-moliver, @tboquet, @kemaswill, @bstriner, @farizrahman4u, @fchollet to start with (that's already a lot)
To become maintainer, you should be a recognized previous contributor to either contrib or core Keras
directory structure should mirror core Keras.
To make integration of specific objects into core easier, style guidelines should be the same as with Keras core (PEP8, docstring format) and that should be enforced via a Travis test.

Feel free to get started. Note that we would eventually move the repo to a Keras GitHub organization.

fchollet on 24 Jan 2017

👍5

All 73 comments

This is a good initiative.

How do we keep the chain of commits and contributions/contributors intact
if or when we start pushing items into the core Keras from the external
repo?

Git allows us to do pretty much anything. In particular we can reapply a
chain of commits with arbitrary authors. A contrib repo should live as an
independent repo from Keras (not a fork), and we can use custom git scripts
to transfer code from contrib to core Keras.

How would we go about determining if something is used/useful enough to
move to Keras?

Maybe by vote, as well as Github code searches.

Additionally: I would rather not have to manage a contrib repo myself,
since Keras keeps me busy enough already. We could have it under the
control of a small number of top Keras contributors.

On 6 January 2017 at 19:36, Pat York notifications@github.com wrote:

@fchollet https://github.com/fchollet Since we're not adding much to
the repo at this stage (in terms of layers, loss functions, callbacks,
etc.), we've talked quite a bit about an external repo for user/additional
contributions, that, based on utility and usage may eventually make it into
the core.

I'm curious about how you would want/like to go about this. An external
repo is not a big deal, but I'm wondering:

How do we keep the chain of commits and contributions/contributors
intact if or when we start pushing items into the core Keras from the
external repo?

How would we go about determining if something is used/useful enough
to move to Keras?

Just looking for thoughts on this topic. I've seen a repo or two that are
not forks (completely external) and overwrite certain Keras files then do a
setup.py install, but that seems like a poor way to do it. A fork would
keep the chain of contributions in tact and would be easy to merge, but
having any reference to a fork would just be confusing in my opinion.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944, or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWbyJoeEQr-9R5-FroUODNzUUu_fglks5rPomxgaJpZM4Lc-35
.

fchollet on 6 Jan 2017

I anticipated you not wanting to manage that repo as well, given that you are the gatekeeper of this fairly massive one already. I was looking to get the contrib repo up and running, but I want to be sure it's setup correctly (e.g. in a useful way that allows future merges into Keras core) before starting anything.

I've seen a fair amount of decent code/PRs declined in the last couple of months alone that would fit well as contribs, so I think this would be good to figure out a framework for ASAP.

Perhaps we can do a proof of concept along the lines of a new repo, adding what amounts to come contrib code, and determining what works best in getting it back into Keras (or a temp branch).

patyork on 6 Jan 2017

This is a great idea. I have a few things I could contribute and would be willing to help moderate.

the-moliver on 6 Jan 2017

So would it be a kinda beta release of keras new components?

keunwoochoi on 7 Jan 2017

Cool idea!

tboquet on 7 Jan 2017

I wouldn't call it solely a beta for Keras, although that could definitely be part of it. I think of it as an extensions library as well; that is, some of the less used items (loss functions, layers, etc) will be merged and accessible via installation and importing of the contrib library, but they may not be used enough to make sense in the Keras core library and would never make it in there.

So, an extension library where some of the extension make it back into the Keras core.

patyork on 7 Jan 2017

I could definitely help moderate.

lukedeo on 7 Jan 2017

Would be great to have a contrib repo up and running. There are a lot of trig functions to add to keras backend and it might be cleaner to have them in a separate repo. I've also been wanting to add functionality for multiplayer models (GANs, BiGANs, AAEs, etc.) which would be better to put in a contrib repo. It would be a great place for stable code we just don't want polluting Keras. Not everyone is going to need arccosh or whatever.

Would the repo have separate CI, documentation hosting, etc? Separate Google/slack groups or no?

bstriner on 7 Jan 2017

👍1

On that note, it would be nice for the contrib repo to have its own backend module (with the same logic and file structure as keras.backend) for any weird or esoteric backend functions.

bstriner on 7 Jan 2017

Might be nice to hook all this up as an extra so we can do pip install keras[contrib] and it would let us do something like from keras.contrib.layers import FancyUntestedLayer. We'd need some mods inside the keras setup for that, though. We can have just a keras_contrib PyPI package that can be standalone or patched into the main keras if installed as pip install keras[contrib].

I think we need separate CI / dox, though.

lukedeo on 8 Jan 2017

I think we should try to keep it entirely separate from Keras; simply a new repo that utilizes Keras for some things (backend functions, standard callbacks, etc), but that brings new functionality to the table.

An example would be

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras_contrib.layers import ForgetfulLSTM #not a real type of layer, I don't think
# but with a mirrored directory structure, the extensions should be self explanatory

import keras.backend as K
import keras_contrib.backend as C # perhaps we can call this "K" and if the function is not in the contrib library
#    we can just fall back directly to the Keras backend? e.g.
# import keras_contrib.backend as K

C.arccos(data) # extended backend functions
K.mean(data) # standard functions still available

model = Sequential()
model.add(ForgetfulLSTM()) #layers compatible with standard Keras models
model.add(Dense())

I've been playing around with it a bit today; I'm still stuck on wrapping my head around what mechanisms are available for preserving commit history across dissimilar repos. Making a bit of progress, but without being able to get the correct history of the extension back into Keras, if/when the changes are promoted to keras core, the contrib repo is kind of moot - copy/pasting code doesn't appeal to me.

patyork on 8 Jan 2017

Watch ForgetfulLSTM make it to NIPS 2018.

Something like this might be relevant. Involves a filtered view of one repository then pulling updates into the other.

http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/

bstriner on 8 Jan 2017

If contrib is a large hodgepodge of things, maybe it makes sense to divide contrib up into extras.

Especially if we roll some specialized dependencies into contrib. For example, I wrote keras integration with tqdm which makes really nicely formatted progress bars in Jupyter. We could have a contrib extra that depends on progressbar, a contrib extra that depends on tqdm, etc.

We don't want to add any dependencies to keras to keep things lean but there could be contrib extras with various additional dependencies.

bstriner on 8 Jan 2017

I thought a bit about submodules, subtrees, etc, but I think it would be better off to just be a separate repo (such that, when needed, we can merge stuff into Keras core). So far, it seems like just a mirrored structure and a bit of manual reworking and applying of patches will get us to:

Keras completely separate and independent from the contrib
Contrib just "sitting on top of" Keras providing new features
Commits available to push into Keras at @fchollet's and the contrib team's approval/discretion
- History and contributor info intact
Separate documentation and CI build (but available to move to Keras as well)

I'd image the contrib repo being pretty lax in terms of requirements; just about anything can get merged, as long as it is valid code relating to deep learning, is "useful" to a wide audience (not super specific to a single model/dataset/person) and has tests written for it when called for. The really useful stuff would hopefully eventually move to Keras and be removed from the contrib repo, without much hassle.

patyork on 8 Jan 2017

Having played around with it a bit more, I think the separate repo is entirely viable. A quick test is below.

Some commits in the external repo (note: the repo is set up with similar structure/files as Keras, but this is not strictly necessary):

The highlighted commit above is put into a git patch via git format-patch. A little finagling of the patch file makes it compatible with Keras and available to be able to apply; these finagling changes are just to put the commit in the correct spot in the destination file. We take the patch to a new branch in Keras and apply with git am --signoff which keeps the commit message, original contributor, and has the sign-off note highlighted in blue:

At that point, the signee can reapply more commits in the same way; finally, just review and make any changes to the committed code, in a separate commit. These final fixes would be, perhaps, moving/removing/adding/editing import statements and their usage in the new code (e.g. elephas.layer.Name --> keras.layer.Name).

A lot of the manual stuff above could end up being handled automatically via simple search and replace; the patch finagling could be easier as well with a gui versus editing a diff file directly.

patyork on 9 Jan 2017

I don't think a contrib repo should mirror the code structure of core
Keras, and I don't understand why it should (it you feel strongly about it,
please explain what you reasons are). E.g. tf.contrib does not mirror core
tf, it's just a bunch of extras.

In the same way, I would see a Keras contrib repo as divided into modules
such as callbacks, layers, metrics, etc, with no redundancy between core
Keras code and contrib code.

On 9 January 2017 at 00:33, Pat York notifications@github.com wrote:

Having played around with it a bit more, I think the separate repo is
entirely viable. A quick test repo here
https://github.com/patyork/elephas/commits/master(elephas, meaning ivory
https://github.com/fchollet/keras#why-this-name-keras).

Some commits in the external repo (note: the repo is set up with similar
structure/files as Keras, but this is not strictly necessary):
[image: image]
https://cloud.githubusercontent.com/assets/1304633/21754495/82d8fb48-d5b6-11e6-8617-9059e6cfaf3c.png

The highlighted commit above is put into a git patch via git format-patch.
A little finagling of the patch file makes it compatible with Keras and
available to be able to apply; these finagling changes are just to put the
commit in the correct spot in the destination file. We take the patch to a
new branch in Keras and apply with git am --signoff which keeps the
commit message, original contributor, and has the sign-off note highlighted
in blue:
[image: image]
https://cloud.githubusercontent.com/assets/1304633/21754525/1b8209d4-d5b7-11e6-82fc-f650049e6174.png

At that point, the signee can reapply more commits in the same way;
finally, just review and make any changes to the committed code, in a
separate commit. These final fixes would be, perhaps,
moving/removing/adding/editing import statements and their usage in the new
code (e.g. elephas.layer.Name --> keras.layer.Name).

A lot of the manual stuff above could end up being handled automatically
via simple search and replace; the patch finagling could be easier as well
with a gui versus editing a diff file directly.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-271188622,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWb6ut1gDkDaBAqjV2PaPxJOX50XRvks5rQXI2gaJpZM4Lc-35
.

fchollet on 9 Jan 2017

I agree it should be divided up into modules, such as callbacks, metrics, layers, backend additions. The best way to do that is via a directory and file structure that Keras has; e.g.:

"repo name"/
- backend/
- theano_backend.py (additional ops for theano)
- tensorflow_backend.py (additional ops for tensorflow)
- layers/
- "additional layer types".py
- metrics.py (additional metrics code)

The above files, while named the same, contain no Keras code; perhaps they will have a from keras import x statement, but no copies of Keras code. Following the Keras directory and file structure is simply the neatest way to organize everything. An example:

from keras.layers import Dense
from elephas.layers import ForgetfulLSTM

Import structure follows a reasonable pattern between Keras versus the contrib repo.

patyork on 9 Jan 2017

👍1

@patyork if there are going to be more things being added by a larger contributor base it is going to be easier to manage with separate files. So, maybe metrics.py, regularizers.py, backend, etc. should be upgraded to modules/directories for the contrib repo? Add in an import * so we don't have conflicting commits in the init.py and everyone can safely works on their own metrics, etc.

More PRs would be adding files instead of editing files, so hopefully fewer conflicts and cleaner histories.

bstriner on 9 Jan 2017

Pat York: yes, that sounds perfectly reasonable.

On 9 January 2017 at 01:16, Ben notifications@github.com wrote:

@patyork https://github.com/patyork if there are going to be more
things being added by a larger contributor base it is going to be easier to
manage with separate files. So, maybe metrics.py, regularizers.py, backend,
etc. should be upgraded to modules/directories for the contrib repo? Add in
an import * so we don't have conflicting commits in the init.py and
everyone can safely works on their own metrics, etc.

More PRs would be adding files instead of editing files, so hopefully
fewer conflicts and cleaner histories.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-271192449,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWb9F7mRZD9Qmc5BMONQS3CTbYxjpsks5rQXxigaJpZM4Lc-35
.

fchollet on 9 Jan 2017

It's probably a good idea to choose another name to avoid confusion, Elephas is taken by this keras related project:
https://github.com/maxpumperla/elephas

the-moliver on 10 Jan 2017

Elephas is already taken indeed, and it's a quite popular project. How
about just keras_contrib, or keras_extras?

On 10 January 2017 at 02:00, Michael Oliver notifications@github.com
wrote:

It's probably a good idea to choose another name to avoid confusion,
Elephas is taken by this keras related project:
https://github.com/maxpumperla/elephas

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-271456231,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWb8bWvL0freVtLD8AYcErAAlexlDUks5rQthIgaJpZM4Lc-35
.

fchollet on 10 Jan 2017

Ah bummer - I thought that name would have been clever for an extension repo.

patyork on 10 Jan 2017

For a detailed and mature design of the sort being discussed here look to boost, the obvious tensorflow contrib, homebrew, and Qt. All are mature projects.

Also note splitting up will open up the latest successor to dll hell... versioning hell, as found in brew. OpenCV has done the same thing with their opencv_contrib repository, and as a few google searches can attest it seems to be very painful for users.

What about simply having a contrib folder in keras that is clearly marked experimental where things can be moved to the mainline folder as they are proven as done by tensorflow and some boost modules?

ahundt on 10 Jan 2017

👍1

As @fchollet suggested, opening up commit permission to a few trusted top contributors is likely a good plan to reduce his workload. Making them the owner of that contrib directory could be a good way to go about it.

ahundt on 10 Jan 2017

Is it possible to give commit rights specific to a directory on Github?
Last time I checked, commit rights were a binary: either you had
everything, or nothing.

On 10 January 2017 at 03:33, Andrew Hundt notifications@github.com wrote:

As @fchollet https://github.com/fchollet suggested, opening up commit
permission to a few trusted top contributors is likely a good plan to
reduce his workload. Making them the owner of that contrib directory could
be a good way to go about it.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-271469900,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWb90YMFPCquD4bSry4cmbjviPT4ZJks5rQu3kgaJpZM4Lc-35
.

fchollet on 10 Jan 2017

Yeah I guess this could mean trusting the people with access. Git hooks or the github status api could be another option.

ahundt on 10 Jan 2017

I think it's all or nothing - I've heard that gitlab has something like this, but I'm not sure. If we were to have other people as the "owner" of the contrib folder, we could just have @fchollet tag PRs as contrib worthy but not core-ready, and it would have to be an honor system thing (perhaps with guidelines, such as the removal of moderator status if violated) such that moderators do not merge / push directly to core.

lukedeo on 10 Jan 2017

as @lukedeo says with the honor system, there's nothing like a lightweight code of conduct for contributors (pull requests) and collaborators (commit access) to keep things in order. If it doesn't work out, revoking collaborator access is just a few clicks.

ahundt on 10 Jan 2017

That seems like a reasonable thing to do. Here's what we would need:

1) a document mapping contributor usernames to lists of folders they are
authorized to commit files to.
2) a code review guide to help collaborators perform consistent reviews and
vetting of contrib PRs.
3) a notification system were PRs made in contrib (or PRs tagged as
belonging in contrib) trigger an email to the contrib collaborators.

I can write 2). Any suggestions as to 3)? Also, anything I am missing?

Additionally: what is the current list of Keras contributors interested in
maintaining such a contrib folder?

On 10 January 2017 at 04:13, Andrew Hundt notifications@github.com wrote:

as @lukedeo https://github.com/lukedeo says with the honor system,
there's nothing like a lightweight code of conduct
https://github.com/Homebrew/brew/blob/master/CODEOFCONDUCT.md for
contributors (pull requests) and collaborators (commit access) to keep
things in order. If it doesn't work out, revoking collaborator access is
just a few clicks.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-271475167,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWbzoPk4n8cNpv7jmjIsWAtEOL4QUEks5rQvdQgaJpZM4Lc-35
.

fchollet on 10 Jan 2017

👍2

As for 3), I can take a stab at a simple polling system with PyGithub that would cost O($0.01) to run per month - I can start taking a look tonight. As for list of contributors, we could open an issue and people can 👍 if they want to be considered?

lukedeo on 10 Jan 2017

Great, thanks!

Currently I am counting: @patyork, @lukedeo, @the-moliver. Anyone else, please 👍 this post to be considered.

fchollet on 10 Jan 2017

👍4

I'm quite interested.

kemaswill on 10 Jan 2017

Just thinking ahead a bit, but if possible it would probably be good to have a separate TravisCI build for the contrib folder. Theoretically (and hopefully in practice) contrib items shouldn't break the Keras core, or even touch it, making those 15-20 minute tests redundant. I've never used TravisCI in depth, so I'm not sure if you have that fine of control over whether a certain PR calls test suite A versus suite B.

patyork on 10 Jan 2017

@patyork as tensorflow does, the pull requests can be compiled by travis before merges are permitted. For that reason I'd suggest merging via pull request with no direct commits be one element of the code of conduct. It gives the opportunity for code review and travis ci to run.

ahundt on 10 Jan 2017

@fchollet I've set up a system using AWS API-GW => AWS Lambda => AWS SES using a web-hook from GitHub that should run at ~0.40$/mo that'll be ready by tomorrow. Only remaining questions are:

Who should maintain this? I'm more than happy to make sure the system is humming along.
How do we want to add users? I've just assumed it's going to be low enough velocity that we can add users in by hand. (The email has a link for users to unsubscribe, though)

lukedeo on 11 Jan 2017

I envision that we would have a COLLABORATORS.txt document (or similar) in the Keras root that would describe a mapping between Github usernames (maybe email addresses) and directories that collaborators can commit to. This would be updated by hand, and could be used as input to automatically update any infrastructure we have around collaboration.

Thinking further about the contrib repo, there are still a couple reasons why we might want to keep it separate from Keras:

When making changes to contrib, we don't want to trigger a Travis build for all of Keras, as Pat York points out. Will there be any way to prevent this if contrib is a Keras subdirectory?
A TF-only version of Keras will be moved to the TF repo in the near future. This version should maintain the same API as the external (multi-backend) version. Of course it cannot include the contrib repo, so that would induce an API discrepancy: in external Keras you will be able to do from keras import contrib but this wouldn't work in TF-internal Keras.

I hope we can solve the first point. The second point may not be so important, but still something to keep in mind.

fchollet on 11 Jan 2017

@lukedeo you could definitely be in charge of maintaining the notification system (in my experience it's possible to leave a similar setup running and essentially never have to touch it in years). It would just need to have open-source deployment instructions so that anyone could take it over in the future.

fchollet on 11 Jan 2017

When making changes to contrib, we don't want to trigger a Travis build for all of Keras, as Pat York points out. Will there be any way to prevent this if contrib is a Keras subdirectory?

Sorry for missing this, but what is the downside to triggering a travis build for all of keras?

A TF-only version of Keras will be moved to the TF repo in the near future. This version should maintain the same API as the external (multi-backend) version.

Cool! Will it be in sync with the multi backend version? sorry for the digression

Of course it cannot include the contrib repo, so that would induce an API discrepancy: in external Keras you will be able to do from keras import contrib but this wouldn't work in TF-internal Keras.

Will the api be import tf.contrib.keras vs import keras.contrib or something similar to that? Would such a difference make for a reasonable indicator of what's available?

ahundt on 11 Jan 2017

On the Travis build, the downsides are fairly minor of kicking off a full build each time: It's long running time and just running unnecessary builds on Travis (which is free for the end users).

On the TF integration/move, I think he just means that it wont include Keras' contrib folder in the TensorFlow repo? Meaning if someone is using from TensorFlow directly, they wont have access to these contributions/extensions without cloning/downloading Keras itself? Whereas a separate repo for the Keras contributions would sit on top both Keras standalone and the TF version (as an optional install). I could be wrong though, this is the first I've read about a TF-Keras integration.

-----Original Message-----
From: "Andrew Hundt" notifications@github.com
Sent: ‎1/‎10/‎2017 8:18 PM
To: "fchollet/keras" keras@noreply.github.com
Cc: "Pat York" pat.york@nevada.unr.edu; "Mention" mention@noreply.github.com
Subject: Re: [fchollet/keras] Contrib Repo (#4944)

When making changes to contrib, we don't want to trigger a Travis build for all of Keras, as Pat York points out. Will there be any way to prevent this if contrib is a Keras subdirectory?
Sorry for missing this, but what is the downside to triggering a travis build for all of keras?
A TF-only version of Keras will be moved to the TF repo in the near future. This version should maintain the same API as the external (multi-backend) version.
Cool! Will it be in sync with the multi backend version? sorry for the digression
Of course it cannot include the contrib repo, so that would induce an API discrepancy: in external Keras you will be able to do from keras import contrib but this wouldn't work in TF-internal Keras.
Will the api be import tf.contrib.keras vs import keras.contrib or something similar to that? Would such a difference make for a reasonable indicator of what's available?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

patyork on 11 Jan 2017

On the Travis build, the downsides are fairly minor of kicking off a full build each time: It's long running time and just running unnecessary builds on Travis (which is free for the end users).

Travis there is a caching mechanism which could be used for incremental builds, or something like docker could be used to facilitate incremental builds and reduce build time. Even stuff in contrib should likely be verified, right?

On the TF integration/move, I think he just means that it wont include Keras' contrib folder in the TensorFlow repo?

I think he was saying that he has been working on a new TF only implementation of Keras that currently is internal at Google that will eventually be directly integrated into github.com/tensorflow/tensorflow.

ahundt on 11 Jan 2017

I envision that we would have a COLLABORATORS.txt document (or similar) in the Keras root that would describe a mapping between Github usernames (maybe email addresses) and directories that collaborators can commit to.

What does "directories that collaborators can commit to" mean? Does it mean that we have several owners for the keras_contrib repo who are respectful for different directories to merge the commit from other collaborators?

kemaswill on 11 Jan 2017

Here is a homebrew link I meant to post which has all the useful info for contributors. I thought it might make for a good example:
https://github.com/Homebrew/brew/tree/master/docs

ahundt on 11 Jan 2017

I just realized contrib is going to break saving and loading and a PR I just made for a bugfix is a possible solution. https://github.com/fchollet/keras/pull/5012 (PR probably still needs some additional docs and unit tests).

I added a global dictionary of custom objects. I updated get_from_module to first check that dictionary. This can be a standard location to register custom objects so they load by name and work with save and load.

The idea for contrib is that I could write a file l3_regularizer.py

class L3Regularizer(Regularizer):
  ...

get_custom_objects().update({"L3Regularizer": L3Regularizer})

If I then import keras_contrib.regularizers.l3_regularizer, Dense(...,W_regularizer='L3Regularizer') will work as expected and saving and loading will work as well. This is a potential standard for how to add custom objects to custom_objects and have everything save and load correctly.

Any thoughts on how contrib is going to affect saving and loading?

Cheers,
Ben

bstriner on 12 Jan 2017

Any movement on this?

patyork on 22 Jan 2017

This pull for custom objects just got merged yesterday: https://github.com/fchollet/keras/pull/5012

@fchollet suggested a scope operator which makes things much cleaner and helps prevent name collisions. Custom objects can be loaded within a with statement which prevents leaks.

We still need to enforce some standard for object names which would help reduce chances of collisions further, for example, using the module name in the name returned from get_config. Saving, loading, and instantiating by name will work as long as the string returned from get_config matches the string in custom_objects.

The trade-off is if we use fully qualified names, referencing the object by name gets a little wordy like Dense(512, ..., W_regularizer="l3_regularizer.regularizers.L3Regularizer") but you could still just create the instance yourself like Dense(512, ..., W_regularizer=L3Regularizer()).

#keras_contrib.regularizers.l3_regularizer.py
class L3Regularizer(Regularizer):
  ...
custom_objects = {"regularizers.l3_regularizer.L3Regularizer": L3Regularizer}

#using the module
from keras_contrib.regularizers import l3_regularizer
with custom_object_scope(l3_regularizer.custom_objects):
  # custom object will resolve within this scope by name
  layer = Dense(512, ..., W_regularizer="regularizers.l3_regularizer.L3Regularizer")

custom_objects is still a parameter to several functions and works as it did previously, but working through custom_object_scope prevents name leaking that was the existing functionality.

bstriner on 22 Jan 2017

@bstriner Just to be clear, even in the case of a built in contrib folder (in the Keras repo), the above notation would be required as all contributions would be considered custom objects?

patyork on 22 Jan 2017

Could we start a contrib repo as an external repo (and potentially include
it into Keras later down the road if that sounds like a good strategy?)

On 22 January 2017 at 12:08, Pat York notifications@github.com wrote:

@bstriner https://github.com/bstriner Just to be clear, even in the
case of a built in contrib folder (in the Keras repo), the above notation
would be required as all contributions would be considered custom objects?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-274355399,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWb28yDbmZH3djqV2pMWdoZ6qPDr4Fks5rU7dZgaJpZM4Lc-35
.

fchollet on 22 Jan 2017

I think that's a fine idea - I don't have the reservations against a (well managed) external repo that others have (such as versioning issues and whatnot).

patyork on 22 Jan 2017

@patyork such notation would be required for any objects you don't want to be part of the default namespace but want to add conditionally. This is also the notation you would use for custom objects in your own code if you want your models to save and load correctly.

If you want something to be included by default, you could add get_custom_objects.update(custom_objects) to the bottom of the module and make sure it is imported somewhere.

For objects in keras master, an alternative would be a further PR to get_from_module that can handle dotted objects. Then, contrib.L3Regularizer would resolve to keras.regularizers.contrib.L3Regularizer if run with module_params=keras.regularizers, unless overridden by custom_object_scope. I think this might be the cleanest final solution for handling a separate contrib repo, some included contrib repo, and custom objects in user code in a standardized way.

We could also add some custom handling to get_from_module to recognize relative and absolute paths. This would be my ideal logic for get_from_module:

get_from_module will first check custom_objects, which always takes precedence and can be used to override existing code
if the string is an identifier, get the identifier from the module
if the string is a dotted identifier, try to get the submodule from the module, possible importing modules if not already imported
if the string starts with /, try to import a fully qualified object

This way:

If the module is in contrib or some other module, return the fully qualified path in get_config and everything should work.
If the module is somewhere inside keras but not in one of the main modules, return the relative path in get_config.
If you want additional control or to enable some aliases, use custom_objects to provide clean, simple names for your objects.

Does that logic for get_from_module make sense? It would be back-compatible with existing code and would make everything nice and flexible. Wouldn't be too complicated to put together.

Cheers

bstriner on 22 Jan 2017

"Kextras"

farizrahman4u on 22 Jan 2017

Separate repo sounds much cleaner. Keras can have a very high coding standard and we can enforce a bunch of quality checks and documentation standards and test coverage while keras-contrib can have a little more variety. Separate repo would also make permissions a lot safer.

@patyork Where are you going to start working on the contrib repo? I'd love to take a look and try it out.

contrib might also benefit from a more complicated branch structure. master could be the current stable code. PRs would be made to dev so the community can try things out and do some cleanup.

There could be one tier of developers pushing changes from dev to master after they seem stable, and a broader tier working on dev. You could submit some useful code to dev, then someone else could clean up the documentation and testing before it goes to master.

bstriner on 22 Jan 2017

@bstriner I have no plans to start/own the contrib repo (if it is external); that'll be for someone else to start. AI/Keras is just a hobby, that I happen to have a bit of time for while I'm on a long vacation.

I started this thread was just to get a plan/repo in place - I've seen a lot of useful code just sitting in PRs and Issues without a good place to live, not to mention stuff in external repos. Plus, talk of starting a contribution repo had been floating around random PRs for a few months.

patyork on 23 Jan 2017

@bstriner I'd be happy to help with contrib. I use Keras ~daily for research and for production so I have aligned incentives, hah.

lukedeo on 23 Jan 2017

If it is a separate repo, I could start one, and add you guys as moderators. Let me know what you think.

farizrahman4u on 23 Jan 2017

👍1

@farizrahman4u sounds fine to me!

lukedeo on 23 Jan 2017

For an example of a reasonably active project with a main and contrib repository, see the static site generator, Pelican https://github.com/getpelican/pelican/ and its plugin repository https://github.com/getpelican/pelican-plugins/

I'm a heavy user of that project and a sometimes contributor. Qualitatively, I'm not sure the plugins repo has added much to the project; rather, many of the actively maintained pelican plugins seem to have moved into their own repositories, and many of the ones that remain have long-hanging pull requests which the maintainers need to intervene in. The contrib repository has become partially a index of interesting git submodules; but practically you often ignore the plugins repo and install a specific plugin you want via pip install. It looks like a lot of work for the maintainers to keep that shared playground and the benefit is not 100% clear.

OTOH, there are already well-maintained projects which can use keras _without_ a keras contrib repo, (e.g. http://edwardlib.org/ ), and it's easy for your project to depend on keras through the standard python way of declaring dependencies in setup.py.

This is not necessarily to say we should not have a contrib repo (or one with the same system as pelican-plugins) - but it would be nice to clarify precisely what type of contrib repo we were looking for, and the precise costs and benefits of the model we are aiming for.

For example, I am writing a custom keras layer at the moment. Should I be writing it in some contrib repo, in my own repo, or in both? What do I gain from having the option? What are the benefits of going in the official contrib repo? Free publicity?

danmackinlay on 23 Jan 2017

How should the "versioning hell" I described earlier be avoided in the separate repo?

Is there some way compatible commits could be synced between Keras and the separate contrib repo?

In other words, how do I make sure my contrib repo and keras repo are compatible, and if an update to one or the other causes a regression, how do I get back to the previous working state in an automated fashion?

Separate repos are fine if that's addressed, though I'm still more of a fan of the contrib folder idea which avoids that problem completely with some minor but manageable drawbacks as discussed earlier.

ahundt on 23 Jan 2017

Here are my thoughts :

The keras-contrib (is the name confirmed?) repo will mirror the Keras project structure (layers : core, recurrent,..; optimizers, constraints, regularizers etc..)

The repo will be moderated pretty much like its done in Keras.. users PR their work, will be reviewed by the community and then merged. Additionally, write access will be provided for users with major contributions, under the condition that they make changes to their contributions only.. There will be a markdown file with user to directories mapping specifying the write access of each user... as discussed above.

Developers should consider contributing to keras-contrib rather than own repo, because the contributions are peer reviewed and therefore have higher quality and credibility, and will be trusted more by end users.. Also higher visibility and reputation.

farizrahman4u on 23 Jan 2017

👍1

K = Keras, KC = Keras-Contrib (got lazy)

The simplest versioning-hell-abating tactic is simply to only support bleeding Keras (e.g. Travis tests using the master branch of Keras). If the contrib repo tests fail against that branch, then the collaborator (and contributor) base should look into and rectify it. If at any time you wish to upgrade the contrib repo/keras, you should pull from git both the KC repo and the K repo.

If you want to support simple versioning (e.g. Keras v1.2.0 is supported, Keras 1.2.1 is not) then it is probably best handled as Keras versioning is; separate Github tags and PyPy releases done at the same time as Keras. So KC v1.2.0 should sit on K v1.2.0. What falls out of that is the same as above - the master branch of KC supports the master branch of K. If you only want to walk the releases, you install/upgrade to matching K and KC versions, as defined by the PyPY releases or the github tags.

patyork on 23 Jan 2017

👍3

Agree. Support only for bleeding edge.

farizrahman4u on 23 Jan 2017

+1 for only bleeding edge support

the-moliver on 23 Jan 2017

To clarify: KC.master <--supports--> K.master.

What falls out of that, then, is that when @fchollet preps a release, the KC team checks the KC Travis tests (against the current K.master, which is becoming a new release):

If the build is passing:
- Tag as KC.<-new version-->
- Push KC release
- masters and equivalently versioned releases support each other
If the build is not passing:
- Pause any additional merging
- Rectify build issues
- Tag
- release
- masters and equivalently versioned releases support each other

Therefore, it would also be true that KC.1.2.1 <--supports--> K.1.2.1 as they are versioned in tags/branches/PyPy.

patyork on 23 Jan 2017

@ahundt It seems there would be 2 possible ways.

If I understand correctly, what @fchollet, @bstringer, and @patyork were discussing at the start is a KC which by disciplined community effort moves in lockstep with the development of the K main source tree, so the 'extras' in a given point release would be compatible with keras version at the same point release. (Edit: as I type this, @patyork has written more or less this) So, something like the tensorflow repository itself as regards compatibility, but with KC broken out into a second repository in the hope of fostering a good mutual code review community.
The KC model I was implicitly assuming, on reflection, seem to be different - A bunch of different packages, each with their own setup.py to specify their own version dependencies that happen to be in the same repo.

Number 2 (packages with explicit version requirements) is the standard python way to solve versioning problems, although it seems to defeat @farizrahman4u's suggestion that the purpose of a shared repo is exposure and (hopefully) peer review, since it essentially means having separate packages with loose dependencies.

On the other hand, number 2 feels dangerous to me; if someone drops off the maintenance bandwagon just before the release date for K 1.3.0, for their subtle and sophisticated KC sub package, I presume we either have to wait for someone else to step in to fix it, or just delete it so that KC can remain current. Or am i missing the point? Are we weeding out one-person-projects from KC to avoid this kind of risk?

danmackinlay on 23 Jan 2017

One person projects are welcome. The assumption is that all contributions are properly documented, and the community will be able to pick up where the original contributor left off.

farizrahman4u on 23 Jan 2017

I do think Keras contrib master should be in sync with Keras master.

Therefore, it would also be true that KCv1.2.1 <--supports--> Kv1.2.1 as
they are versioned in tags/branches/PyPy.

Sounds good to me.

We will change the Keras API in the relatively near future with Keras 2
(changes will be made backwards compatible as much as possible, maybe
completely). Some updating of Keras contrib will be required when that
happens, but beyond that point I would expect the Keras 2 API to be
completely stable in the very long term. Keras 2 should essentially be
"Keras forever".

On 23 January 2017 at 12:49, Fariz Rahman notifications@github.com wrote:

One person projects are welcome. The assumption is that all contributions
are properly documented, and the community will be able to pick up where
the original contributor left off.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-274612451,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AArWb7v6IE0yoD1VbE-F2nyAq_RluPAuks5rVRJlgaJpZM4Lc-35
.

fchollet on 23 Jan 2017

👍1

The thing with permanent bleeding edge is it inevitably breaks and you as a user need to reconcile code you've not modified before, and there's no trivial way to roll back to the last thing that worked, because the broken bleeding edge will still be used even if you revert.

Actually, if the build/test artifacts/log of KC could simply store/save/print the git hash of K it tested with, that would at least give something to key off of if you needed to go back in the future.

Anyway, don't let me block what is a very good thing, tweaks can always be made later once things are no longer hypothetical. :-)

ahundt on 24 Jan 2017

@howthebodyworks
I think the reason for a separate contrib repo is that the research of Deep Learning goes too fast, there will be new architectures, regularization methods, activation functions everyday. It's best to set a separate official contrib repo so that 1). The keras users can have import and use those new methods at their will through an official repo, while at the same time 2). we keep the original keras repo intact and clean, we will only merge some of the new methods into the original keras repo after some kind of discussion.

kemaswill on 24 Jan 2017

So here are the specs for Keras contrib:

Repo name should be something like keras-contrib or keras-extras, nothing extravagant
Maintainers should be @patyork, @lukedeo, @the-moliver, @tboquet, @kemaswill, @bstriner, @farizrahman4u, @fchollet to start with (that's already a lot)
To become maintainer, you should be a recognized previous contributor to either contrib or core Keras
directory structure should mirror core Keras.
To make integration of specific objects into core easier, style guidelines should be the same as with Keras core (PEP8, docstring format) and that should be enforced via a Travis test.

Feel free to get started. Note that we would eventually move the repo to a Keras GitHub organization.

fchollet on 24 Jan 2017

👍5

www.github.com/farizrahman4u/keras-contrib

farizrahman4u on 25 Jan 2017

👍3

Question from https://github.com/farizrahman4u/keras-contrib/issues/15: Should a tf-slim style setup for "batteries included" vision based classification and semantic segmentation go in keras-contrib or deep-learning-models?

If the latter, can deep-learning-models have maintainers and a more substantial structure as well?

What about an appropriate place for reinforcement learning algorithms like a3c, ddpg, cem and so on?

ahundt on 8 Feb 2017

Please take me off this email list. I am not even the right person.

On Feb 7, 2017 6:08 PM, "Andrew Hundt" notifications@github.com wrote:

Question from #4944 https://github.com/fchollet/keras/issues/4944:
Should a tf-slim https://github.com/tensorflow/models/tree/master/slim
style setup for "batteries included" vision based classification and
semantic segmentation go in keras-contrib
http://www.github.com/farizrahman4u/keras-contrib or
deep-learning-models https://github.com/fchollet/deep-learning-models?

If the latter, can deep-learning models have maintainers and a more
substantial structure as well?

What about an appropriate place for reinforcement learning algorithms like
a3c, ddpg, cem and so on?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fchollet/keras/issues/4944#issuecomment-278207649,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ATd6Lo9gFLEHAtUskfqPHRX1ElIkO1Jqks5raSOagaJpZM4Lc-35
.

bstringer on 8 Feb 2017

@bstringer your email provided an unsubscribe link https://github.com/notifications/unsubscribe-auth/ATd6Lo9gFLEHAtUskfqPHRX1ElIkO1Jqks5raSOagaJpZM4Lc-35 (this message will too)

ahundt on 8 Feb 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.