It's about time for a feature-complete stable release.
We are in the process of a major refactor. While most changes are on the backend side and therefore should not significantly affect users, we do expect to break a few small things, and maybe compatibility with other language bindings.
So authors of the Julia, R, Scala, etc. packages, please stay tuned and adopt the new API. It should be a quick fix, and we will have a guide for the transition.
@thirdwing @pluskid @vchuravy @Ldpe2G
- Rename mshadow::TBlob and mshadow::TShape to TBlob and TShape in your code.
- Use Storage::Get()->Alloc(size, Context::GPU()) to allocate memory on the current GPU instead.
- If you were training networks with a BatchNormalization layer on CPU, or on GPU with cuDNN v4 or below, before Jul 5th, you may find your model producing totally wrong results after loading it back for testing. The simplest fix is to load your .param files with ndarray.load, set all arrays whose key ends with '_gamma' to 1.0, and save them back.
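For example, a minimal sketch of that '_gamma' fix in Python (the checkpoint file names below are just placeholders for your own files):

```python
import mxnet as mx

# Load the saved parameters (placeholder file name).
params = mx.nd.load('model-0010.params')

# Reset every gamma array to 1.0; keys end with '_gamma'.
for key in params:
    if key.endswith('_gamma'):
        params[key] = mx.nd.ones(params[key].shape)

# Save the repaired parameters back out.
mx.nd.save('model-0010-fixed.params', params)
```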
I would propose Float16 support as an additional target.
For the optimization part, @tqchen and I are thinking about supporting putting the optimizer into the computation graph, so less C++ code will be needed.
Until we have RTC, that doesn't help much; you still need at least a 2x buffer.
We may consider building the documentation on EC2 and syncing it back to Read the Docs, because the doc build keeps failing due to compile timeouts.
Yes, or maybe just host it from EC2.
great!!
@piiswrong what does nnvm mean?
@vchuravy we may need to put more effort into int8 rather than fp16. From current info, int8 will be mainstream in the future.
@antinucleon Great to hear. The work @Godricly and I have been doing focused purely on making our operators support arbitrary DTypes. That should help the Int8 work as well?
(This is off topic, but I would expect fixed-point with Int8 rather than true Int8?)
@vchuravy It is still being investigated by @winstywang. If we use int8 directly, there is no performance gain. But the official documentation mentions that for the new Titan X, int8 performance is 44T, almost 4 times that of fp32.
@vchuravy NV should have specific instructions for int8; currently, using int8 directly only brings a 25% performance gain according to our tests.
My suggestion is as follows:
It would be nice if we could have a GUI for this; it's painful to debug the graph.
stochastic depth can be done with bucketing.
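Roughly the idea (a hypothetical sketch, not an actual stochastic-depth implementation; layer names and sizes are made up): use the sampled depth as the bucket key, so shallower buckets reuse a prefix of the parameters of the deepest (default) bucket.

```python
import mxnet as mx

# Hypothetical sketch: one symbol per depth, keyed by the sampled depth.
def sym_gen(depth):
    net = mx.sym.Variable('data')
    for i in range(depth):
        net = mx.sym.FullyConnected(data=net, name='fc%d' % i, num_hidden=128)
        net = mx.sym.Activation(data=net, act_type='relu')
    net = mx.sym.FullyConnected(data=net, name='out', num_hidden=10)
    net = mx.sym.SoftmaxOutput(data=net, name='softmax')
    return net, ('data',), ('softmax_label',)

# Buckets with a smaller depth share the fc0..fcN parameters of the default bucket.
mod = mx.mod.BucketingModule(sym_gen, default_bucket_key=8)
```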
We have a monitor for debugging.
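For example (a minimal sketch; the module, iterator, and name pattern are placeholders):

```python
import mxnet as mx

# Print stats of arrays whose names match the pattern every 100 batches.
mon = mx.mon.Monitor(interval=100, pattern='.*weight|.*output')

# mod is an mx.mod.Module and train_iter a DataIter built elsewhere (placeholders):
# mod.fit(train_iter, num_epoch=10, monitor=mon)
```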
with NNVM we may enable fully dynamic execution.
@piiswrong @leopd We need to move the doc building system to EC2. The Read the Docs build keeps failing because it runs out of time.
@antinucleon Is there any paper available right now for uint8 NN? And what does NNVM stand for? I'm having a hard time searching for it.
Here are some thoughts about the docs:
@piiswrong @antinucleon
Another thing I'd like to ask for is a refactor of LSTM, if possible.
Can we hide provide_data and provide_label in an elegant way? I understand that the current approach works pretty well, but exposing the internal stuff may bring some trouble (like the extra provided_data_type for me in the fp16 LSTM, #2564).
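For context, this is roughly the boilerplate I mean; every custom iterator has to spell these out (names and shapes below are made up for illustration):

```python
import mxnet as mx

# Illustration only: a toy iterator exposing provide_data / provide_label.
class ToyIter(mx.io.DataIter):
    def __init__(self, batch_size=32, seq_len=10):
        super(ToyIter, self).__init__()
        self.batch_size = batch_size
        self.seq_len = seq_len

    @property
    def provide_data(self):
        return [('data', (self.batch_size, self.seq_len))]

    @property
    def provide_label(self):
        return [('softmax_label', (self.batch_size,))]
```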
I would vote for another issue which is very important for users:
- Make sure the speed and accuracy in all test cases are the same as or better than Caffe.
- Currently we have all kinds of performance-related issues: CPU slower than Caffe, small batches slower than Caffe, and Resnet on ImageNet worse than Caffe.
The Resnet issue is caused by IO. Min has reproduced the exact result by using the Torch IO.
The problem is who will do that.
I hope that for each of the issues raised, people can show up and assign, or self-assign, each issue, so we are moving forward effectively.
It's good to have a single page containing everything, but I totally agree that we can open an issue for each point and cite the links here.
@mli Yes. If someone wants to talk more about, or start working on, a task, feel free to open a new issue and link it here. Also assign it to the v1.0 milestone.
Also, we may consider treating warnings as errors in the future.
I'll list a roadmap for the Scala package this weekend.
@antinucleon Can I ask what's wrong with IO that causes the performance drop?
For docs, I think querying our GitHub issues with the keyword "how to" is a good source for a list of topics to potentially cover.
@piiswrong What does NNVM stand for?
@windywinter about NNVM: https://github.com/dmlc/MXNet.jl/pull/115
@antinucleon, @jennyzhang0215 and I have implemented MemN2N and NTM and replicated the results in the paper; we may release the code after AAAI or WWW. I can send you the code now if you need it.
Is it OK to do some code optimization in NNVM? https://github.com/dmlc/mxnet/issues/3105
Thanks to everyone at DMLC for this great effort.